Apr 26, 2024
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Meta presents LayerSkip, an end-to-end solution to speed up inference of large language models (LLMs).