
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding

Meta presents LayerSkip: enabling early exit inference and self-speculative decoding.

We present LayerSkip, an end-to-end solution to speed up inference in large language models (LLMs).
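The two techniques named in the title can be illustrated together. Below is a minimal toy sketch of greedy self-speculative decoding: the first `exit_layer` layers plus a shared LM head act as the draft model, and the full layer stack verifies the drafted tokens. Everything here is an assumption for illustration, not code from LayerSkip: the "model" is a hypothetical stack of integer functions, and `embed`, `lm_head`, `forward`, `generate_full`, and `self_speculative` are made-up names. Because every emitted token is the one the full model would pick at that position, the output matches ordinary full-model greedy decoding.

```python
from typing import Callable, List, Sequence

VOCAB_SIZE = 16      # toy vocabulary size (assumption)
MOD = 1_000_003      # keeps toy hidden states bounded (assumption)

# Hypothetical toy model: a hidden state is an int and each "layer" maps int -> int.
Layer = Callable[[int], int]

LAYERS: List[Layer] = [
    lambda h: (h * 7 + 3) % MOD,
    lambda h: (h * 13 + 5) % MOD,
    lambda h: h + 2 * VOCAB_SIZE,            # later layer that never changes the prediction
    lambda h: h + (1 if h % 4 == 0 else 0),  # later layer that sometimes changes it
]


def embed(context: Sequence[int]) -> int:
    """Toy embedding: a deterministic rolling hash of the context."""
    h = 0
    for tok in context:
        h = (h * 31 + tok + 1) % MOD
    return h


def lm_head(hidden: int) -> int:
    """A single LM head shared by every exit point."""
    return hidden % VOCAB_SIZE


def forward(layers: Sequence[Layer], context: Sequence[int], exit_layer: int) -> int:
    """Run the first `exit_layer` layers, then predict the next token."""
    hidden = embed(context)
    for layer in layers[:exit_layer]:
        hidden = layer(hidden)
    return lm_head(hidden)


def generate_full(layers: Sequence[Layer], prompt: Sequence[int], n_new: int) -> List[int]:
    """Baseline: greedy decoding through every layer."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(forward(layers, out, len(layers)))
    return out[len(prompt):]


def self_speculative(layers: Sequence[Layer], prompt: Sequence[int], n_new: int,
                     exit_layer: int, n_draft: int) -> List[int]:
    """Greedy self-speculative decoding: early-exit draft, full-model verify."""
    out = list(prompt)
    target_len = len(prompt) + n_new
    while len(out) < target_len:
        # Draft: the early-exit sub-model proposes up to n_draft tokens.
        draft: List[int] = []
        for _ in range(min(n_draft, target_len - len(out))):
            draft.append(forward(layers, out + draft, exit_layer))
        # Verify: `out` always equals the context plus the draft tokens accepted so far.
        for tok in draft:
            verified = forward(layers, out, len(layers))
            out.append(verified)  # keep the full model's token either way
            if verified != tok:
                break  # discard the rest of the draft after the first mismatch
    return out[len(prompt):]
```

For example, `self_speculative(LAYERS, [1, 2, 3], 20, exit_layer=2, n_draft=4)` returns the same 20 tokens as `generate_full(LAYERS, [1, 2, 3], 20)`. This sketch verifies drafted positions one at a time and recomputes the early layers for clarity. A practical implementation would instead verify all drafted positions in one batched forward pass and could reuse the draft stage's computation, which is where the speedup would come from.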

