Meta presents LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding.
We present LayerSkip, an end-to-end solution to speed up inference of large language models (LLMs).
Join the discussion on this paper page.