Apple presents Recurrent Drafter for Fast Speculative Decoding in Large Language Models.
In this paper, we introduce an improved approach to speculative decoding aimed at making large language model serving more efficient.
