Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding

Retrieved on: 2024-02-06 16:53:42

Excerpt

We present a novel inference scheme, self-speculative decoding, for accelerating Large Language Models (LLMs) without the need for an auxiliary ...
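The excerpt names the scheme but not its mechanics. The sketch below is a minimal illustration of the generic draft-then-verify loop used in speculative decoding under greedy decoding, not the authors' implementation. `full_model` stands in for the LLM's greedy next-token choice. `draft_model` stands in for a cheaper drafting pass. In self-speculative decoding the draft comes from the same model rather than an auxiliary one, but the excerpt does not say how the draft is made cheaper, so here it is a hypothetical function that sometimes disagrees. All function names and the toy arithmetic are hypothetical. Each kept token is checked against the full model's greedy choice, so the output matches plain greedy decoding, which is one sense in which such a scheme can be lossless.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def full_model(seq: List[int]) -> int:
    # Hypothetical stand-in for the full LLM's greedy next-token choice.
    return (sum(seq) * 7 + len(seq)) % 11


def draft_model(seq: List[int]) -> int:
    # Hypothetical cheaper drafter: agrees with full_model except when len(seq) % 4 == 0.
    guess = full_model(seq)
    return guess if len(seq) % 4 else (guess + 1) % 11


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    # Baseline: one sequential full-model call per generated token.
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(
    full: NextToken, draft: NextToken, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    seq = list(prompt)
    target_len = len(prompt) + n_new
    rounds = 0  # number of draft/verify rounds (full-model verification passes)
    while len(seq) < target_len:
        rounds += 1
        # Draft step: propose up to k tokens autoregressively with the cheap drafter.
        proposal: List[int] = []
        ctx = list(seq)
        for _ in range(min(k, target_len - len(seq))):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # Verify step: a real LLM would score all drafted positions in one forward
        # pass; this toy simply calls the full model position by position.
        for tok in proposal:
            expected = full(seq)
            if tok == expected:
                seq.append(tok)  # draft token accepted
            else:
                seq.append(expected)  # first mismatch: keep the full model's token
                break  # and discard the remaining drafted tokens
    return seq, rounds


if __name__ == "__main__":
    out, rounds = speculative_decode(full_model, draft_model, [1, 2, 3], n_new=12)
    print(out == greedy_decode(full_model, [1, 2, 3], 12), rounds)
```

In this toy, the speculative output equals the greedy output while using fewer sequential rounds than generated tokens. Any real speedup would depend on how cheap the draft is and how often it agrees with the full model.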

Article found on: ui.adsabs.harvard.edu