Raw performance comparison without fancy visualizations
How it works: Both methods generate the same text, but speculative decoding should be faster because a small model drafts several tokens ahead and the large model verifies the whole draft in a single forward pass (see the sketch below).
Output: Plain text as it would appear in a normal generation run, with timing and efficiency statistics for each method.
- Speculative decoding: small model drafts, large model verifies
- Standard decoding: large model generates one token at a time
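For concreteness, here is a minimal sketch of such a comparison, assuming Hugging Face transformers' assisted-generation API (`assistant_model=`); the GPT-2 checkpoints are placeholder stand-ins for whatever drafter/verifier pair you actually run:

```python
import time

from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder models: the drafter must share the verifier's tokenizer.
tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
large = AutoModelForCausalLM.from_pretrained("gpt2-xl")  # verifier
small = AutoModelForCausalLM.from_pretrained("gpt2")     # drafter

inputs = tokenizer("The quick brown fox", return_tensors="pt")

def timed_generate(**kwargs):
    """Generate greedily with the large model; return (text, seconds, tokens/sec)."""
    start = time.perf_counter()
    out = large.generate(**inputs, max_new_tokens=100, do_sample=False, **kwargs)
    elapsed = time.perf_counter() - start
    new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
    return tokenizer.decode(out[0], skip_special_tokens=True), elapsed, new_tokens / elapsed

# Standard decoding: the large model alone, one token per forward pass.
text_std, t_std, tps_std = timed_generate()

# Speculative decoding: the small model drafts, the large model verifies.
text_spec, t_spec, tps_spec = timed_generate(assistant_model=small)

assert text_std == text_spec  # greedy decoding, so both paths yield the same text
print(f"standard:    {t_std:.2f}s  ({tps_std:.1f} tok/s)")
print(f"speculative: {t_spec:.2f}s  ({tps_spec:.1f} tok/s)")
print(f"speedup:     {t_std / t_spec:.2f}x")
```

Because decoding is greedy, both paths produce identical text, so the comparison isolates speed; the realized speedup depends on how often the verifier accepts the drafter's tokens, which is what the efficiency statistics capture.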
Run a comparison to see the results!