Simple Speed Comparison

A raw performance comparison, without fancy visualizations.

How it works: Both methods generate the same text, but speculative decoding should be faster because a small model drafts several tokens ahead and the large model verifies them in a single pass (see the timing sketch below).

Output: Plain text, as it would appear in a normal generation run, along with timing and efficiency statistics.
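As a concrete illustration of the comparison, here is a minimal timing sketch that pits greedy generation with GPT-2 alone against Hugging Face transformers' built-in assisted generation (the `assistant_model` argument to `generate()`), which implements the draft-and-verify scheme described above. Only the model names ("gpt2", "distilgpt2") come from this page; the prompt and token budget are arbitrary, and this is not the demo's actual code.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # shared by both models
large = AutoModelForCausalLM.from_pretrained("gpt2").eval()        # ~124M params
small = AutoModelForCausalLM.from_pretrained("distilgpt2").eval()  # ~82M params

prompt = "The future of language models is"  # arbitrary example prompt
inputs = tok(prompt, return_tensors="pt")

def timed_generate(**kwargs):
    start = time.perf_counter()
    with torch.no_grad():
        out = large.generate(
            **inputs, max_new_tokens=64, do_sample=False,
            pad_token_id=tok.eos_token_id, **kwargs,
        )
    elapsed = time.perf_counter() - start
    n_new = out.shape[1] - inputs["input_ids"].shape[1]
    return tok.decode(out[0], skip_special_tokens=True), elapsed, n_new / elapsed

seq_text, seq_t, seq_tps = timed_generate()                          # sequential
spec_text, spec_t, spec_tps = timed_generate(assistant_model=small)  # speculative

# With greedy decoding, assisted generation reproduces the large model's
# output exactly, so both methods generate the same text.
assert seq_text == spec_text
print(f"sequential : {seq_t:.2f}s ({seq_tps:.1f} tok/s)")
print(f"speculative: {spec_t:.2f}s ({spec_tps:.1f} tok/s)")
print(f"speedup    : {seq_t / spec_t:.2f}x")
```

Note that with two models this small the speedup can be modest; speculative decoding pays off most when the verifier is much more expensive than the drafter.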

Small model (drafter): DistilGPT-2, ~82M parameters (Mock Mode available)
Large model (verifier): GPT-2, ~124M parameters (Mock Mode available)
Speculative Decoding: the small model drafts tokens; the large model verifies them.

Sequential Generation: the large model generates one token at a time.

Both methods report the same live metrics: Time (seconds), Model Calls, Efficiency, and Tokens/sec. Counters start at zero before a run; sequential generation's efficiency is pinned at 100%, since each of its model calls produces exactly one token.
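To make those counters concrete, here is a from-scratch sketch of the greedy draft-and-verify loop with the metric bookkeeping spelled out. It assumes greedy decoding, skips KV caching for simplicity, and defines "efficiency" as new tokens per large-model call (which makes plain sequential decoding exactly 100%); the demo's actual loop and metric definitions may differ.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
large = AutoModelForCausalLM.from_pretrained("gpt2").eval()
small = AutoModelForCausalLM.from_pretrained("distilgpt2").eval()

@torch.no_grad()
def speculative_generate(prompt, max_new_tokens=64, k=4):
    ids = tok(prompt, return_tensors="pt").input_ids
    prompt_len = ids.shape[1]
    large_calls = small_calls = 0

    while ids.shape[1] - prompt_len < max_new_tokens:  # may overshoot by < k
        # 1) Draft: the small model proposes k tokens, one greedy step at a time.
        draft = ids
        for _ in range(k):
            logits = small(draft).logits[:, -1, :]
            draft = torch.cat([draft, logits.argmax(-1, keepdim=True)], dim=-1)
            small_calls += 1

        # 2) Verify: ONE large-model pass scores every drafted position at once.
        verify = large(draft).logits.argmax(-1)  # greedy choice at each position
        large_calls += 1

        # 3) Accept the longest draft prefix the large model agrees with, then
        #    append the large model's own token at the first mismatch (or one
        #    bonus token if the entire draft was accepted).
        n_accepted = 0
        for i in range(k):
            pos = ids.shape[1] + i
            if draft[0, pos] == verify[0, pos - 1]:
                n_accepted += 1
            else:
                break
        next_tok = verify[:, ids.shape[1] + n_accepted - 1].unsqueeze(-1)
        ids = torch.cat(
            [ids, draft[:, ids.shape[1]:ids.shape[1] + n_accepted], next_tok],
            dim=-1,
        )

    n_new = ids.shape[1] - prompt_len
    print(f"large-model calls: {large_calls}, draft calls: {small_calls}")
    print(f"efficiency: {100 * n_new / large_calls:.0f}% "
          f"({n_new} tokens / {large_calls} large-model calls)")
    return tok.decode(ids[0], skip_special_tokens=True)

print(speculative_generate("The future of language models is"))
```

Because the verifier's greedy choice decides every accepted token, this loop produces exactly the text the large model would generate on its own; the win is that each large-model call can commit up to k + 1 tokens instead of one.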

Performance Summary

Run a comparison to see the results!