Speed Comparison Demo

Compare speculative decoding vs sequential generation performance

How it works: Enter a prompt and watch both methods generate text side by side. Speculative decoding uses a small model to draft several tokens ahead, which the large model then verifies in a single pass; sequential generation uses only the large model, producing one token at a time.

Expected Result: Speculative decoding should finish noticeably faster, since it needs fewer large-model calls per token, while producing output of comparable quality.

Real vs Mock: Start with mock mode for instant demonstration, then load real AI models for authentic performance comparison.
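The draft-and-verify loop described above can be sketched in mock mode with two toy stand-ins for the models (the function names, vocabulary, and the 80% draft accuracy below are illustrative assumptions, not the demo's actual implementation). The key property: because every emitted token is either an accepted draft that matches the large model or the large model's own correction, the output is identical to sequential greedy generation, just with fewer large-model calls.

```python
import random

random.seed(0)
VOCAB = list("abcdefgh")  # toy character vocabulary for the mock models

def draft_model(ctx):
    # Small mock model (stand-in for DistilGPT-2): cheap, usually right.
    if random.random() < 0.8:
        return VOCAB[len(ctx) % len(VOCAB)]
    return random.choice(VOCAB)

def target_model(ctx):
    # Large mock model (stand-in for GPT-2): authoritative, deterministic here.
    return VOCAB[len(ctx) % len(VOCAB)]

def sequential(prompt, n):
    # Baseline: one large-model call per generated token.
    out = list(prompt)
    for _ in range(n):
        out.append(target_model(out))
    return "".join(out[len(prompt):]), n  # text, large-model calls

def speculative(prompt, n, k=4):
    # Draft k tokens with the small model, then verify them against the
    # large model in what would be a single batched forward pass.
    out = list(prompt)
    calls = 0
    while len(out) - len(prompt) < n:
        ctx = list(out)
        drafts = []
        for _ in range(k):
            t = draft_model(ctx)
            drafts.append(t)
            ctx.append(t)
        calls += 1  # one large-model verification pass per round
        for t in drafts:
            if t == target_model(out):
                out.append(t)                  # draft accepted
            else:
                out.append(target_model(out))  # rejected: keep target's token
                break
    return "".join(out[len(prompt):])[:n], calls

seq_text, seq_calls = sequential("hi", 24)
spec_text, spec_calls = speculative("hi", 24)
```

Both functions produce the same 24 tokens, but the speculative version typically needs far fewer large-model calls than the sequential baseline.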

Small Model: DistilGPT-2 (~82M parameters) [Mock Mode]
Large Model: GPT-2 (~124M parameters) [Mock Mode]
🚀 Speculative Decoding

Uses small model to draft tokens, large model to verify

Waiting to start...
Total Time: 0.0s | Model Calls: 0 | Efficiency: 0% | Tokens/sec: 0
🐌 Sequential Generation

Uses large model to generate one token at a time

Waiting to start...
Total Time: 0.0s | Model Calls: 0 | Efficiency: 100% | Tokens/sec: 0
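One plausible way the panel metrics above could be computed (a sketch only; the demo's actual formulas are not shown, and "Efficiency" is assumed here to mean tokens generated per large-model call, which makes plain sequential generation the 100% baseline):

```python
def metrics(n_tokens, model_calls, elapsed_s):
    # Hypothetical metric definitions, assumed rather than taken
    # from the demo:
    #   Efficiency  = tokens per large-model call, as a percent
    #                 (sequential decoding = exactly 100%)
    #   Tokens/sec  = generated tokens divided by wall-clock time
    return {
        "total_time_s": round(elapsed_s, 1),
        "model_calls": model_calls,
        "efficiency_pct": 100.0 * n_tokens / model_calls if model_calls else 0.0,
        "tokens_per_sec": n_tokens / elapsed_s if elapsed_s else 0.0,
    }

# Example: 20 tokens, sequential (20 calls) vs speculative (8 calls).
sequential_stats = metrics(n_tokens=20, model_calls=20, elapsed_s=2.0)
speculative_stats = metrics(n_tokens=20, model_calls=8, elapsed_s=1.0)
```

Under these assumed definitions, the sequential panel always reads 100% efficiency, while speculative decoding can exceed 100% whenever a verification pass accepts more than one drafted token.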

Performance Summary

Run a comparison to see the performance difference!