Speed Comparison Demo

Compare speculative decoding vs sequential generation performance

How it works: Enter a prompt and watch both methods generate text side by side. Speculative decoding uses a small model to draft several tokens ahead, which the large model then verifies in a single pass; sequential generation uses only the large model, producing one token at a time.

Expected Result: Speculative decoding should finish noticeably faster, since it needs fewer large-model calls per token, while producing output of comparable quality.

Real vs Mock: Start with mock mode for instant demonstration, then load real AI models for authentic performance comparison.
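The draft-and-verify loop described above can be sketched in mock mode with two toy stand-ins for the models (the function names, vocabulary, and the 80% draft accuracy below are illustrative assumptions, not the demo's actual implementation). The key property: because every emitted token is either an accepted draft that matches the large model or the large model's own correction, the output is identical to sequential greedy generation, just with fewer large-model calls.

```python
import random

random.seed(0)
VOCAB = list("abcdefgh")  # toy character vocabulary for the mock models

def draft_model(ctx):
    # Small mock model (stand-in for DistilGPT-2): cheap, usually right.
    if random.random() < 0.8:
        return VOCAB[len(ctx) % len(VOCAB)]
    return random.choice(VOCAB)

def target_model(ctx):
    # Large mock model (stand-in for GPT-2): authoritative, deterministic here.
    return VOCAB[len(ctx) % len(VOCAB)]

def sequential(prompt, n):
    # Baseline: one large-model call per generated token.
    out = list(prompt)
    for _ in range(n):
        out.append(target_model(out))
    return "".join(out[len(prompt):]), n  # text, large-model calls

def speculative(prompt, n, k=4):
    # Draft k tokens with the small model, then verify them against the
    # large model in what would be a single batched forward pass.
    out = list(prompt)
    calls = 0
    while len(out) - len(prompt) < n:
        ctx = list(out)
        drafts = []
        for _ in range(k):
            t = draft_model(ctx)
            drafts.append(t)
            ctx.append(t)
        calls += 1  # one large-model verification pass per round
        for t in drafts:
            if t == target_model(out):
                out.append(t)                  # draft accepted
            else:
                out.append(target_model(out))  # rejected: keep target's token
                break
    return "".join(out[len(prompt):])[:n], calls

seq_text, seq_calls = sequential("hi", 24)
spec_text, spec_calls = speculative("hi", 24)
```

Both functions produce the same 24 tokens, but the speculative version typically needs far fewer large-model calls than the sequential baseline.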

Small Model: DistilGPT-2 (~82M parameters) [Mock Mode]
Large Model: GPT-2 (~124M parameters) [Mock Mode]
🚀 Speculative Decoding

Uses small model to draft tokens, large model to verify

Waiting to start...
Total Time: 0.0s | Model Calls: 0 | Efficiency: 0% | Tokens/sec: 0
🐌 Sequential Generation

Uses large model to generate one token at a time

Waiting to start...
Total Time: 0.0s | Model Calls: 0 | Efficiency: 100% | Tokens/sec: 0
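One plausible way the panel metrics above could be computed (a sketch only; the demo's actual formulas are not shown, and "Efficiency" is assumed here to mean tokens generated per large-model call, which makes plain sequential generation the 100% baseline):

```python
def metrics(n_tokens, model_calls, elapsed_s):
    # Hypothetical metric definitions, assumed rather than taken
    # from the demo:
    #   Efficiency  = tokens per large-model call, as a percent
    #                 (sequential decoding = exactly 100%)
    #   Tokens/sec  = generated tokens divided by wall-clock time
    return {
        "total_time_s": round(elapsed_s, 1),
        "model_calls": model_calls,
        "efficiency_pct": 100.0 * n_tokens / model_calls if model_calls else 0.0,
        "tokens_per_sec": n_tokens / elapsed_s if elapsed_s else 0.0,
    }

# Example: 20 tokens, sequential (20 calls) vs speculative (8 calls).
sequential_stats = metrics(n_tokens=20, model_calls=20, elapsed_s=2.0)
speculative_stats = metrics(n_tokens=20, model_calls=8, elapsed_s=1.0)
```

Under these assumed definitions, the sequential panel always reads 100% efficiency, while speculative decoding can exceed 100% whenever a verification pass accepts more than one drafted token.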

Performance Summary

Run a comparison to see the performance difference!