Attention & latency

Why does longer context make AI slower? Self-attention compares every token with every other token, so its compute scales as O(n²) in the number of tokens n: double your tokens and you quadruple the attention compute. Use the slider to feel the math.
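To make the quadratic growth concrete, here is a minimal sketch that counts the query-key scores a single attention head computes. The function name `attention_score_count` is illustrative, not part of any library, and the count ignores constant factors such as head count, layer count, and causal masking (which roughly halves the work but leaves it quadratic).

```python
def attention_score_count(n_tokens: int) -> int:
    """Query-key scores one attention head computes for n_tokens tokens."""
    # Each token (as a query) is scored against every token (as a key),
    # so the scores form an n x n grid.
    return n_tokens * n_tokens


for n in (1_000, 2_000, 4_000):
    growth = attention_score_count(n) / attention_score_count(1_000)
    print(f"{n:>5} tokens -> {attention_score_count(n):>12,} scores ({growth:.0f}x)")
```

Each doubling of the token count multiplies the number of scores by four, which is the pattern the slider illustrates.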

O(n²) latency calculator — instant safe zone

Tokens in context: 20,000

Model limit: 10%

Attention cost vs baseline: 0.04×

Context window used: 10%

Response speed zone: Fast
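The readout above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions, not the calculator's actual code: the page does not say what its baseline or model limit is, so `BASELINE_TOKENS` and `MODEL_LIMIT_TOKENS` below are values inferred from the displayed numbers (20,000 tokens giving 0.04× and 10%).

```python
# Assumption: inferred so that 20,000 tokens yields a 0.04x relative cost.
BASELINE_TOKENS = 100_000
# Assumption: inferred so that 20,000 tokens is 10% of the context window.
MODEL_LIMIT_TOKENS = 200_000


def relative_attention_cost(tokens: int, baseline: int = BASELINE_TOKENS) -> float:
    """Attention cost relative to the baseline, assuming pure O(n^2) scaling."""
    return (tokens / baseline) ** 2


def window_used(tokens: int, limit: int = MODEL_LIMIT_TOKENS) -> float:
    """Fraction of the context window occupied by the given token count."""
    return tokens / limit


tokens = 20_000
print(f"{relative_attention_cost(tokens):.2f}x attention cost vs baseline")
print(f"{window_used(tokens):.0%} context window used")
```

The thresholds behind the "fast" zone are not stated on the page, so they are left out of the sketch.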

ContextCrunch compression benefit

Compression ratio: 30%
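Because attention cost grows with the square of the token count, trimming tokens pays off more than linearly. The sketch below assumes that a compression ratio of 30% means 30% of the tokens are removed (70% kept); the page does not define the term, and the function names are hypothetical rather than taken from ContextCrunch.

```python
def tokens_after_compression(tokens: int, ratio_removed: float) -> int:
    """Token count left after removing the given fraction of tokens."""
    if not 0.0 <= ratio_removed < 1.0:
        raise ValueError("ratio_removed must be in [0, 1)")
    return round(tokens * (1.0 - ratio_removed))


def attention_cost_fraction(ratio_removed: float) -> float:
    """Remaining attention cost as a fraction of the uncompressed cost,
    assuming pure O(n^2) scaling."""
    if not 0.0 <= ratio_removed < 1.0:
        raise ValueError("ratio_removed must be in [0, 1)")
    kept = 1.0 - ratio_removed
    return kept ** 2


ratio = 0.30  # assumption: 30% of tokens removed
print(tokens_after_compression(20_000, ratio), "tokens remain")
print(f"{attention_cost_fraction(ratio):.2f} of the original attention cost")
```

Under that reading, removing 30% of the tokens cuts the attention cost roughly in half, since 0.7² ≈ 0.49.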

See the latency impact of your current conversation.

Try the tool →