Token Arena: A Continuous Benchmark Unifying Energy and Cognition in AI Inference
TokenArena benchmarks 78 AI inference endpoints across 12 model families on five core axes (speed, latency, price, context, quality) plus modeled energy. It finds that the same model differs widely across providers: by up to 12.5 points in accuracy and by 6.2x in joules per correct answer. Leaderboard rankings also reorder dramatically depending on the workload (chat vs. retrieval vs. reasoning).
arXiv AI · 4 min (abstract)
Research
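
The summary's "joules per correct answer" figure can be read as modeled energy divided by the number of correct answers. The sketch below illustrates that reading only; the abstract does not spell out the benchmark's exact formula, so the `EndpointRun` type, the provider names, and all numbers here are made-up assumptions, not TokenArena data.

```python
from dataclasses import dataclass


@dataclass
class EndpointRun:
    """One evaluation run against a single endpoint (hypothetical structure)."""
    name: str
    energy_joules: float  # modeled energy for the whole run (illustrative value)
    correct: int          # number of correctly answered items
    total: int            # number of items attempted


def accuracy_points(run: EndpointRun) -> float:
    """Accuracy on a 0-100 scale, so differences read as 'points'."""
    return 100.0 * run.correct / run.total


def joules_per_correct(run: EndpointRun) -> float:
    """Energy spent per correct answer; infinite if nothing was correct."""
    if run.correct == 0:
        return float("inf")
    return run.energy_joules / run.correct


# Two hypothetical providers serving the same model (made-up numbers).
provider_a = EndpointRun("provider_a", energy_joules=5000.0, correct=80, total=100)
provider_b = EndpointRun("provider_b", energy_joules=9000.0, correct=72, total=100)

accuracy_gap = accuracy_points(provider_a) - accuracy_points(provider_b)
energy_ratio = joules_per_correct(provider_b) / joules_per_correct(provider_a)

print(f"accuracy gap: {accuracy_gap:.1f} points")
print(f"joules per correct answer ratio: {energy_ratio:.1f}x")
```

Because the metric divides by correct answers rather than by tokens or requests, an endpoint that uses more energy and also answers less accurately is penalized on both counts.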