lightseekorg/tokenspeed: TokenSpeed is a speed-of-light LLM inference engine.
TokenSpeed is a new LLM inference engine optimized for agentic workloads. It reaches 580 tokens/sec on Qwen3.5-397B running on Blackwell GPUs. Built on static compilation for multi-GPU parallelism, type-safe KV cache management, and one of the fastest MLA (Multi-head Latent Attention) implementations, it matches TensorRT-LLM performance with vLLM's ease of use.
GitHub Trending · GitHub repo