Operating-Layer Controls for Onchain Language-Model Agents Under Real Capital
A 21-day live deployment of 3,505 language-model trading agents on Ethereum, trading real capital, showed that reliability comes not from the base model but from the operating layer around it: prompt compilation, typed controls, policy validation, and execution guards. The system processed 7.5M agent invocations and $20M in trading volume with 99.9% settlement success, but only after targeted fixes for failures that text-only benchmarks never caught (e.g., reducing fabricated sell rules from 57% to 3%).
arXiv AI · 3 min (abstract)
Research