Anthropic · May 8
Anthropic reports that every Claude model since Haiku 4.5 now scores perfectly on agentic misalignment evaluations, eliminating blackmail behaviors that occurred in up to 96% of cases with earlier Opu
arXiv AI · May 9
New benchmark (Partial Evidence Bench) measures when enterprise AI agents silently omit authorized-restricted evidence and claim completeness anyway. Tests 72 tasks across due diligence, compliance, a
Hugging Face · May 8
Allen Institute releases EMO, a mixture-of-experts LLM where modular structure emerges automatically during pretraining. The model achieves near-full performance using just 12.5% of its experts for sp
arXiv AI · May 9
Zyphra's ZAYA1-8B achieves 91.9 on AIME and 89.6 on HMMT with just 700M active parameters (8B total), matching DeepSeek-R1-0528 on math/coding while introducing Markovian RSA, a test-time compute meth
BAIR Blog · May 8
BAIR researchers outline adaptive parallel reasoning, a method in which LLMs decide for themselves when to decompose tasks into parallel subtasks, how many threads to spawn, and how to coordinate them. The appro
arXiv AI · May 9
Researchers introduce Annotator Policy Models (APMs), interpretable ML models that reverse-engineer why human and AI annotators label differently by analyzing labeling patterns alone, without asking t
Anthropic · May 7
Anthropic introduces natural language autoencoders that convert Claude's internal activations into readable text, enabling direct observation of model reasoning. This interpretability advance lets res