Anthropic Says Claude Agents Beat Human Baseline on Weak-to-Strong Alignment Task
Anthropic says a team of Claude Opus 4.6 agents outperformed a human baseline on a weak-to-strong supervision benchmark, though the result still falls short of a clear production-scale breakthrough.