Maybe AI agents can be lawyers after all

Key Points

  • Mercor benchmark measured AI performance on law and corporate analysis tasks.
  • All major models previously scored below 25 percent.
  • Anthropic’s Opus 4.6 reached just under 30 percent on one‑shot trials.
  • With multiple attempts, Opus 4.6 averaged 45 percent.
  • New “agent swarms” feature may have boosted multistep problem solving.
  • Mercor CEO Brendan Foody called the score jump “insane.”
  • Scores remain far from full legal competence, but the gap is shrinking.
  • Legal professionals should adopt a more cautious outlook on AI displacement.

Benchmark highlights rapid AI progress

Last month, a benchmark created by Mercor measured how well AI agents handle professional tasks such as law and corporate analysis. At that time, every major laboratory’s model scored below 25 percent, leading analysts to conclude that lawyers were safe from immediate AI replacement.

Since then, Anthropic introduced Opus 4.6, a new foundation model that dramatically altered the leaderboard. In one‑shot trials—where the model receives a single attempt to solve a problem—Opus 4.6 achieved a score just shy of 30 percent. When the model was allowed a few more tries, its average performance rose to 45 percent. This represents a substantial increase from the previous best scores, which hovered in the high teens.

Agentic features may be the key

The Opus 4.6 release also added a suite of agentic capabilities, including the novel “agent swarms” feature. These capabilities enable the model to break down complex, multistep problems and coordinate multiple sub‑agents to work toward a solution. Observers believe that such features contributed to the improved benchmark results, especially on tasks that require layered reasoning, such as legal analysis.

Industry reaction

Mercor’s chief executive, Brendan Foody, expressed strong enthusiasm for the result. He described the jump from 18.4 percent to 29.8 percent within a few months as “insane,” underscoring the rapid pace of advancement in AI research.

Implications for the legal profession

Despite the progress, the benchmark scores remain well below the level that fully reliable legal work would require. Consequently, the immediate threat of AI replacing lawyers is still limited. However, the sizable improvement suggests that the legal field should adopt a more cautious outlook than it did a month ago. The gap between current capabilities and full competence is narrowing, and continued enhancements in agentic features could accelerate that trend.

Looking ahead

The APEX‑Agents Leaderboard, which tracks these benchmark results, now reflects a more competitive environment among AI developers. As foundation models continue to evolve and incorporate advanced agentic tools, future benchmark rounds are likely to see even higher scores. Stakeholders in the legal industry, AI research community, and technology investors will be watching closely to gauge when AI might become a viable partner—or even a competitor—to human lawyers.

Source: techcrunch.com