Key Points
- Anthropic has used a take‑home test for candidates since 2024.
- Anthropic's own AI coding assistant, Claude, has progressively outperformed human applicants on the test.
- Claude Opus 4 outperformed most humans; Opus 4.5 matched top candidates.
- Candidates are allowed to use AI tools during the assessment.
- The test’s ability to differentiate talent eroded as AI improved.
- Team lead Tristan Hume redesigned the test away from hardware optimization, toward problems current AI models find difficult.
- Anthropic publicly shared the original test, seeking better challenges.
- The issue reflects wider concerns about AI in education and hiring.
Background
Since 2024, Anthropic’s performance optimization group has required job applicants to complete a take‑home test designed to evaluate their technical expertise. The test originally focused on hardware optimization problems, reflecting the team’s core work.
AI Advancements Prompt Redesigns
Over the past few years, AI coding assistants, particularly Anthropic's own Claude models, have advanced rapidly. According to team lead Tristan Hume, each new version of Claude has forced the company to redesign the assessment. Claude Opus 4 outperformed most human applicants, though the strongest candidates could still be distinguished from the model. The subsequent Claude Opus 4.5 matched even the top human performers, eliminating the test's ability to separate the best candidates from the AI's output.
Policy on AI Tool Use
Anthropic explicitly permits candidates to use AI tools during the take‑home test. A correction to earlier reporting clarified that AI use is allowed, not prohibited. This permissive policy creates a dilemma, however: if candidates cannot improve on the AI's answers, the test no longer serves as a reliable gauge of human skill.
New Test Design
In response to these challenges, Hume developed a new version of the assessment that shifts away from hardware‑optimization tasks. The redesigned test emphasizes novel problem‑solving elements intended to be difficult for current AI models, thereby restoring its utility for evaluating human talent. Hume also shared the original test publicly, inviting external experts to devise challenges that could outpace Claude Opus 4.5, stating, “If you can best Opus 4.5, we’d love to hear from you.”
Implications and Outlook
The situation at Anthropic mirrors broader concerns about AI's impact on educational and professional assessments worldwide. As AI tools become more capable, organizations must continually adapt their evaluation methods to ensure they remain meaningful. Anthropic's proactive approach, regularly updating its test and seeking community input, demonstrates a commitment to preserving the integrity of its hiring process while acknowledging the evolving capabilities of AI.
Source: techcrunch.com