Key Points
- Anthropic used 16 Claude Opus 4.6 instances organized as “agent teams”.
- The agents operated in separate Docker containers with no central orchestrator.
- Over two weeks and nearly 2,000 sessions, they produced a 100,000‑line C compiler written in Rust.
- The compiler can build a bootable Linux 6.9 kernel for x86, ARM and RISC‑V.
- It successfully compiles major open‑source projects such as PostgreSQL and SQLite.
- It passed 99% of the GCC torture test suite and compiled and ran Doom.
- The project was released publicly on GitHub.
- Carlini highlighted that a C compiler’s well‑defined spec and existing tests make it ideal for AI‑assisted development.
Background and Goal
Anthropic researcher Nicholas Carlini, a member of the company’s Safeguards team with a background at Google Brain and DeepMind, described a project in which he tasked sixteen instances of Claude Opus 4.6 with building a C compiler from the ground up. The effort was framed as a demonstration of Anthropic’s new “agent teams” capability, which allows multiple AI model instances to collaborate on a shared codebase.
Agent Team Architecture
Each Claude instance ran inside its own Docker container. The containers cloned a common Git repository and claimed work by creating lock files. When a task was completed, the agent pushed its changes back to the repository. No central orchestration agent directed the workflow; instead, each instance independently identified the most obvious problem to address next and proceeded to solve it. When merge conflicts appeared, the AI agents resolved them without human intervention.
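The lock-file claim step can be illustrated with a minimal sketch. The article does not describe the lock file format or how competing claims were settled, so everything here is an assumption: the sketch uses a shared local directory rather than a Git repository, and the names `try_claim`, `release`, the `.lock` suffix, and the task and agent IDs are hypothetical. It relies on exclusive file creation so that only one claimant can win a given task.

```python
import os
import tempfile


def try_claim(lock_dir: str, task_id: str, agent_id: str) -> bool:
    """Try to claim a task; return True only if this call created the lock file."""
    path = os.path.join(lock_dir, f"{task_id}.lock")  # hypothetical naming scheme
    try:
        # O_CREAT | O_EXCL makes creation fail if the file already exists,
        # so two agents cannot both claim the same task.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(agent_id)  # record who holds the claim
    return True


def release(lock_dir: str, task_id: str) -> None:
    """Release a claim by deleting its lock file."""
    os.remove(os.path.join(lock_dir, f"{task_id}.lock"))


if __name__ == "__main__":
    locks = tempfile.mkdtemp()
    print(try_claim(locks, "parse-structs", "agent-1"))  # first claimant wins
    print(try_claim(locks, "parse-structs", "agent-2"))  # second claimant is refused
```

In a Git-based setup like the one described, the equivalent of the exclusive create would presumably be a push that is rejected when another agent has already committed the same lock file, but the article does not confirm that detail.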
Development Process and Resources
The collaboration spanned roughly two weeks and involved nearly 2,000 Claude Code sessions. The total cost in API fees was about $20,000. During this period, the agents collectively generated a 100,000‑line compiler written in Rust. The resulting tool was capable of compiling a bootable Linux 6.9 kernel for three major architectures: x86, ARM and RISC‑V.
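For a sense of scale, the reported figures can be turned into rough per-unit numbers. The inputs below are the article's rounded values ("nearly" 2,000 sessions, "about" $20,000), so the results are order-of-magnitude estimates rather than measurements, and the even split across agents is an assumption.

```python
# Rounded figures as reported in the article; treat results as approximations.
total_cost_usd = 20_000
sessions = 2_000
lines_of_code = 100_000
agents = 16

cost_per_session = total_cost_usd / sessions    # dollars per session
cost_per_line = total_cost_usd / lines_of_code  # dollars per line of Rust
lines_per_session = lines_of_code / sessions    # average lines per session
sessions_per_agent = sessions / agents          # assumes an even split across agents

print(f"cost per session:   ${cost_per_session:.2f}")
print(f"cost per line:      ${cost_per_line:.2f}")
print(f"lines per session:  {lines_per_session:.0f}")
print(f"sessions per agent: {sessions_per_agent:.0f}")
```

On these numbers the project works out to roughly $10 per session and about 20 cents per line of compiler code.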
Capabilities and Performance
Anthropic released the compiler on GitHub. It can compile a range of prominent open‑source projects, including PostgreSQL, SQLite, Redis, FFmpeg and QEMU. In testing, the compiler achieved a 99 percent pass rate on the GCC torture test suite, a rigorous benchmark for compiler correctness. As a final validation, the compiler successfully compiled and ran the classic game Doom, which Carlini described as “the developer’s ultimate litmus test.”
Implications
The experiment underscores why a C compiler is a near‑ideal target for semi‑autonomous AI coding. The language specification is decades old and well‑defined, comprehensive test suites already exist, and a reference compiler provides a clear correctness baseline. Carlini noted that many real‑world software projects lack these advantages, making the task of defining appropriate tests a larger challenge than writing code that passes existing tests.
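One way a reference compiler can serve as a correctness baseline is differential testing: build the same program with both compilers and compare the observable behavior. The article does not say this was the method used, so the sketch below is only an illustration. The `Executor` type, `differential_failures`, and the two stand-in executors are hypothetical. In practice each executor would compile the source with a real compiler and run the resulting binary, whereas the stand-ins here are toy functions so the example runs without any compiler installed.

```python
from typing import Callable, Iterable, List, Tuple

# An executor takes C source text and returns (exit_code, stdout) of the built program.
Executor = Callable[[str], Tuple[int, str]]


def differential_failures(
    reference: Executor, candidate: Executor, programs: Iterable[str]
) -> List[str]:
    """Return the programs whose behavior under `candidate` differs from `reference`."""
    return [src for src in programs if reference(src) != candidate(src)]


# Toy stand-ins (assumptions for illustration only, not real compilers).
def fake_reference(src: str) -> Tuple[int, str]:
    return (0, str(len(src)))


def fake_candidate(src: str) -> Tuple[int, str]:
    # Simulate a bug that only shows up on programs containing "goto".
    if "goto" in src:
        return (1, "")
    return (0, str(len(src)))


if __name__ == "__main__":
    tests = ["int main(){return 0;}", "int main(){goto end; end: return 0;}"]
    failing = differential_failures(fake_reference, fake_candidate, tests)
    print(f"{len(tests) - len(failing)} of {len(tests)} programs match the reference")
```

This only works when a trusted reference exists, which matches Carlini's caveat: for most software there is no equivalent baseline, so the tests themselves must be designed first.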
Source: arstechnica.com