Key Points
- AI agents are most effective when viewed as tools that amplify human skills.
- OpenAI’s Codex app lets developers run multiple parallel agent threads on isolated code copies.
- The underlying GPT‑5.3‑Codex model scored 77.3% on the Terminal‑Bench 2.0 benchmark, surpassing Anthropic’s Opus 4.6.
- User roles are shifting from simple prompt writers to supervisors managing AI tasks.
- The model of AI as a managed tool sparks debate about its practicality and impact on productivity.
From Chatbots to Amplifiers
While hype often portrays AI agents as autonomous coworkers, real‑world experience suggests they function best as tools that boost existing human skills. These agents can generate impressive drafts quickly, but they still need ongoing human correction and guidance.
OpenAI’s Codex Desktop App
OpenAI introduced a macOS desktop application for Codex, described by the company as a “command center for agents.” The app enables developers to launch multiple agent threads in parallel, each operating on an isolated copy of a codebase through Git worktrees. This setup allows developers to act as supervisors, assigning tasks, monitoring progress, and intervening when an agent requires direction.
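The isolation mechanism described above relies on a standard Git feature. A minimal sketch of how worktrees give each agent thread its own working copy (repository, branch, and file names here are illustrative, not taken from the article):

```shell
#!/bin/sh
# Illustrative sketch: one repository, multiple isolated working copies
# via `git worktree`, the mechanism the Codex app reportedly uses.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init demo
cd demo
git config user.email "demo@example.com"   # local identity so commits work anywhere
git config user.name "Demo"
git commit --allow-empty -m "initial commit"

# Each hypothetical agent gets its own checkout on its own branch.
# All worktrees share one object store, so creating them is cheap.
git worktree add ../agent-a -b agent-a
git worktree add ../agent-b -b agent-b

# Changes in one worktree stay invisible to the others until merged.
echo "experimental change" > ../agent-a/file.txt
git worktree list   # lists the main copy plus both agent copies
```

A supervisor can then review each branch independently and merge only the work that passes inspection, which matches the delegate-and-review workflow the app is built around.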
Advances in Model Performance
Alongside the Codex app, OpenAI released GPT‑5.3‑Codex, the model powering the new tool. According to OpenAI, early versions of GPT‑5.3‑Codex were used to debug the model’s own training run, manage its deployment, and diagnose test results. On the Terminal‑Bench 2.0 benchmark, GPT‑5.3‑Codex achieved a score of 77.3%, exceeding the recently launched Opus 4.6 from Anthropic by roughly 12 percentage points.
Redefining the User’s Role
The common thread across these products is a shift in the user’s role. Rather than typing a single prompt and awaiting a single response, developers and knowledge workers become something closer to middle managers of AI: they delegate tasks, review outputs, and hope the agents beneath them do not silently cause problems.
Ongoing Debate
Whether this supervisory model will become the norm—or whether it is a beneficial approach at all—remains widely debated. Critics question the practicality of constantly supervising AI agents, while proponents argue that the model unlocks new levels of productivity by allowing humans to focus on higher‑order decision making.
Source: arstechnica.com