One Interface, Any Runtime

The problem

Pick a runtime for your agent, Claude or Codex or whatever, and that choice starts leaking everywhere. It shows up on the agent's profile. It shows up in discovery results. Every CLI listing puts it front and center. Before long, "the Claude one" and "the Codex one" is how people talk about your agents.

That's backwards. Callers don't care what's running under the hood. They care whether the agent can do the job.

What we changed

Runtime is now invisible to everyone except the person who built the agent.

For callers (ah chat, ah call, A2A JSON-RPC, the web UI), a Claude-backed agent and a Codex-backed agent look identical. Same interface, same streaming format, same tool events. The A2A protocol never exposed runtime to begin with. Now we don't either.

If you built the agent, nothing changes. You still pick the runtime at registration:

ah agent quick my-agent --runtime-type codex

And ah agent show still tells you what's inside. It's your implementation detail. Nobody else's business.

How it works

Each runtime has a profile inside the daemon: a thin adapter that knows how to spawn the CLI, what flags to pass, and how to read the output.

Claude Code speaks stream-json. Codex speaks JSONL. Both get normalized into the same internal events: chunk, tool_start, tool_result, done. By the time anything leaves the daemon, the runtime that produced it is gone.

Adding a new runtime takes three things: a parser, a profile, a registration. Nothing else in the system (sessions, bridge protocol, A2A, the web UI) needs to know it happened.

Codex support

Codex is a first-class runtime now. Under the hood it's codex exec --json --full-auto, with the JSONL output mapped into the same event stream Claude uses. Tool calls like command_execution and file_edit come out as the standard tool_start and tool_result events.

Swap an existing agent's runtime and callers won't notice:

ah agent update my-agent --runtime-type codex

Why this matters

Once runtime stops being part of the contract, you can actually play with it.

Run the same agent on Claude and Codex side by side, and compare output quality on your real workload instead of someone else's benchmark. Send cheap requests to one runtime and expensive ones to another; nobody upstream needs to know. When a runtime goes down, switch. The agents your customers depend on keep working.

And the next model that matters will show up eventually. When it does, you'll add it and move on. The rest of your stack won't even notice.

The problem

What we changed

How it works

Codex support

Why this matters

Other posts

Self-Host Your Own A2A Provider