For a few weeks I was routing tasks by feel. Claude planned. Codex debugged. Sometimes I gave the same problem to both just to see what came back. It worked, roughly. But there was friction I couldn’t name.
It turned out the friction came from a wrong question. I kept asking which one was better. That framing assumes they're after the same job. They're not.
Codex sits close to the machine. Close to files, to the terminal, to the actual moving parts of a project. I want it to read the environment, make the change, explain what changed, and stay inside the rails. Low conversation, high execution. When I use Claude, I’m usually asking something different — whether something reads right, whether the structure holds, whether a sentence sounds like a person meant it.
One handles what the system needs. The other handles what I need the reader to feel.
Once I saw that, I stopped bouncing the same task between them. The handoffs got deliberate. The work got cleaner.
What made it stick wasn’t the insight. It was writing it down.
I have a folder. In that folder there’s a file. It says, in plain language: n8n handles orchestration, Codex handles implementation, Claude handles planning and drafting. Three layers. Each one owns something, and the boundaries don’t move session to session.
That file exists because I built it with both agents over a couple of sessions. Now it’s in a repo. When I open a new session, the roles are already decided. I don’t relitigate them every time. That’s a different thing from having a preference — a preference lives in your head and drifts. A documented architecture just sits there and holds.
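For concreteness, a file like that can be very short. This is a hypothetical sketch, not the actual file; the filename, keys, and wording are invented for illustration:

```yaml
# roles.yaml — hypothetical example of a documented agent architecture.
# The real file is plain language; this just shows the shape of it.

orchestration:
  owner: n8n
  scope: scheduling, webhooks, moving data between services

implementation:
  owner: Codex
  scope: reading the environment, editing files, running commands

planning_and_drafting:
  owner: Claude
  scope: structure, plans, prose that has to read like a person wrote it

# The rule that makes it work:
rule: boundaries do not move session to session
```

The format matters less than the fact that it lives in the repo, where every new session starts from the same division of labor.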
I used to think the goal was one assistant that could do everything. I’m less interested in that now. One tool trying to cover everything tends to do most things adequately and nothing particularly well. The specialist model costs more coordination and pays back in output quality.
Codex builds what I describe. Claude shapes how I describe it. When the work matters, one hands off to the other — and the result is better than either would land on alone, with less time spent second-guessing the process.
The debate about which AI is better is real. People are running benchmarks, arguing about models, comparing outputs on identical prompts. That’s not the wrong conversation. It’s just not mine anymore. My conversation is about roles. And once the roles are written down, the debate stops feeling urgent.
