The Foundation Is the Product
Karpathy's agent warning is not a reason to stop building. It is a reminder that useful agents emerge from stronger foundations: models, workflows, and economics.
There is a tempting way to read Andrej Karpathy’s warning about AI agents:
Stop building agents. Wait for the labs. The real work is happening somewhere else.
I think that is the wrong lesson.
The better lesson is sharper and more useful: do not confuse the agent with the foundation that makes an agent useful.
In his Dwarkesh Patel interview, Karpathy was not dismissing today’s agents. He explicitly said he uses tools like Claude and Codex every day. His concern was the industry’s habit of over-predicting how quickly agents become reliable employees. He pointed back to the early OpenAI era, when reinforcement-learning agents, Atari, games, and the Universe project made it feel natural to chase full agents early. His own web-operating agent work aimed at knowledge work with keyboard and mouse, but it was too early. The missing layer was representation: the model underneath did not yet have enough world structure to make the agent work.
That is why the most important sentence in the discussion is not “agents are bad.” It is the sequence underneath it: first language models, first representations, first pre-training, then agent behavior on top.
The timing of his move to Anthropic makes the point even louder. In May 2026, Karpathy joined Anthropic’s pre-training work, with reporting that his team would use Claude to accelerate pre-training research itself. The person being quoted in an agent debate is, in practice, returning to the deepest foundation layer of frontier models.
So the question for builders is not:
Should I build agents or wait?
The better question is:
When I say I am building an agent, am I actually building the foundation that lets one work?
That is the question this series has been circling from the beginning.
The agent is the visible part
An agent is easy to point at.
It opens a file. It writes a draft. It calls a tool. It replies with a confident status update. It feels like the product because it is the thing you talk to.
But in a serious workflow, the visible agent is only the top layer. Underneath it sits the real machinery:
| Visible agent behavior | Foundation underneath |
|---|---|
| “I know what this project is” | Boot documents, constitution, handoff, working-tree inspection |
| “I can do the job” | Role boundaries, skills, tools, schemas, expected artifacts |
| “I remember what happened” | Archives, databases, structured reports, vector memory |
| “I finished” | Verification commands, public URL checks, manifests, file existence |
| “I can cooperate” | Contract files, bridge protocols, permission levels, traceable reports |
| “I can improve” | Finalize loops, self-reflection, deterministic demotion into code |
If those layers do not exist, the agent is not a worker. It is an improviser with access to a keyboard.
That can produce impressive demos. It cannot run a company.
Demos love the agent-first story
The agent-first story is seductive because demos reward surface behavior.
You give the model a task. It opens a browser. It clicks things. It writes code. It produces a file. Everyone watching sees motion, and motion looks like intelligence.
The problem is that products are not judged on demo reliability. Products are judged on boring reliability.
Can the system wake up next week and know which rules changed? Can it avoid publishing the same article twice? Can it tell whether a post actually went live? Can it refuse to leak credentials? Can it distinguish a dispatched job from a completed job? Can it recover after a failed publish without inventing a happy ending? Can it leave enough evidence that tomorrow’s agent knows what happened today?
That is where the agent-first story breaks.
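Two of the questions above, avoiding a duplicate publish and telling a dispatched job from a completed one, can live in a small append-only ledger rather than in the agent's memory. What follows is a minimal sketch under stated assumptions, not a description of any real system: the `PublishLedger` class, the three state names, and the JSON-lines file layout are all hypothetical.

```python
import json
from pathlib import Path

# Hypothetical job states; the essay only distinguishes dispatched from completed.
DISPATCHED = "dispatched"
COMPLETED = "completed"
FAILED = "failed"


class PublishLedger:
    """Append-only JSON-lines record of publish attempts, keyed by article ID."""

    def __init__(self, path):
        self.path = Path(path)

    def _events(self):
        # A missing file simply means nothing has been recorded yet.
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def status(self, article_id):
        """Return the most recent recorded state for an article, or None."""
        state = None
        for event in self._events():
            if event["article_id"] == article_id:
                state = event["state"]
        return state

    def record(self, article_id, state, evidence=None):
        """Append one event; earlier events are never rewritten."""
        event = {"article_id": article_id, "state": state, "evidence": evidence}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def may_publish(self, article_id):
        """Allow a publish only if nothing is in flight or already live."""
        return self.status(article_id) in (None, FAILED)
```

In this sketch an orchestrator would call `may_publish` before dispatching, and record `COMPLETED` only after some external check confirms the post is live. The `evidence` field is a free-form placeholder for whatever that check returns.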
Self-driving had the same lesson. A car that handles a clean demo route can feel magical. A car that drives everywhere must survive the long tail: weather, construction, weird signage, edge cases, confusing human behavior, and the merciless mathematics of reliability. The demo proves possibility. The product requires the foundation.
AI agents are walking into the same trap. The interesting failure is not that an agent cannot sometimes complete a task. It can. The failure is that people ask the agent to carry the entire organization inside the current chat window.
That is not autonomy. That is context debt.
What we have actually been building
This is why the previous five essays were not really about prompts.
They were about the foundation.
The first essay argued that a workflow should be treated as a company: constitution, roles, skills, tools, archive, and audit loop. The second described the lifecycle: start from the real documents, then finalize by turning friction into learning. The third explained how separate agent companies cooperate through a visiting CEO instead of using the human owner as a message bus. The fourth crossed the harder border between two different AI apps, where one app can hire another only through file-based contracts. The fifth stepped back into design philosophy: the smartest model should design the company; cheaper models and deterministic functions should run it.
Karpathy’s warning is almost a research-lab version of the same pattern.
Do not tack “agent” on top of a weak substrate and expect competence to appear. Build the substrate first.
In frontier AI, that substrate is the model: representations, pre-training, reasoning, multimodality, memory, continual learning, computer use, and the cognitive core that can support action.
In an applied workflow, the substrate is the operating system around the model: documents, permissions, tool floors, schemas, archives, validation, reports, and the rules that decide when the model should create, when it should judge, and when code should execute.
Different layers, same architecture principle.
Agency is not something you summon. It is something that emerges when the layer below is strong enough to support it.
The foundation-first builder
The foundation-first builder behaves differently from the agent-first builder.
The agent-first builder asks:
How do I make the agent do more?
The foundation-first builder asks:
What does the agent no longer need to remember?
That one question changes the whole design.
If the agent has to remember the publishing sequence, write a publishing command. If it has to remember the cover style, generate a deterministic cover brief from the channel design language. If it has to remember whether a task is done, create a verifier. If it has to remember what happened last session, write a handoff file. If it keeps making the same judgment the same way, demote that judgment into a rule or a function.
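As one illustration of moving memory out of the agent, the handoff file mentioned above could be as small as the sketch below. The field names (`done`, `pending`, `notes`, `written_at`) and the JSON format are assumptions made for this example, not a format the essay prescribes.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_handoff(path, done, pending, notes=""):
    """Persist what this session finished and what the next one must pick up."""
    handoff = {
        "written_at": datetime.now(timezone.utc).isoformat(),
        "done": list(done),
        "pending": list(pending),
        "notes": notes,
    }
    Path(path).write_text(json.dumps(handoff, indent=2), encoding="utf-8")
    return handoff


def read_handoff(path):
    """Load the previous session's handoff, or an empty one on first run."""
    p = Path(path)
    if not p.exists():
        return {"written_at": None, "done": [], "pending": [], "notes": ""}
    return json.loads(p.read_text(encoding="utf-8"))
```

The next session would start by calling `read_handoff` instead of asking the model what it remembers, so the `pending` list becomes the starting agenda.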
Over time, the agent should carry less live context, not more.
That sounds backwards because most people are trying to make agents more powerful by giving them more tools, more memory, more prompts, more autonomy, more permissions. Some of that helps. But the deeper improvement is to move responsibility out of fragile live cognition and into durable infrastructure.
The agent becomes stronger when the company around it becomes stronger.
The three foundations
When I look at my own workflows now, I see three foundation layers.
The first is model foundation.
This is Karpathy’s layer. The raw model needs better representations, better reasoning, better perception, better action grounding, better long-horizon learning, and better ways to use its own cognition during training and research. Builders outside frontier labs cannot directly solve all of that, but we should understand it. If the model cannot reliably understand the world, no amount of prompt choreography turns it into a dependable employee.
The second is workflow foundation.
This is the layer an applied builder can control. It includes the constitution, workflow spine, role registry, skills, CLIs, schemas, archives, public verification, secret-handling rules, status files, and postmortem loops. This is where the product lives. Not in the agent’s personality, but in the organization that shapes the agent’s behavior.
The third is economic foundation.
Agents are not free minds floating in the cloud. They run on subscriptions, quotas, APIs, GPUs, latency, context windows, and human attention. A workflow that ignores unit economics will eventually break even if the logic is beautiful. The expensive model should be reserved for design and judgment. Cheaper models should staff bounded jobs. Deterministic code should run anything that does not require creation or judgment. A good agent company is not just intelligent. It is cost-aware.
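The routing rule in the paragraph above can be written down as a lookup table. This is a sketch under stated assumptions: the four task kinds and the three tier labels are hypothetical names chosen for illustration, and a real system would map each tier to a concrete model or function.

```python
from enum import Enum


class Kind(Enum):
    DESIGN = "design"          # shaping the workflow itself
    JUDGMENT = "judgment"      # open-ended evaluation
    BOUNDED = "bounded"        # well-specified job with a known output shape
    MECHANICAL = "mechanical"  # no creation or judgment needed


# Hypothetical tier labels, not names of real models or services.
ROUTES = {
    Kind.DESIGN: "expensive_model",
    Kind.JUDGMENT: "expensive_model",
    Kind.BOUNDED: "cheap_model",
    Kind.MECHANICAL: "deterministic_code",
}


def route(kind):
    """Return the tier this sketch assigns to a given kind of task."""
    return ROUTES[kind]
```

Keeping the mapping in data rather than in a prompt means the cost policy can be reviewed and changed without touching any agent.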
These three foundations reinforce each other.
Better models make workflows more capable. Better workflows make current models useful today. Better economics let the system run long enough to learn.
Why “the agent is not the product” is not anti-agent
Saying the agent is not the product can sound like an insult to agents.
It is not.
It is the same as saying the employee is not the company. A brilliant employee matters enormously, but a company cannot be reduced to one employee’s current memory. The company is also the law, the filing system, the machines, the brand, the accounting, the handoff process, the safety rules, and the learned routines that survive when one worker goes home.
A serious agent workflow needs the same humility.
If the agent is the product, every session starts from personality. If the foundation is the product, every session starts from law. If the agent is the product, success is what the model says it did. If the foundation is the product, success is what the verifier proves. If the agent is the product, memory is whatever survived in context. If the foundation is the product, memory is an archive that another agent can search tomorrow.
This is why the slogan matters:
The agent is not the product. The foundation is the product.
Once that is true, agents become much less mystical and much more useful.
They become workers inside an organization.
What this means for people building now
If you are building agents right now, you are not wasting time. You may be at the frontier, but not because you are waiting for the biggest lab to hand you a perfect employee.
You are at the frontier if you are learning how to build the workplace that makes imperfect agents useful.
That means the practical checklist is not glamorous:
- Write the constitution before the clever prompt.
- Define roles before spawning workers.
- Pass file paths and IDs instead of pasting huge payloads into chat.
- Store final artifacts where another agent can find them.
- Verify public outcomes instead of trusting self-reports.
- Turn repeated steps into commands.
- Turn repeated judgments into rules.
- Treat credentials as a safety boundary, not an inconvenience.
- Make every external delegation a contract with a visible trace.
- Close the loop after each session so the workflow learns.
None of this looks as magical as an agent clicking around a browser.
That is exactly why it matters.
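To make the "verify instead of trusting self-reports" item concrete, here is a minimal sketch of a verifier that checks final artifacts against a manifest. The manifest layout (an `artifacts` list of `path` and `sha256` entries, with paths relative to the manifest file) is an assumption made for this example.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path):
    """Return a list of problems; an empty list means every artifact checked out."""
    manifest_path = Path(manifest_path)
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    problems = []
    for entry in manifest["artifacts"]:
        # Paths are resolved relative to the manifest's own directory.
        artifact = manifest_path.parent / entry["path"]
        if not artifact.is_file():
            problems.append(f"missing: {entry['path']}")
        elif sha256_of(artifact) != entry["sha256"]:
            problems.append(f"hash mismatch: {entry['path']}")
    return problems
```

Under this sketch a session counts as finished only when `verify_manifest` returns an empty list, whatever the agent's own status message says. Checking that a post is publicly reachable would need a separate network check, which is left out here.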
The future agent product will not be a single chat window with a heroic personality. It will be a stack: model capability at the bottom, workflow infrastructure around it, economic routing beneath the surface, and agents emerging as the visible workers at the top.
Karpathy’s warning is useful because it cuts through the performance of agency. It asks whether the layer underneath is strong enough.
For frontier labs, that means better models.
For builders, it means better foundations.
And for my own work, it means the past five essays were not a detour from the agent future. They were the beginning of it.
Sources and further reading:
- Dwarkesh Patel, Andrej Karpathy — AGI is still a decade away
- TechCrunch, OpenAI co-founder Andrej Karpathy joins Anthropic’s pre-training team
- Simon Willison, Andrej Karpathy: AGI is still a decade away