Strategy

AI-Native Engineering Is Boring (That's the Point)

Everyone has access to Claude Code and Cursor. The gap between teams getting extraordinary results and teams getting mediocre ones isn't the tool. It's the boring infrastructure underneath.

Haroon Choudery · 10 min read

Boris Cherny, the creator of Claude Code, revealed his workflow in January. He runs 15 Claude instances in parallel. He ships over 100 pull requests a week. Developers lost their minds.

The General Intelligence Company reported engineers shipping 20 PRs per day, a 3-4x output increase, spending $4,000 per engineer per month on Opus tokens. Andrej Karpathy reframed his “vibe coding” concept into something more disciplined: “agentic engineering.” Addy Osmani drew a sharp line: vibe coding is YOLO, agentic engineering is where AI does the implementation and the human owns the architecture, quality, and correctness.

The excitement is real. And it’s deserved.

But here’s what I keep seeing when I work with companies trying to replicate those results: they can’t. Not because the tools don’t work. The tools are incredible. Everyone has access to Claude Code, Cursor, Windsurf, the same models, the same agent harnesses. The gap between teams shipping 100 PRs a week and teams struggling to get a single agent workflow running reliably has almost nothing to do with which tool they picked.

It has everything to do with what they built underneath it.

The excitement is real. It’s also unevenly distributed.

I don’t think the hype around agentic engineering is premature. The tools are genuinely powerful. What’s happening is that the excitement is unevenly distributed.

Some teams are getting real results. They post about it. The numbers go viral. Other teams see those numbers, adopt the same tools, and then struggle to make it work consistently across different initiatives.

The gap isn’t the tool. It’s the invisible infrastructure.

The teams getting extraordinary results have invested in things that don’t make for good tweets: clean context systems, review discipline, architectural clarity, and organizational buy-in for a fundamentally different way of working. The teams struggling skipped all of that and went straight to the shiny part.

This is the pattern I keep running into. A team sees someone build a working agent system in 20 minutes and thinks the answer is the agent. The answer is everything that person already had in place before they opened the terminal.

The boring 80%

Here’s the part nobody wants to talk about.

The exciting part of agentic engineering (agents writing code, composing tools, solving problems) is maybe 20% of the work. The other 80% is documentation. Evals. Context management. Architecture. Keeping your knowledge base clean and current. Making sure your systems don’t contain stale information or conflicting sources of truth.

That’s not a fun conference talk. It doesn’t go viral on X. But it’s the whole game.

Arvind Jain, the CEO of Glean, published a piece today laying out a six-layer enterprise agent stack. Security, context, models, orchestration, agents, interfaces. The key insight: context is the foundation layer. Without a solid context layer (connectors, indexes, knowledge graphs, context graphs), agents produce what he calls “work slop.” Each agent run should feed back into the context layer, creating a compounding loop. But if the foundation is broken, nothing compounds. It just produces more noise.

This maps directly to what I see in practice. The context layer is the bottleneck, and almost nobody is talking about it with the seriousness it deserves.

What broken context actually looks like

When I walk into a company and look at their AI systems, the first thing I examine isn’t the model, the prompts, or the agent architecture. It’s the context layer.

And it’s almost always broken.

Here’s what I keep finding.

Multiple sources of truth. Three wikis, two Notion workspaces, a Google Drive, a Confluence instance. None of them agree with each other. The agent pulls from whichever one it finds first, and the answer depends on which source it hit. The model isn’t hallucinating. It’s faithfully reporting what your contradictory documentation says.

Stale documentation. The knowledge base was last meaningfully updated six months ago. The product has shipped 40 features since then. The agent is confidently answering questions about a product that no longer exists.

No structured context at all. Teams paste things into prompts ad hoc. No AGENTS.md files. No systematic approach to what the agent knows about the project, the codebase, the business domain. Every session starts from scratch. Nothing compounds.
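To make "structured context" concrete, here is a hypothetical sketch of what a minimal AGENTS.md might contain. Every project name, path, and command below is invented for illustration; the point is that the agent gets a single, current, opinionated starting point instead of ad hoc pasted snippets.

```markdown
# AGENTS.md (hypothetical example)

## Project
Billing service: Python 3.12, FastAPI, Postgres. Lives under /services/billing.

## Conventions
- Run `make test` before proposing changes; all PRs require passing CI.
- Business logic belongs in services/billing/domain/, never in route handlers.

## Sources of truth
- API schema: openapi.yaml (generated; do not hand-edit).
- Product behavior: docs/specs/ — anything that contradicts a spec is stale.
```

A file like this is cheap to write and cheap to keep current, which is exactly why it compounds: every session starts from the same ground truth instead of from scratch.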

Wrong granularity. Either everything gets dumped into context (the “dumb RAG” problem that Composio identified as the number one reason agent pilots fail) or the agent gets so little context that it has no idea about the business logic it’s supposed to be operating on.

These aren’t model problems. These aren’t tool problems. These are information hygiene problems. And they existed long before AI agents came along. Agents just made them impossible to ignore.

Gartner predicted that 60% of AI projects would be abandoned due to data that wasn’t ready for AI. A recent survey found that 81% of AI professionals say their company has significant data quality issues. And 85% say leadership isn’t addressing it.

The bottleneck isn’t the model. It’s what you’re feeding it.

The misconception about powerful models

There’s a belief I encounter constantly: these models are so good that they can handle whatever context you throw at them.

They can’t. And they won’t be able to for a long time.

Throwing everything at a model is neither efficient nor effective. Even with million-token context windows, the quality of what goes in determines the quality of what comes out. More context doesn’t mean better results. Better context means better results.

Karpathy called this context engineering: the art and science of filling the context window with just the right information for the next step. Too little and the model doesn’t have what it needs. Too much and you’re paying for irrelevant tokens while the model gets distracted by noise.
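As a rough sketch of what "just the right information" means in practice, here is a hypothetical context selector: rank candidate documents by relevance and greedily pack them into a token budget, leaving everything else out. The `Doc` type, relevance scores, and the characters-per-token heuristic are all illustrative assumptions, not any specific library's API.

```python
# Hypothetical sketch: greedy context selection under a token budget.
# All names and numbers here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    relevance: float  # e.g. from a retriever or embedding similarity

def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def select_context(docs: list[Doc], budget_tokens: int) -> list[Doc]:
    """Pack the most relevant docs into the budget, most relevant first.

    More context isn't better context: anything below the cut, or that
    doesn't fit, stays out of the prompt entirely.
    """
    chosen, used = [], 0
    for doc in sorted(docs, key=lambda d: d.relevance, reverse=True):
        cost = estimate_tokens(doc.text)
        if used + cost <= budget_tokens:
            chosen.append(doc)
            used += cost
    return chosen

docs = [
    Doc("Current API schema for the billing service...", 0.92),
    Doc("Six-month-old design doc (superseded)...", 0.40),
    Doc("Team onboarding guide, mostly irrelevant here...", 0.15),
]
print([d.relevance for d in select_context(docs, budget_tokens=20)])
```

With the tight budget above, only the top-ranked document makes the cut, which is the point: the stale design doc never reaches the model, no matter how large the context window is.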

The teams getting real results from agentic engineering have figured this out. They maintain clean, structured project context. They keep their documentation current. They’ve thought carefully about what the agent needs to know and, just as importantly, what it doesn’t.

This is boring work. It’s the same discipline that good engineering teams have always practiced: keep your docs up to date, maintain a single source of truth, don’t let your knowledge base rot. Agents didn’t invent this need. They just raised the stakes.

The tools aren’t the bottleneck

There’s an entire industry of content dedicated to comparing agent tools. Cursor vs. Claude Code vs. Windsurf. Which IDE. Which model. Which framework.

These comparisons aren’t useless, but they’re solving for maybe 5% of the problem.

Cursor has 2.1 million users and hit $1 billion in ARR. NVIDIA moved 40,000 engineers to Cursor workflows. GitLab launched its Duo Agent Platform. Google published formal multi-agent design patterns. The tooling is mature. Access is not the problem.

The problem is that people pick a tool and expect results without investing in the infrastructure that makes the tool effective.

It’s like buying a professional camera and expecting great photos. The camera matters. But the photographer’s eye, the lighting setup, the hours of practice with composition, that’s 95% of the result. Anyone can buy the camera. Not everyone will do the boring work that makes the camera useful.

What actually matters (and why that’s good news)

If the bottleneck were model quality, you’d just have to wait for the next model release and hope. If the bottleneck were tool access, you’d need to pick the right one and switch if you’re wrong.

But the bottleneck is context management, review discipline, and architectural clarity. These are engineering fundamentals. They’re learnable. They’re practicable. They’re not gated by which vendor you chose or how much you’re spending on tokens.

Context management means maintaining clean, current, non-contradictory knowledge that your agents can actually use. It means AGENTS.md files that give agents the project context they need. It means structured documentation, not pasted-in snippets.

Review discipline means not shipping whatever the agent produces. Addy Osmani said it best: the single biggest differentiator between agentic engineering and vibe coding is testing. The teams getting real results still review rigorously. They have evals. They treat agent output the way a senior engineer treats a junior engineer’s code: assume it’s probably good, verify that it actually is.
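A minimal eval doesn't need infrastructure; it's a list of tasks paired with checks that agent output must pass before anything ships. The sketch below is a hypothetical harness with `run_agent` stubbed out so it runs standalone; in practice that function would call your actual agent.

```python
# Hypothetical sketch of a minimal eval harness for agent output.
# `run_agent` is a stub standing in for whatever produces the output.

def run_agent(task: str) -> str:
    # Stub: in practice, this calls your agent and returns its answer.
    return {"slugify 'Hello World'": "hello-world"}.get(task, "")

EVAL_CASES = [
    # (task given to the agent, predicate the output must satisfy)
    ("slugify 'Hello World'", lambda out: out == "hello-world"),
    ("slugify ''", lambda out: out == ""),
]

def run_evals() -> tuple[int, int]:
    """Return (passed, total). Assume the output is probably good;
    verify that it actually is before it ships."""
    passed = sum(1 for task, check in EVAL_CASES if check(run_agent(task)))
    return passed, len(EVAL_CASES)

print(run_evals())  # (2, 2) with the stub above
```

Even a harness this small changes the workflow: agent output stops being "looks right, merge it" and becomes "passes the checks we wrote down, merge it."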

Architectural clarity means knowing where agent-directed flexibility adds value and where deterministic code is the right answer. It means well-defined tool interfaces, clean abstractions, systems designed to communicate with each other instead of a pile of one-off agent scripts that can’t talk to anything.

You can get by without overthinking architecture for one-off projects. But the moment you try to scale, if the systems aren’t constructed thoughtfully, you end up with a lot of slop. Different systems that can’t communicate. Bloated code. Inefficiency everywhere.

Greg Brockman acknowledged this in his internal retooling announcement at OpenAI: “say no to slop.” The bar he set: maintain at least the same code review standard you would apply to human-written code. This is not just a technical change but a deep cultural one.

The irony, as Osmani pointed out, is that AI-assisted development actually rewards good engineering practices more than traditional coding does. The fundamentals matter more now, not less.

What this means for your team

Just because writing code is getting easier doesn’t mean building software is getting easier. The hard work shifts to new disciplines.

The engineer’s job is changing. You spend less time typing code and more time specifying intent clearly, reviewing output rigorously, maintaining the context layer, and designing architecture that holds up when agents are doing the implementation. The compound engineering methodology (plan, work, review, compound) is one framework for this. The key idea: every task should make the next one easier through documented learnings. That’s a context discipline, not a tool feature.

If your team is adopting agent tools and not seeing the results you expected, the answer probably isn’t a different tool. It’s probably one of these:

Your documentation is stale or contradictory. Your agents don’t have structured project context. Your team is shipping agent output without adequate review. Your architecture wasn’t designed for the way agents work.

None of these are exciting problems. All of them are solvable.

Setting the stage

In my opinion, the most important problem to solve right now isn’t “which agent tool should we use?” It’s the context architecture problem. How do you set up your systems so that agents have clean, current, well-structured information to work with? How do you eliminate conflicting sources of truth? How do you make sure the knowledge your agents rely on doesn’t go stale the moment someone ships a new feature?

This is the heavy lifting. And once it’s done, everything else gets dramatically easier. The tools become effective. The workflows start compounding. The results that looked impossible start to feel inevitable.

This applies whether you’re an engineering team trying to ship with Claude Code, a product team trying to integrate agents into your workflows, or an operations team trying to automate processes that currently live in someone’s head. The common denominator is always the context layer.

Companies are starting to figure this out. The ones building products in this space, whether they say it explicitly or not, are really solving the context infrastructure problem. But most companies don’t need a product. They need someone to come in, understand their specific context architecture, and set the stage for everything else to work.

That’s what I spend most of my time doing these days.

AI-native engineering is boring. That’s the point.

The exciting demos are real, but they’re the tip of the iceberg. Underneath is a mountain of context management, review processes, documentation hygiene, and architectural thinking that nobody tweets about.

The good news: this is all learnable. It’s engineering discipline, not magic. The companies winning right now aren’t the ones with the best tools. They’re the ones doing the boring work that makes those tools effective.

Everyone has the camera. The question is whether you’ll do the work to learn how to use it.

If your team has the tools but isn’t getting the results, the problem is almost certainly underneath. Context architecture, review processes, and system design are exactly what Seeko helps companies figure out. We do the heavy lifting so your team can actually make the most of AI.

Thinking through the same questions?

We help companies figure out what AI actually changes for their business — and build the systems to act on it. Strategy, architecture, automations.

Tell us what you're working on →