By Jason Jones
Everybody keeps talking about the AI models: the big names like Claude, ChatGPT, and Gemini. Their model is bigger, smarter, faster, more precise, more reliable… Fair enough. That’s the shiny bit. It’s what gets headlines, investors frothing, keynote slots, breathless LinkedIn posts, the whole circus.
When you’re actually trying to make AI useful inside a company (and I mean useful in the boring, accountable, budget-owning sense), you have to look at the AI Architecture, sometimes called the AI Harness, around it. This is where the context handling, tool orchestration, memory, planning loops, governance, and guardrails are built. It decides how information gets packed, passed, trimmed, reused, or, frankly, sprayed everywhere for no reason. That layer matters more than most people want to admit. The model is not the whole story. Honestly, it may not even be the main story.
But an AI model isn't what actually runs your business. It's just a refinery, distilling oceans of raw data into material your business can actually harness. In fact, to use an old-fashioned gasoline car metaphor…
LLM AI Models
Claude, ChatGPT, and Gemini run data centers that are like refineries making fuel for a car.
AI Tokens
So these models make fuel, and you buy it by spending tokens (“gas” to move your car).
Your Business
The car? In this metaphor, that’s your business, what drives you forward.
AI Architecture
And the AI Architecture is like the engine that works to use “gas” (tokens) efficiently.
The model tokens are your gas. The AI Architecture is your engine.
The model is not the product people think it is
The gas gets burned. Every prompt, every retry, every planning loop, every tool call, every oversized context window, every unnecessary instruction blob somebody forgot to trim three product cycles ago. Fuel, fuel, fuel, burn, burn, burn.
So two companies can use the exact same underlying model, but end up delivering wildly different economics.
One system hums along, gets the job done, doesn’t waste much, feels tight. The other one chews through tokens like a badly tuned F-150 hauling cinder blocks uphill with a hole in its gas line. Same gas. Different engine. Different results.
The difference is the AI Harness and AI Architecture. They are the factors that decide whether that fuel actually turns into forward motion, or just smoke.
If two systems finish the same task and one burns five times more tokens doing it, that is not some deep mystery. It is not “emergent behavior.” It is not a sign that one model is spiritually more advanced. It is a sign that one engine is better built than the other.
Should Anthropic, OpenAI, Google, or any of the other frontier LLM labs be the ones who build the most token-efficient harnesses for everyone else?
<cough>… Probably not.
So why wouldn’t Anthropic or OpenAI fix this?
It’s not because they’re stupid; in fact, it’s quite the opposite.
Anthropic, OpenAI, Google, whoever: these companies are full of smart people. Excellent engineers. Serious operators. This argument falls apart instantly if it sounds like “they just don’t get it.” They do get it; I’m sure many of them get it painfully well.
The issue is not capability. It’s incentive alignment. Or, more to the point, the lack of it.
A good Harness or Architecture has one job: get the outcome using as little fuel as possible. Less token waste. Fewer pointless loops. Smaller, sharper context windows. Cleaner handoffs. Better memory discipline. Better decisions about when to call tools and when not to. It’s engine design, basically. You want the same tank to take you farther.
But the Frontier LLM Labs are not just building engines. They’re also the ones financing the fuel infrastructure.
These companies are tied to giant compute commitments, giant infrastructure bets, giant expectations. Their economics improve when usage is large, sustained, and growing. So at a structural level, the customer is asking for better mileage, while the platform’s broader financial machinery benefits from fuel consumption.
These directly opposing incentives leave vendor and customer pulling on opposite ends of the same rope.
This is not a conspiracy*. It’s just incentives doing what incentives do.
That’s the tension. It doesn’t require cartoon villain behavior. No one has to twirl a moustache. No one has to say, “let’s make this less efficient for our users so they burn more tokens…”
Uh… hmmm… (Okay, so they did make it burn more tokens.)
- June 2025 – Anthropic blocks the Windsurf IDE from accessing Claude
- March 2026 – Anthropic Forces OpenCode to Strip Claude Integration
- April 2026 – OpenClaw: The full OAuth lockout, with the transition to pay-as-you-go
That’s not how this stuff usually… (Did they just ban a bunch of useful tools & innovations?)
Companies just don’t go around and… (Wait, aren’t those just their version of the tools they just banned?)
*Okay, so it is a little bit a conspiracy.
If an independent AI Harness company finds a way to cut token consumption by half, that matters right now. Their margins get better. Their product gets better. Their sales story gets better. Their users get more actual work per dollar. The fix is not optional. It is the business.
OpenAI, Anthropic, Google, and the other frontier labs live in a different reality. Their teams may care deeply about efficiency, and many of them plainly do, but the companies themselves are also under pressure to justify staggering compute commitments, capital deployment, and growth expectations. That creates a structural contradiction. One side wants leaner agents. The other side benefits from a world where usage keeps swelling.
Those are not the same job. Not even close.
Currently, buyers can’t see the waste clearly, and that’s part of the trick
Most people aren’t billed in a way that maps cleanly to what a single session actually consumed. They pay for a subscription tier, hit a cap, get throttled, or run into some vague usage window, and that’s that. The system conceals its own inefficiency behind abstraction. Convenient abstraction, sure. Still abstraction.
So when a session balloons because the harness keeps reloading context, restating constraints, and shuttling the same information through multiple layers, like a kid hauling every toy into the living room, the user just feels the symptoms. The slowdown. The cap. The weird brittleness. The sense that something is chewing through budget faster than it ought to.
It’s like renting a car without a fuel gauge, then finding out halfway through the trip that the tank somehow vanished faster than it should have. You know something’s off. You just can’t point to the exact leak.
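One way to get a fuel gauge back is to meter token consumption per task yourself, step by step, so two harnesses running the same job can be compared directly. Here is a minimal sketch; the wrapper class, step names, and token figures are all illustrative assumptions, not any vendor's API:

```python
# Minimal per-task token accounting, so harness waste becomes visible.
# All step names and token counts below are hypothetical examples.

class TokenMeter:
    def __init__(self, task: str):
        self.task = task
        self.steps = []  # (step_name, prompt_tokens, completion_tokens)

    def record(self, step: str, prompt_tokens: int, completion_tokens: int):
        self.steps.append((step, prompt_tokens, completion_tokens))

    def total(self) -> int:
        return sum(p + c for _, p, c in self.steps)

    def report(self) -> str:
        lines = [f"task: {self.task}"]
        for step, p, c in self.steps:
            lines.append(f"  {step}: {p} in / {c} out")
        lines.append(f"  total: {self.total()} tokens")
        return "\n".join(lines)

# Two harnesses, same task: one reloads the full context on every step,
# the other carries a trimmed summary forward.
wasteful = TokenMeter("summarize contract")
wasteful.record("load full context", 12000, 400)
wasteful.record("retry with full context", 12000, 450)

lean = TokenMeter("summarize contract")
lean.record("load layered summary", 1500, 400)

print(wasteful.report())
print(lean.report())
```

Same task, same model, and in this made-up example one engine burns more than ten times the fuel of the other. Until something like this sits in the workflow, that gap stays invisible behind the subscription tier.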
This is why the harness matters more than most people think
A lot of buyers still assume the model is doing nearly all the heavy lifting. That made sense earlier, maybe. It makes less sense now.
At this point, the harness is often the difference between a demo and a dependable system. Same underlying intelligence, totally different business result. Different cost structure. Different user confidence. Different level of operational trust. People are starting to realize that the value isn’t only in the model anymore.
The Vertical Motion team and I ended up building our own native AI harness and architecture for exactly that reason. Once long-running tasks started exposing the limits of the default tooling, it became hard to ignore how much waste was hiding in the workflow. And once you stop optimizing for maximum model usage, you begin to notice all kinds of nonsense.
Chew your food
Files should be pre-processed and broken down into layers before they ever touch session context. (Organize things into layers of importance; as you need more accuracy on a topic, load only what’s relevant.)
Use sensible shortcuts
Recurring entities should have shorthand instead of being reintroduced over and over. (If you keep using the same data or workflow, break it out into a skill or tool.)
Stay efficient
Skills should load on demand, not as an entire directory (that's like reading every user manual for all the appliances and electronic devices in your house when you're just going to use the microwave).
Build steady loops
Remove frailty and increase reliability with guardrails inside the session. (Hooks are your best friend; use them.)
Learn from your mistakes
Each and every mistake gets recorded, every lesson learned gets fed back into the system to improve upon in the next cycle.
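The "stay efficient" discipline above can be sketched in a few lines. This is a toy illustration, not a real framework; the skill names, docs, and word-count token estimate are all stand-in assumptions:

```python
# On-demand skill loading: only the manual the task needs enters context.
# Skill names, docs, and the token estimate are illustrative, not a real API.

SKILL_DOCS = {
    "microwave": "How to reheat: set power, set time, press start.",
    "oven": "How to bake: preheat, set temperature, set timer.",
    "dishwasher": "How to run a cycle: load, add detergent, select cycle.",
}

def rough_tokens(text: str) -> int:
    # Crude estimate: roughly one token per whitespace-separated word.
    return len(text.split())

def load_all_skills() -> str:
    # The wasteful pattern: every manual in the house goes into context.
    return "\n".join(SKILL_DOCS.values())

def load_skill(task: str) -> str:
    # The lean pattern: load only the manual the task actually mentions.
    for name, doc in SKILL_DOCS.items():
        if name in task:
            return doc
    return ""

task = "reheat leftovers in the microwave"
eager = rough_tokens(load_all_skills())
lazy = rough_tokens(load_skill(task))
print(f"eager: {eager} tokens, lazy: {lazy} tokens")
```

With three tiny manuals the savings look trivial; with hundreds of skills and multi-page docs, eager loading is exactly the "reading every user manual in the house" problem, paid for on every single session.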
None of this is wizardry or moonshot science. It’s discipline. It’s system-wide thinking. It’s somebody caring enough to say, “No, that does not need to consume fuel.” Then saying it again. And again. And again.
A better harness usually doesn’t come from one grand breakthrough. It comes from dozens, maybe hundreds, of sober little decisions that keep the engine from wasting gas.
That kind of work is not glamorous, but it sure does win.
The frontier LLM labs see the harness opportunity. Of course they do.
They know the value chain is shifting upward. They know the wrapper around the model is becoming its own category of advantage. They know the experience layer, the orchestration layer, the harness layer, whatever label you want to slap on it, is where a lot of the real product differentiation now lives.
They’re moving there because they have to.
Look at Anthropic, for instance. They shipped over 74 products between February and March 2026: Claude Cowork, Claude Marketplace, Claude Code Channels… A tsunami of AI widgets, tools, and workflow products, all in the hopes that you will adopt their solutions instead of building your own.
But seeing the opportunity and being structurally positioned to pursue it in the customer’s best interest are not the same thing. Not even a little. Right now, Anthropic and OpenAI are trying to build the engine while also selling the gas (tokens), justifying the refinery (data centers), and explaining the capex to investors. That’s a strange posture. A conflicted one. One that doesn’t bode well for the consumer.
Realistically, real harness efficiency would start to push one question into uncomfortable territory:
- What happens if the customer suddenly needs a lot less fuel?

That is excellent news for the customer (and, frankly, the environment). It is not automatically excellent news for the seller.
And that’s the hitch, right there. If it were left to OpenAI and Anthropic, we would never get the opportunity to ask that question.
The takeaway, plain and simple
If you’re evaluating AI systems, stop asking only which model is smartest.
- Ask what kind of engine is wrapped around it.
- Ask what happens to your data and proprietary information when you use their engine.
- Ask how much fuel it burns to finish a real task.
- Ask whether the system is built for mileage or just for spectacle.
- Ask whether the vendor benefits when your costs go down, or mainly when usage goes up.
- Ask what incentives are shaping the product, because incentives always show up somewhere, even when nobody says the quiet part out loud.
The market is still early enough that plenty of buyers are mistaking horsepower for efficiency.
That won’t hold.
Sooner or later, everyone figures out the same thing: the gas matters, sure, but the engine decides whether you get across town or end up stranded on the shoulder wondering where the tank went.
And the companies selling the gas are not the ones best suited to build the most efficient engine.
Vertical Motion is a trusted Canadian software development and entrepreneur assistance company that has supported the global efforts of startups, non-profits, B2B, and B2C businesses since 2006. With headquarters in Calgary and Kelowna, and team members coast to coast, Vertical Motion is recognized as an award-winning leader in the technology industry. Our team of executive advisors, project managers, software developers, business analysts, marketing specialists, and graphic designers have extensive experience in several industries including — Energy, Finance, Blockchain, Real Estate, Health Care, Clean Technology, Clothing & Apparel, Sports & Recreation, Software as a Service (SaaS), and Augmented & Virtual Reality (AR/VR).
Come chat with us and let us take you “From Idea to Execution and Beyond!” 🚀
