Frontier vs Efficiency
why bigger might not mean better...or more effective...soon
The Efficiency Era: Why the AI Race Won’t Be About Benchmarks
Related to my prior post, I believe there’s a shift happening that builders and companies aren’t really thinking about today but will be forced to soon.
Over the last year, everything has been about adoption and usage: “How can I get my team to use AI?”, “How can we create systems that support safe, high-quality, AI-enabled work?”, and so on.
These pursuits made sense given the reality of the time. We’re likely somewhere near an inflection point for “everyday” users. AI companies still subsidize usage in an attempt to create leverage and switching costs, enterprises continue to experiment broadly, and the implicit message is often still: get in, figure it out later.
The tokenmaxxing signals from the likes of Uber, Amazon, and Meta show what’s coming next. Frontier models have had all the attention, but the focus is now shifting to real, measurable efficacy.
Obviously there will be exceptions where the power of a Fable-level model is essential, but my hypothesis at its core is that the winning companies will be those that harness the right model for the right task, in the right system, and keep their switching costs low.
Counter to Sam Altman’s view that OpenAI should be viewed as a utility that we’re all billed for in perpetuity, I think things will go in a different direction, or at least I hope they will.
The sustainability problem is real
The raw dollar cost of frontier models might be the focus of this piece. Fable, by Anthropic’s own policy, defaults to consuming 2x faster than Opus 4.8, already an extremely “hungry” model. The results are powerful (or were until it was pulled).
However, power consumption, water usage, and compute cost aren’t abstract concerns anymore; they’re showing up in infrastructure budgets, sustainability reports, and board conversations. These aren’t problems that scale away. If anything, unchecked AI usage makes them worse. Maybe data centers in space will be the answer. Maybe not. Another post for another day.
The same tension is playing out internally. Teams with broad, undisciplined access to expensive frontier models are generating a lot of AI activity. Whether that activity maps to business outcomes is a different question that hasn’t really been answered.
Or, to some degree, it has. Budgets are tightening and scrutiny on “is this really doing anything good for us” is increasing.
As with everything else, in time the “free pass” goes away because of real costs.
Lock-in is fake.
AI providers are betting on lock-in. The theory is that once enterprises go deep on a platform, the switching costs keep them there regardless of price changes. That may be true at the infrastructure layer. It’s also true at the hardcore workflow level to some degree. I do not believe it is true at the model / harness level per se.
Additionally, tasks like summarization, classification, drafting, analysis, and routing do not require the most capable model that exists. You need a model that’s capable enough, fast, secure, and cheap. You also need it to be where you’re already doing your work, where you’re collaborating. Ideally not “another tool.”
Perhaps one, but not a half dozen. And then there’s the theory that, as the cost of building goes down, Anthropic will just build everything and every piece of software on the internet will magically be replaced. I would venture to say that the quality of the experience and the ability to seamlessly collaborate will win out over one provider eating the world.
The companies that built adoption strategies around “use the best model for everything” are going to have a reckoning when the subsidies roll back. The companies that built a framework for matching model capability to task requirements are going to be fine.
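To make “matching model capability to task requirements” a little more concrete, here is a minimal sketch of what such a framework could look like in code. Everything in it is an assumption for illustration: the tier names, the per-million-token prices, the task-to-tier mapping, and the function names are hypothetical placeholders, not any vendor’s real pricing or API.

```python
from dataclasses import dataclass

# Hypothetical tiers and prices per million tokens (placeholders, not real pricing).
TIER_PRICE_PER_MTOK = {"small": 0.50, "mid": 3.00, "frontier": 30.00}
TIER_ORDER = ["small", "mid", "frontier"]

# Hypothetical mapping from task type to the cheapest tier assumed "capable enough".
DEFAULT_TIER = {
    "classification": "small",
    "routing": "small",
    "summarization": "small",
    "drafting": "mid",
    "analysis": "mid",
}


@dataclass
class Task:
    kind: str
    high_stakes: bool = False


def pick_tier(task: Task) -> str:
    """Return the cheapest tier assumed adequate, escalating one step for high-stakes work."""
    # Task types missing from the mapping fall back to the top tier until someone evaluates them.
    tier = DEFAULT_TIER.get(task.kind, "frontier")
    if task.high_stakes and tier != "frontier":
        tier = TIER_ORDER[TIER_ORDER.index(tier) + 1]
    return tier


def estimated_cost(task: Task, tokens: int) -> float:
    """Estimate the dollar cost of a task from its chosen tier and token count."""
    return TIER_PRICE_PER_MTOK[pick_tier(task)] * tokens / 1_000_000


if __name__ == "__main__":
    for task in (Task("classification"), Task("drafting", high_stakes=True), Task("novel research")):
        print(task.kind, "->", pick_tier(task))
```

The point of the sketch is the shape, not the numbers: the default is the cheapest tier that works, and the frontier model is something a task has to earn.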
Measurement is still underrated.
Before you can optimize AI usage, you have to be able to measure it. And most organizations can’t. They can measure token utilization, but rarely measure where it gets them.
Not because the tooling doesn’t exist, but because the foundational work hasn’t been done. Operational processes are poorly documented. Baseline metrics don’t exist. There’s no clear definition of what “better” looks like, so there’s no way to know whether any of the AI investment is actually working. Sounds familiar if you’ve ever come into an organization and tried to drive any type of change. Those problems didn’t magically go away as AI became embedded in work.
Simply put, you can’t improve what you can’t measure, and in their rush not to be left behind, companies are (often) spending faster than they’re learning.
How to address this in practice
The companies that get this right will treat AI enablement the way they treat any serious capital investment: with governance. Yuck! But that doesn’t have to be a four-letter word that elicits visceral reactions. It’s going to give us an opportunity to really rethink enablement in a way that values:
A clear framework for evaluating which tasks warrant which model tier
Cost attribution at the team or workflow level, not just aggregate spend
Outcome measurement tied to business metrics, not just usage volume
Regular audits, the same way you’d audit any significant vendor relationship
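As a rough illustration of the second and third items, cost attribution and outcome measurement can start as something very small. The sketch below assumes a hypothetical usage log where each record carries a team, a workflow, the dollar spend, and a count of business outcomes delivered (tickets resolved, invoices processed, whatever “better” means for that workflow). The record layout, the sample numbers, and the function name are all made up for the example.

```python
from collections import defaultdict

# Hypothetical usage log: (team, workflow, spend in dollars, outcomes delivered).
usage_log = [
    ("support", "ticket-triage", 120.0, 4000),
    ("support", "reply-drafting", 900.0, 1500),
    ("finance", "invoice-summaries", 300.0, 0),
]


def cost_per_outcome(records):
    """Aggregate spend and outcomes per (team, workflow) and return dollars per outcome.

    Workflows with spend but no measured outcomes map to None, which is
    exactly the line item that should prompt a harder conversation.
    """
    spend = defaultdict(float)
    outcomes = defaultdict(int)
    for team, workflow, cost, delivered in records:
        key = (team, workflow)
        spend[key] += cost
        outcomes[key] += delivered
    return {
        key: (spend[key] / outcomes[key] if outcomes[key] else None)
        for key in spend
    }


if __name__ == "__main__":
    for (team, workflow), value in cost_per_outcome(usage_log).items():
        print(team, workflow, value)
```

Even something this simple separates “we spent money on AI” from “we spent this much per result,” and it flags workflows where nobody has defined a result at all.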
Having the right leader and team working through this will make literally all the difference in the world. Either teams will hate this, cry foul, etc. or they won’t really have to think about it at all. It’ll “just happen” because you’ve designed an incredibly effective system for making it work in a way that feels useful to teams.
This likely means a new role or set of roles will emerge:
There’s a category of work that doesn’t really exist yet at most companies: the people who sit at the intersection of model capability, business process, and measurement.
Not prompt engineers. Not ML engineers. Something more like an AI program architect: someone who understands the model landscape well enough to make intelligent tradeoffs, understands the business well enough to design for actual outcomes, and can build the measurement infrastructure to know whether it’s working.
Over time, these roles will likely absorb a lot of what traditional eng ops and product ops teams do today. The organizations building these capabilities now are going to have a significant head start.
How does this all happen
We’ll likely see two archetypes emerge. The first group treats this period as a serious investment in capability: doing the foundational work, building the measurement frameworks, and getting disciplined about which model to use, where, and for what. The second group doesn’t do those things and either gives up or keeps writing checks for things it can’t really see.
The second group is going to hemorrhage cash on diminishing returns. They’ll have the AI spend line on the budget and nothing concrete to point to for it. And increasingly, the questions being asked will be hard to answer without the infrastructure in place to answer them.
The first group will have a compounding advantage built not on having the best models, but on knowing how to use them.
There will probably also be a few companies that spin out of this with it as their core competency, going well beyond what LangChain or OpenRouter do today but building on a similar foundation.

