The Token Bill Came Due This Week, and It Is Bigger Than Anyone Budgeted

Summary: AI got cheaper per token and more expensive overall at the same time. This week the industry hit a full cost panic: Uber blew through its 2026 AI coding budget by April, Microsoft pulled some Claude Code licenses, JPMorgan warned that token costs are eating internet profits, and a large share of AI agent startups are projected to run out of cash by late 2026. Cheaper unit cost does not mean cheaper bills. Here is how to keep AI on a budget instead of letting it eat your margins.

For two years the AI story was about things getting cheaper. This month the story flipped, and most leaders have not caught up to it yet.

TechCrunch reported on June 5 that the token bill is coming due across the industry, and the examples are striking. Uber blew through its entire 2026 AI coding budget by April. Microsoft revoked Claude Code licenses from some of its own developers months after handing them out. Companies reported running three times over their full-year token budgets by spring. This is not a startup problem. These are some of the most sophisticated technology operations on earth, and they are losing control of a line item.

The trap: cheaper per token, more expensive overall

Here is the part that catches smart people off guard. The price of a single token keeps falling. Anthropic, OpenAI, and Google have all cut per-token prices repeatedly. So leaders assume their AI bills should be falling too. They are not. They are climbing, often steeply.

The reason is simple once you see it. Usage is growing faster than price is falling. Every time the models get cheaper and better, teams use them for more things, run them on longer tasks, and let agents loop through more steps. A modern AI agent can burn through enormous volumes of tokens on a single job, because it reads context, plans, routes, calls tools, checks its own work, and repeats. The newest models are built to keep working autonomously for hours, which is wonderful for output and terrifying for a budget if nobody is watching the meter.
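The arithmetic is worth seeing once. The sketch below uses made-up numbers (they are not from any vendor's price list or any company's invoice) to show how a bill can double even when the unit price is cut in half, simply because volume grows faster than price falls.

```python
def monthly_bill(price_per_million_tokens: float, tokens_used: int) -> float:
    """Bill in dollars = unit price x volume."""
    return price_per_million_tokens * tokens_used / 1_000_000


# Illustrative numbers only, chosen to make the point visible.
# Price per million tokens is cut in half; usage grows fourfold.
before = monthly_bill(price_per_million_tokens=10.0, tokens_used=2_000_000_000)
after = monthly_bill(price_per_million_tokens=5.0, tokens_used=8_000_000_000)

print(f"before: ${before:,.0f}")         # before: $20,000
print(f"after:  ${after:,.0f}")          # after:  $40,000
print(f"change: {after / before:.1f}x")  # change: 2.0x
```

Swap in your own price and volume figures. The bill only falls when usage grows more slowly than price drops.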

The macro numbers back this up. JPMorgan published a note titled, with unusual bluntness for a bank, "AI token costs are eating internet profits alive." Companies including Shopify, Spotify, ServiceNow, and Roku have reported AI surging as a share of operating expenses. The writer Derek Thompson called this the great AI cost panic of 2026, the moment the boom entered its "wait, is this actually worth it?" phase.

The startups are the canary

The clearest warning sign is what is happening to AI agent startups. A significant share of early-stage agent companies are projected to exhaust their cash reserves by late 2026 because running multi-agent systems in the real world is so token-intensive. Continuous context usage, orchestration routing, and enterprise integration testing drain venture funding faster than these companies can grow revenue. Venture capitalists are reportedly advising portfolio companies to brace for a correction even while the press releases shout about record valuations.

It is worth holding two facts side by side. At the top of the market, four of the five largest venture rounds in history closed in early 2026, with OpenAI, Anthropic, xAI, and Waymo together absorbing the majority of global venture dollars. At the bottom, a wide field of agent startups is quietly running out of runway. That is not a healthy, evenly distributed market. That is capital piling into a few compute-rich giants while everyone downstream gets squeezed by the cost of the very compute those giants sell.

Why this is actually a token economy story

I have argued for a while that computation is becoming the basic unit of business production, replacing the labor hour. The cost panic of 2026 is the confirmation, just from the painful side. When your fundamental input is compute, your fundamental risk is compute cost. Companies that used to worry about salary inflation now have to worry about token inflation, and unlike salaries, token usage can spike overnight when an engineer points an agent at a big job and walks away.

The companies that handle this well will treat tokens the way mature companies already treat cloud spend. In the early cloud era, plenty of firms got walloped by surprise bills because anyone could spin up infrastructure and nobody owned the total. The discipline that fixed it was boring and effective: budgets, alerts, ownership, and regular review. AI compute needs the exact same governance, and almost nobody has built it yet.

What to do before your next budget meeting

You do not need to panic. You need to put a meter on the thing. Four concrete moves.

First, set a hard token ceiling per team, the same way you would cap a cloud budget, and put a name next to that number. Unowned budgets are the ones that explode.

Second, separate experimentation spend from production spend. A lot of runaway cost comes from open-ended tinkering that never gets a limit. Give experiments a sandbox with a fixed cap and make production usage justify itself with an outcome.

Third, measure outcomes per token, not just tokens consumed. The goal is not to spend less on AI. The goal is to know what each dollar of compute is buying you. A team spending heavily and shipping enormous value is fine. A team spending heavily on agents that loop endlessly and produce little is the problem, and you cannot tell them apart without the metric.

Fourth, build the governance now, while it is cheap. The firms that got burned this spring were not reckless. They were just early, with no controls in place when usage took off. You have the advantage of their hindsight.
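For teams that want to see what the meter might look like, here is a minimal sketch of the first three moves in code. Everything in it is hypothetical: the TokenBudget class, its field names, the 80 percent alert threshold, and the sample figures are illustrations, not any vendor's API or any company's real numbers. A real setup would pull usage from your provider's billing data.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Hypothetical per-team token budget; all names are assumptions."""
    team: str
    owner: str             # the named person accountable for this number
    kind: str              # "experiment" or "production", tracked separately
    ceiling: int           # hard monthly token cap
    alert_at: float = 0.8  # warn once this fraction of the cap is used
    used: int = 0

    def record(self, tokens: int) -> str:
        """Record usage and return "ok", "alert", or "blocked"."""
        if self.used + tokens > self.ceiling:
            return "blocked"  # hard ceiling: the job does not run
        self.used += tokens
        if self.used >= self.alert_at * self.ceiling:
            return "alert"    # tell the owner before the invoice does
        return "ok"


def outcomes_per_million_tokens(outcomes: int, tokens: int) -> float:
    """Units of shipped value (tickets closed, PRs merged) per million tokens."""
    return outcomes / (tokens / 1_000_000) if tokens else 0.0


# An experiment sandbox with a fixed cap and a name next to it.
sandbox = TokenBudget(team="search", owner="A. Owner",
                      kind="experiment", ceiling=1_000_000)
print(sandbox.record(500_000))  # ok
print(sandbox.record(400_000))  # alert
print(sandbox.record(200_000))  # blocked

# Two teams with very different returns on their spend.
print(outcomes_per_million_tokens(outcomes=12, tokens=4_000_000))   # 3.0
print(outcomes_per_million_tokens(outcomes=30, tokens=60_000_000))  # 0.5
```

The second team in the example ships more in absolute terms but gets one sixth the return per token. That is the difference the outcome metric is meant to expose.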

The cheaper-per-token headline was always a half-truth. The full truth arrived this week in the form of budget overruns at Uber, license clawbacks at Microsoft, and a startup graveyard forming on the horizon. AI is not free, it is not even cheap at scale, and the companies that survive the cost panic will be the ones that put a number on it before the number put a hole in their margins.

So I will leave you with the question I would put to your own finance and engineering leads this week. If your AI usage tripled tomorrow, would anyone in your company notice before the invoice arrived?

Sharon Gai is an AI transformation strategist, keynote speaker, and author of How to Do More with Less Using AI. She advises Fortune 500 companies on AI adoption and organizational redesign.
