
The Web's First Toll Booth for AI


 

The web is about to change, y’all.

 

It’s no surprise that AI has drastically changed the way we search for information. But the main business model of the internet, where you publish content and, in return for being crawled and indexed, receive traffic and make money through ads, is definitely changing. When Sergey and Larry built Google, their goal was to get you off Google.com and to your answers as fast as possible, by directing you to the blue links.

 

Then came generative AI.

 

Tools like ChatGPT, Claude, and Perplexity now routinely scrape and summarize content across the web. The CEO of Cloudflare says “with OpenAI, it's 750 times more difficult to get traffic than it was with the Google of old. With Anthropic, it's 30,000 times more difficult.”

 

In 2025, 75% of queries to Google get answered within Google, no blue link needed. Google’s AI Overviews took over. That means people actually click through to a publisher’s website only 25% of the time. Publishers are bleeding money and wondering what the future of their businesses will be.

 

LLMs smother traffic to publishers

 

As users ask questions directly to these models and get full, fluent answers, they start trusting the LLM more. I remember in 2022, when we first tested LLMs, we talked about hallucination and double-checking answers. Over the three years since, as LLMs improved, the need to double-check answers has gone down. Hallucination has also come down. And back then, when we complained, the LLMs started adding footnotes and sources to their answers. This is what gave Perplexity its first bundle of users: it provided answers with citations. As time went on, users clicked on those citations less and less, taking the answers they’d already received at face value. Users are lazy and like to operate with less friction. It’s no surprise that we rely more and more on answer engines to provide us with the truth. Publishers lose pageviews, ad revenue drops, and bloggers find their content rephrased and re-served without context or credit. The old exchange of “content for traffic” is breaking down.

 

The image below is from Cloudflare’s blog and shows how much of the traffic they handle is generated by AI crawlers. Look at the orange line that represents GPTBot; it has really gone up in recent months. (An interesting bot here with significant traffic is Bytespider, the crawler from ByteDance, TikTok’s parent company. It wasn’t exactly crawling the web for TikTok videos; ByteDance has its own suite of AI tools and answer engines that the bot was likely feeding for its main Chinese user base.)

 

 

Speaking recently at Cannes Lions, Cloudflare co-founder and CEO Matthew Prince said: “If the Internet is going to survive the age of AI, we need to give publishers the control they deserve and build a new economic model that works for everyone—creators, consumers, tomorrow’s AI founders, and the future of the web itself. Original content is what makes the Internet one of the greatest inventions in the last century, and we have to come together to protect it. AI crawlers have been scraping content without limits. Our goal is to put the power back in the hands of creators, while still helping AI companies innovate. This is about safeguarding the future of a free and vibrant Internet with a new model that works for everyone.”

 

Here is the original video.

 


 

He thinks that the internet is ending because current economics will ultimately disincentivize content creators from creating original content. I mean, it is happening already: most publishers are rewriting the same story, changing the headline to be more sensational for clicks. But he thinks the AI companies should be equally worried, because their results depend on the quality of the content. If there is no original content left, the quality of their answers will start to deteriorate.

 

The Traffic Cop of the Internet

 

Enter Cloudflare, a company that mostly runs within the plumbing of the internet. If you have clicked on this thing before, you will know this company!

 


 

Cloudflare is a company that sits quietly behind a large portion of the modern internet, ensuring that websites load quickly, stay online, and remain secure from attacks. Its core business is internet infrastructure: it provides tools that help websites and applications become faster, safer, and more reliable. Whether you're visiting a news site, shopping online, or logging into a workplace app, there’s a good chance Cloudflare is quietly helping to make that experience seamless.

 

(I used to sit on a team that provided bandwidth (transit) to Cloudflare! This is us at a team lunch.)

 



 

 

At the heart of Cloudflare’s offerings is its global content delivery network, or CDN. This network spans hundreds of cities worldwide and works by caching content close to users, so when someone in Tokyo or Paris accesses a website hosted in New York, it feels instantaneous. You can think of Cloudflare as the traffic cop of the internet. Imagine the internet as a massive, chaotic highway system. Every second, billions of data packets (like tiny digital cars) are zipping between servers, browsers, and apps. Some are legitimate commuters (your Google searches, Zoom calls, or Instagram scrolls), but others are reckless drivers: malicious bots, hackers, spammers, or attackers trying to crash the system.

 

Cloudflare’s job is to stand at the intersections and direct that traffic efficiently and safely. It makes sure the good traffic flows through quickly and securely, while stopping or rerouting the bad actors. If a hacker launches a DDoS attack, Cloudflare can detect the flood of malicious traffic and absorb it before it reaches your site. If someone’s trying to exploit a security flaw on your server, Cloudflare’s firewall stops them at the edge. And for legitimate users, Cloudflare helps speed things up by caching content closer to them — like opening express lanes on the digital highway.
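You can actually see this plumbing from the outside. Here is a minimal sketch in Python (assuming the requests library is installed; example.com stands in for any URL you want to test): sites proxied by Cloudflare typically identify themselves in the Server response header, and the CF-Cache-Status header tells you whether the page came from a nearby cache or had to travel back to the origin.

    import requests

    # Fetch a page and inspect the headers Cloudflare adds on the way through.
    resp = requests.get("https://example.com")

    # Sites behind Cloudflare typically report "cloudflare" here.
    print(resp.headers.get("Server"))

    # "HIT" means the content was served from a cache near you;
    # "MISS" or "DYNAMIC" means it came from the origin server.
    print(resp.headers.get("CF-Cache-Status"))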

 

And now Cloudflare is getting into the AI game.

 

Pay per crawl

 

Cloudflare’s new product aims to rebalance that relationship. Leveraging its massive presence—serving over 20% of all websites—Cloudflare will now give sites the power to block AI crawlers unless they pay. It works through a combination of AI-detection tools, network policies, and a honeypot system called “AI Labyrinth” that traps unauthorized scrapers. AI companies that want access to content served through Cloudflare will have to license it—either through direct payment or partnerships.
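Under the hood, Cloudflare has described pay per crawl as reviving the long-dormant HTTP 402 “Payment Required” status code: a paying crawler gets the page, a non-paying one gets a 402 with a price attached. Here is a rough crawler-side sketch in Python, with an illustrative URL and illustrative header names (the real scheme also requires the crawler to register and authenticate with Cloudflare before any charge applies):

    import requests

    URL = "https://example.com/article"   # placeholder URL
    MAX_PRICE = "0.01"                    # the most we're willing to pay (USD)

    # First attempt, with no payment intent declared.
    resp = requests.get(URL)

    if resp.status_code == 402:
        # The site is gated: the quoted price rides on a response header.
        # ("crawler-price" / "crawler-max-price" are illustrative names.)
        print("Quoted price:", resp.headers.get("crawler-price"))

        # Retry, declaring how much we're willing to be charged.
        resp = requests.get(URL, headers={"crawler-max-price": MAX_PRICE})

    if resp.ok:
        print(resp.text[:200])  # content delivered (free, or billed per crawl)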

 

This is more than a technical adjustment. It’s the beginning of a new ecosystem where content isn’t just free data for training; it’s a commodity with a price tag. In many ways, this mirrors what happened in the early 2010s, when publishers like The New York Times began introducing paywalls after years of giving away journalism for free online. At first, this was controversial—why restrict access to news? But eventually, it became the norm. Today, subscriptions have replaced advertising as the dominant revenue stream for many major publications.

 

Now, the AI era is ushering in a similar transformation. Except this time, the paywall isn’t for humans. It’s for machines.

 

But will it work?

 

The answer depends on adoption. If companies like Akamai, Fastly, and other large content delivery networks join Cloudflare in enforcing pay-to-crawl policies—and if major content producers like news sites, forums, and educational platforms comply—AI companies could soon find themselves facing a fragmented web, where much of the “good” data is off-limits unless they pay.

 

A perfect analogy is vaccinations. Remember good old Covid? The behavioral economics behind that phenomenon is so relevant to so many business cases.  

 

Imagine there's a contagious virus spreading, and only some people in a country get vaccinated. If just a few people get the shot, the virus still spreads. But if everyone gets vaccinated, the virus has nowhere to go — and the whole country is protected.

 

This is called herd immunity — it only works if enough people participate. Fastly and Akamai are also CDNs (content delivery networks), so if they participate too, most of the internet will be participating, leaving AI companies starving for good content. Once the quality of their answers wanes, their users will drop off. Remember when ChatGPT first launched and its information was only current up to a certain date? If we searched for anything recent, the response would be: sorry, my training data only goes up to a certain point. That was annoying. Those were the times when I went back to good old Google. That is what will happen if most of the internet participates. And AI companies do not want to lose their hard-won users.

 

On the other hand, if enough free or unprotected content remains accessible, like open blogs, public forums, and global sites, then AI models might continue to train and respond using “good enough” data. For now, many AI systems are still scraping whatever they can get, often relying on a mix of licensed, public, and questionably sourced content.

 

Should you join the gated section of the web?

 

So what does this mean for the average blogger, newsletter writer, or content creator?

 

It comes down to your goals.

 

If you’re running a personal blog or trying to grow an audience, allowing AI crawlers might still serve you. Being discoverable—even if it’s via a chatbot answer—can bring exposure, backlinks, and brand awareness. At this stage, visibility could be more valuable than control.

 

But if your website is already a well-known brand, you have stable traffic, and your content is a form of intellectual property, it may be worth thinking like The Atlantic or the NYT. If AI tools are using your content to answer questions that used to bring users to your site, that’s a loss—not just of traffic, but of trust, authority, and revenue. You might not have the negotiating power of a large media outlet, but you can still take steps to protect your content from unauthorized scraping.

 

This is where tools like robots.txt and Cloudflare’s AI-specific settings come in. A simple line of code can tell GPTBot, ClaudeBot, and others not to crawl your site.
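For instance, a robots.txt file at the root of your domain could include entries like the following. GPTBot, ClaudeBot, and PerplexityBot are the published names of OpenAI’s, Anthropic’s, and Perplexity’s crawlers; which bots you block is up to you:

    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: PerplexityBot
    Disallow: /

Each Disallow: / line asks that bot to stay off your entire site.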

 

 

While not foolproof—bad actors can still ignore these files—most reputable AI companies do honor them. And even if you're on platforms like Medium or Substack, where you can’t set a robots.txt file yourself, you can choose whether to make posts public or gated. Substack, for example, allows you to set posts to “subscribers only,” which effectively hides them from AI crawlers.

 

In the end, the internet is splintering—not just between paywalled and open content, but between what humans can see and what machines can use. Cloudflare’s new product doesn’t just offer a technical solution. It marks a turning point in the economics of the web.

 

As a content creator, you now have a choice: open the gates for AI in hopes of exposure, or build your own fences in hopes of compensation. Either way, the era of unregulated data scraping is coming to an end—and the AI paywall era has begun.

 

 

The choice is yours.

 
 
 
