Cloudflare CEO Matthew Prince expected the crossover later. First it looked like a 2027 problem, then an early-2027 problem. On June 3, 2026, he said the line had already been crossed. According to reporting on Cloudflare’s latest data, bot and automated HTTP requests to HTML content were running at roughly 57.5% of the measured traffic, against 42.5% for humans. Prince described it as the first time bots had passed human traffic online in internet history.
The traffic line that changed the web
That statement needs careful reading. It does not mean machines now spend more time online than people, or that every app session, streaming hour, message, map lookup, game lobby and social feed refresh is dominated by bots. The measurement discussed in the current reports is narrower and more technical: HTTP requests to web content, especially HTML pages, where automated agents, crawlers, scrapers and bots can generate huge volumes of rapid-fire requests. Humans still dominate many forms of digital activity that are not well represented by page-request counts, especially long app sessions and video consumption.
Even with that limit, the shift is not small. The open web has always had bots. Search engines crawled it. Monitoring services checked it. Fraud tools attacked it. Archive projects copied it. Price scrapers watched it. The difference now is scale, purpose and economic pressure. AI agents do not merely index pages for a search engine. They increasingly browse, compare, extract, summarize, test forms, check prices, enter accounts, and sometimes move through checkout flows.
The web was designed around a loose bargain. Publishers, merchants and service providers made pages available. Search engines crawled them, ranked them, and sent people back. Server costs were treated as part of distribution. Analytics treated visits as signals of demand. Ads treated pageviews as monetizable attention. Conversion funnels assumed that a request was likely to be attached to a person, or at least to a familiar machine such as a search bot.
That bargain is breaking. A page request is no longer a reliable proxy for human attention. A visit no longer means a reader, shopper, traveler or subscriber saw the page. A crawler may read the page, extract its content, feed it into a model, answer a user elsewhere, and send no click back. An agent may visit a retailer because a human asked a model to “find the best camera under $1,000,” but the retailer may see only a storm of product-page requests, not a normal shopper.
The Cloudflare milestone is not an isolated claim. HUMAN Security’s 2026 State of AI Traffic & Cyberthreat Benchmark Report says automated traffic grew 23.51% year over year in 2025, around eight times faster than human traffic at 3.10%. It also says monthly AI-driven traffic grew 187% from January to December 2025, while traffic from AI agents and agentic browsers grew 7,851% year over year.
Imperva’s 2025 Bad Bot Report reached a related marker from another angle: automated traffic accounted for 51% of all web traffic in 2024, while bad bots alone made up 37%. Cloudflare’s newer data sharpens the picture around HTML requests and agentic growth. HUMAN Security adds the commerce and cyberfraud angle. Together, they show that the web’s machine audience is no longer secondary.
The result is a strange inversion. Humans still create the demand, write the prompts, publish the pages, list the products and pay the bills. Machines now perform a growing share of the visiting, reading and retrieval. The internet has not lost its human purpose. It has lost its human traffic profile.
Cloudflare’s number is powerful but not absolute
The headline number is simple enough to travel fast: bots ahead of humans. The measurement behind it is not simple. Cloudflare Radar’s public bot section defines bot traffic as non-human internet traffic and tracks bot share across locations and sources. Cloudflare’s AI Insights page tracks AI bot and crawler traffic, including HTTP request trends for active AI bots and crawl-purpose categories.
Those tools matter because Cloudflare sits in a rare position. Its own press material says it helps manage and protect traffic for 20% of the web while handling trillions of requests daily. That makes Cloudflare’s view unusually broad, but it is still not the whole internet. No single infrastructure company sees everything. App traffic, private networks, closed platforms, encrypted service-to-service traffic and traffic outside Cloudflare’s network are not fully captured by one provider’s lens.
The distinction between web traffic and internet usage is also crucial. A person watching a two-hour film may generate far fewer HTML requests than an AI agent checking thousands of pages in a few minutes. A human using a native mobile app may spend 45 minutes inside an interface without producing the same request pattern as a crawler reading product listings. A bot may generate a large number of HTML requests without consuming anything like human attention.
That is why the phrase “AI bots have overtaken humans on the internet” is both directionally true and technically slippery. It is true in the measured traffic category being discussed. It is slippery if read as a universal claim about all digital life. Machines have become the majority visitor to important parts of the open web, not the majority owner of human attention.
The same issue affects business analytics. A publisher may see page requests rise while ad revenue falls. A retailer may see traffic rise while human conversion declines. A travel site may see search-result pages hammered by agents while actual bookings remain flat. An analytics dashboard built for people can make a site look healthier while the business behind it gets weaker.
Cloudflare’s number should be treated as a warning signal, not as a complete census. It tells website owners that their traffic mix has changed so much that the old default assumptions are unsafe. It does not tell every organization exactly how much of its own traffic is human, benign automation, malicious automation, AI search, training crawl, agentic retrieval, monitoring, fraud, credential attack or synthetic testing.
That local measurement problem is the next operational fight. Global bot share is useful for headlines. Site-level intent classification is what businesses need to make decisions. A media company needs to know whether a crawler is training a model, powering a live answer, checking a source for a user, violating a license, or attempting credential stuffing. A retailer needs to know whether an agent is helping a real customer compare prices or running an inventory-hoarding attack. A bank needs to separate legitimate customer automation from account takeover.
Cloudflare’s milestone is credible because it aligns with other reports, but its business meaning depends on what a given site sees in its logs. The web has entered a period where aggregate traffic numbers can hide more than they reveal. A million requests may mean a million readers, a thousand agents, ten abusive scrapers, or one badly configured botnet. The unit of analysis has moved from the visit to the actor and the intent.
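One way to move from visits to actors is to group raw log entries by the purpose their user agent declares. The sketch below is a minimal illustration, not a production classifier: the purpose labels and the `DECLARED_PURPOSE` mapping are assumptions made for this example, and a user-agent string is only a self-reported claim that still needs independent verification.

```python
from collections import Counter

# Hypothetical mapping from user-agent tokens to a declared purpose.
# The tokens are names the operators publish; the labels are illustrative.
DECLARED_PURPOSE = {
    "GPTBot": "ai_training",
    "OAI-SearchBot": "ai_search",
    "ChatGPT-User": "user_triggered_fetch",
    "ClaudeBot": "ai_training",
    "PerplexityBot": "ai_search",
    "Googlebot": "search_index",
}


def classify(user_agent: str) -> str:
    """Return the declared purpose for a user-agent string, if any token matches."""
    lowered = user_agent.lower()
    for token, purpose in DECLARED_PURPOSE.items():
        if token.lower() in lowered:
            return purpose
    # Browsers, unknown bots and spoofed agents all land here.
    return "unclassified"


def purpose_shares(user_agents):
    """Share of requests per declared purpose, as fractions that sum to 1."""
    counts = Counter(classify(ua) for ua in user_agents)
    total = sum(counts.values())
    return {purpose: n / total for purpose, n in counts.items()}


sample_log = [
    "Mozilla/5.0 (compatible; GPTBot/1.1)",
    "Mozilla/5.0 (compatible; Googlebot/2.1)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (compatible; GPTBot/1.1)",
]
print(purpose_shares(sample_log))
```

The "unclassified" bucket is where the real work starts: it mixes human browsers with undeclared or disguised automation, which is why declared identity has to be combined with behavioral and network signals.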
Agentic traffic is different from old bot traffic
Old bot traffic had familiar categories. Search crawlers indexed pages. Feed readers checked updates. Monitoring services tested uptime. Bad bots scraped content, spammed forms, launched credential attacks or probed for vulnerabilities. They were annoying, useful, dangerous or tolerated, depending on the case. But most website owners could understand the pattern.
Agentic traffic complicates that picture because it may represent a real human purpose without a human browsing session. A user asks an AI assistant to compare flights. The agent checks airline sites, online travel agencies, review pages, route data and fare conditions. A human might have opened five to ten pages. The agent may open hundreds or thousands. Prince made that point at SXSW when he said an agent shopping for a digital camera could visit a thousand times more sites than a human would for the same task, creating “real traffic and real load.”
That multiplier is the heart of the new web. A person delegates intent. The agent expands that intent into machine-scale retrieval. The website sees the retrieval, not the person. The human remains the source of demand, but the agent becomes the visible user.
This creates three immediate problems.
The first is cost. Each request consumes infrastructure: bandwidth, compute, cache, origin capacity, database queries, bot detection, logging and security processing. Some AI traffic hits cached pages cheaply. Some strikes origin servers expensively. Some bypasses normal caching because it queries dynamic pages, search endpoints or authenticated experiences. Even benign traffic can become costly when multiplied by agent behavior.
The second is attribution. The old search model gave publishers a reason to accept crawling because crawlers brought back human readers. AI systems can ingest content or retrieve it for answers without sending comparable traffic back. Cloudflare’s crawl-to-click analysis found that training had become the dominant purpose of AI bot activity and documented large imbalances between pages crawled and visitors referred back for some AI operators. When extraction rises and referral weakens, the publishing bargain changes.
The third is control. A site may want Google Search indexing but not AI training. It may want live citation access for AI answers but not bulk scraping. It may want a customer’s personal shopping agent to read product pages but not allow a competitor’s pricing bot. It may want accessibility tools and research crawlers but block fraud automation. Traditional robots.txt was not built for this level of commercial and semantic distinction.
OpenAI’s crawler documentation now distinguishes between its crawlers and user agents by purpose, with separate controls for GPTBot and OAI-SearchBot. OpenAI says the settings are independent, so a webmaster can allow OAI-SearchBot for search results while disallowing GPTBot for training. Anthropic’s support documentation explains how site owners can use robots.txt to block ClaudeBot and mentions crawl-delay support, while warning that blocking IP addresses may not work reliably as an opt-out because it can stop the bot from reading robots.txt at all. Google documents Google-Extended as a control token for managing whether content crawled by Google can be used for Gemini model training and grounding, while saying it does not affect inclusion or ranking in Google Search.
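A purpose-split policy of this kind can be expressed in robots.txt and checked locally before it is published. The sketch below uses Python's standard `urllib.robotparser`; the rules, the `example.com` URLs and the `/private/` path are illustrative assumptions, not a recommended policy, and robots.txt remains a request that compliant crawlers honor rather than an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: block training, allow AI search, slow ClaudeBot down.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Crawl-delay: 10
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

article = "https://example.com/article"
for agent in ("GPTBot", "OAI-SearchBot", "Google-Extended", "ClaudeBot"):
    print(agent, "may fetch article:", rp.can_fetch(agent, article))

# Path-level rule and crawl delay for a single agent.
print("ClaudeBot private:", rp.can_fetch("ClaudeBot", "https://example.com/private/x"))
print("ClaudeBot crawl delay:", rp.crawl_delay("ClaudeBot"))
```

Agents without a matching group, and with no `User-agent: *` fallback, are treated as allowed by this parser, so a policy that lists only known crawlers says nothing about unknown ones.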
Those controls are useful, but they also show how fragmented the machine web has become. Every operator defines its own user agents, purposes and trade-offs. Every publisher has to map those signals into policy. Every security team has to decide what to trust. The agentic web is not one channel. It is a growing set of machine actors with different incentives, disclosures and levels of compliance.
The web’s old bargain was traffic for access
For two decades, the open web ran on a simple exchange. Websites allowed search engines to crawl. Search engines used that index to answer user queries. Users clicked results. Publishers, retailers and services turned that traffic into ads, subscriptions, sales or leads. The model was imperfect, often unfair and heavily shaped by platform power, but it had an economic loop: crawler access produced human visits.
AI breaks that loop because it can turn source material into an answer surface outside the source. A model can summarize a news article, compare products, explain a recipe, answer a technical question, generate travel suggestions or synthesize reviews without sending the user through the same list of sites. Even when citations appear, many users do not click them. The answer, not the source page, becomes the destination.
Cloudflare’s “Content Independence Day” announcement on July 1, 2025 framed this as a direct change in default access. Cloudflare said it was changing the default to block AI crawlers unless they pay creators for content, arguing that content is the fuel for AI engines and that creators should be compensated directly. The company also introduced Pay Per Crawl, giving domain owners three options for a crawler: allow, charge or block.
That move was not only a publisher-rights gesture. It was a recognition that the economics of crawling had shifted. If the crawler no longer reliably sends human readers, then access has to be priced, limited or justified in another way. The crawl is no longer automatically a marketing expense. It may be a wholesale extraction event.
The same logic applies beyond media. E-commerce sites once tolerated comparison engines because they could bring shoppers. Travel sites tolerated metasearch because it brought bookings. Software documentation sites tolerated indexing because it brought developers. Forums tolerated search discovery because it brought communities. AI agents can use those same pages as raw material and reduce the need for users to visit.
The issue is not that every AI answer is theft or that every agent is harmful. A personal assistant that helps a user buy a product or read documentation may create real downstream value. A live retrieval bot that cites a source may support discovery. A shopping agent might eventually convert at a higher rate than a casual human browser. But that value is not automatically visible in today’s analytics or revenue systems.
The old bargain failed quietly because it depended on human attention as the returned payment. AI traffic often returns something else: model capability, answer quality, product comparison, user convenience, or agent-mediated purchasing. Those are not useless. They are simply not the same currency. A publisher cannot pay reporters with “model capability.” A retailer cannot pay cloud bills with “agent consideration” unless the agent eventually buys.
This is why pay-per-crawl, licensing, verified agents and machine-readable permissions are becoming strategic infrastructure. They are attempts to define the terms of a new exchange. Some traffic should be free because it benefits the site. Some should be blocked because it is abusive. Some should be paid because it extracts commercial value. Some should be rate-limited because it represents a real user but imposes unusual cost.
The hardest part is that this cannot be solved by one file or one vendor. The web needs economic metadata, identity, authentication, purpose declaration, enforcement, billing and dispute processes for machine visitors. That is a bigger change than adding another bot rule to robots.txt.
The numbers behind the crossover
The current bot-majority narrative rests on several measurements that point in the same direction.
Cloudflare’s June 2026 data, as reported, puts bot and automated HTTP requests to HTML content ahead of human requests, with a roughly 57.5% to 42.5% split. India Today’s report says Cloudflare Radar showed bots around 57% and humans around 43% on April 27, 2026, and that bot traffic had roughly stayed in the 53% to 60% range since then.
HUMAN Security’s 2026 benchmark report adds the growth curve. It says automated traffic across the internet grew 23.51% in 2025, while human traffic grew 3.10%. Monthly AI-driven traffic rose 187% from January to December 2025, and AI agent and agentic browser traffic grew 7,851% year over year.
Cloudflare’s July 2025 crawler analysis showed AI and search crawler traffic up 18% from May 2024 to May 2025 for a fixed customer cohort. Googlebot remained the largest crawler, while GPTBot rose 305% in raw request volume, ChatGPT-User rose 2,825%, and PerplexityBot rose 157,490% from a tiny base.
Imperva’s 2025 Bad Bot Report says automated traffic reached 51% of all web traffic in 2024 and bad bots made up 37%. That finding predates the latest Cloudflare HTML-request split but supports the broader view that humans were already losing majority share in important web-traffic measurements before the agentic spike.
Traffic indicators from major bot and AI reports
| Source and period | Measurement | Reported signal | Strategic meaning |
|---|---|---|---|
| Cloudflare data reported in June 2026 | HTML HTTP requests | About 57.5% bot versus 42.5% human | Bots now dominate a core open-web request category |
| HUMAN Security 2025 data | Automated traffic growth | 23.51% versus 3.10% human growth | Automation is expanding far faster than human browsing |
| HUMAN Security 2025 data | AI agent growth | 7,851% year over year | Agentic browsing moved from niche to visible commercial traffic |
| Imperva 2024 data | Total web traffic | 51% automated | Bot majority was already visible before the latest AI-agent surge |
| Cloudflare May 2024 to May 2025 data | AI and search crawlers | GPTBot up 305%, ChatGPT-User up 2,825% | AI systems are changing crawler composition, not only volume |
These figures are not identical because each company measures a different slice of traffic. The shared message is stronger than any single number: automation has crossed from background noise into the primary traffic-management problem of the open web.
The fact that bot share differs by source is expected. A CDN sees a different flow from a fraud-defense platform. A bot-management vendor classifies differently from a publisher analytics tool. A web request is different from a user session. A crawler hit is different from a checkout event. Measurement uncertainty does not erase the trend. It only makes local verification more important.
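Percentage growth figures of this size are easier to compare once converted into multipliers. The short calculation below does only that arithmetic on the numbers quoted above; the helper name `growth_multiplier` is introduced here for illustration.

```python
def growth_multiplier(percent_growth: float) -> float:
    """Convert 'grew by X%' into 'ended at N times the starting level'."""
    return 1 + percent_growth / 100


# Growth figures as quoted from the reports discussed above.
reported_growth = {
    "HUMAN: monthly AI-driven traffic, Jan to Dec 2025": 187,
    "HUMAN: AI agents and agentic browsers, 2025": 7_851,
    "Cloudflare: GPTBot, May 2024 to May 2025": 305,
    "Cloudflare: ChatGPT-User, May 2024 to May 2025": 2_825,
}

for label, pct in reported_growth.items():
    print(f"{label}: about {growth_multiplier(pct):.1f}x the starting volume")

# Ratio of the two growth rates behind "around eight times faster".
automated_growth, human_growth = 23.51, 3.10
rate_ratio = automated_growth / human_growth
print(f"Automated vs human growth rate: {rate_ratio:.1f}x")
```

A multiplier describes change, not share: a category that grows many times over from a small base can still be a small fraction of total requests, which is one more reason to check local logs rather than rely on headline growth rates.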
For leaders, the question should not be whether the exact global bot share is 51%, 57.5% or some other figure. The operational question is sharper: what share of your traffic is human, what share is machine, what share is useful, what share is extractive, and what share is hostile? If a company cannot answer that, it cannot price access, protect infrastructure, trust analytics or manage AI visibility.
Humans still matter but their clicks no longer define demand
The bot-majority moment invites a dramatic reading: machines have taken over the internet. That is emotionally satisfying and technically incomplete. Humans still create the demand that many agents serve. They ask for product comparisons, research summaries, travel plans, software help, legal background, shopping support, financial explanations, customer service and code snippets. The agent traffic exists because people want less friction.
What has changed is the surface through which demand appears. A human used to express intent through clicks. Search query, result click, pageview, scroll, second page, signup, purchase. Now the user may express intent in one prompt. The AI system does the messy browsing. The site sees machines. The human sees an answer, a list, a recommendation or a completed action.
This is a profound measurement problem. Demand is moving upstream into prompts while cost is moving downstream into site infrastructure. A retailer may pay to serve pages to agents that never show a normal funnel. A publisher may be read by a model but not visited by a person. A software documentation page may solve a developer’s problem inside an assistant, while the documentation site records only bot retrieval.
Marketers should be especially careful. Human demand may be stable or rising even as human clicks fall. A decline in referral traffic does not always mean declining relevance. It may mean the site is being consumed through AI intermediaries. That is good news only if the business has a way to capture value from those intermediaries. Without licensing, attribution, citations, agent conversion or brand preference, hidden demand is not enough.
The same applies to search strategy. Ranking in traditional search results mattered because it produced clicks. Appearing in AI-generated answers matters because it shapes decisions, even when clicks do not arrive. A brand may win the recommendation and lose the visit. A publisher may be cited and still lose ad revenue. A hotel may appear in an agent’s shortlist but never see the browsing path that led there.
This is why answer engines, AI search and agentic commerce are not just traffic channels. They are decision layers. The user may no longer compare ten pages. The model compares them. The user may no longer read five reviews. The model condenses them. The user may no longer visit airline websites. The agent checks and ranks them. The site’s content still matters, but the human interface changes.
The business risk is that companies keep optimizing for old visible clicks while machine intermediaries quietly shape demand. If a site blocks every AI crawler, it may protect content but vanish from AI-mediated discovery. If it allows every crawler, it may subsidize extraction. If it ignores agents, it may misread customers. The next competition is not only for human attention. It is for trusted machine access to human-relevant facts.
The difference between useful bots, abusive bots and AI agents
Calling everything a bot hides the decisions website owners actually face. A bot can be beneficial, neutral, costly or malicious. An AI agent can be a customer delegate, a research assistant, a scraper, a fraud tool, a load problem or a partner. The label alone does not tell a site what to do.
Useful bots include search crawlers, uptime monitors, accessibility tools, archival services, compliance systems, legitimate partner integrations and some AI retrieval agents. They help people find, use or verify content. Blocking them can reduce visibility or break workflows.
Abusive bots include credential-stuffing tools, scalpers, carding bots, inventory hoarders, spam systems, scraping operations that violate terms, DDoS traffic, vulnerability scanners and fake account farms. They impose cost, distort markets and create direct security risk.
AI agents sit across the boundary. A shopping agent comparing products may look like a scraper. A travel agent checking fares may look like inventory abuse. A personal finance assistant logging into an account may look like account takeover. A customer support automation may look like bot traffic but represent a real client. The behavior can be similar while the intent differs.
HUMAN Security’s analysis captures this narrowing line. Its March 2026 blog says AI systems moved from merely reading the web to transacting on it, including product discovery and checkout, and that only half a percentage point separates the rate of benign automation from malicious automation across the interactions it analyzed. That is a hard environment for security teams because old signals such as rapid navigation or automated form filling no longer reliably mean abuse.
The operational answer is not “block bots.” It is classification. Sites need to classify machine traffic by identity, behavior, purpose, authorization, rate, commercial value and risk. A verified search bot may be allowed broadly. A training crawler may be charged. A live answer bot may be allowed only for licensed content. A customer agent may be allowed through a secure delegation flow. A fake browser with stolen credentials should be blocked. A noisy scraper should be throttled or challenged.
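The examples in that paragraph can be read as a small decision table. The sketch below encodes them under stated assumptions: the `MachineActor` fields, the purpose labels, the risk threshold and the action names are all hypothetical, and a real system would derive these inputs from verification and scoring services rather than take them as given.

```python
from dataclasses import dataclass

HIGH_RISK = 0.8    # assumed cutoff on a 0.0 to 1.0 risk score
NOISY_RPM = 600    # assumed requests-per-minute level that counts as noisy


@dataclass(frozen=True)
class MachineActor:
    purpose: str                    # "search_index", "ai_training", "live_answer", "customer_agent", ...
    identity_verified: bool
    has_delegation: bool = False    # customer explicitly authorized this agent
    content_licensed: bool = False  # operator holds a license for this content
    risk_score: float = 0.0
    requests_per_minute: int = 0


def decide(actor: MachineActor) -> str:
    """Map a classified machine actor to an access decision."""
    if actor.risk_score >= HIGH_RISK:
        return "block"
    if not actor.identity_verified:
        # Unverified and noisy: slow it down. Otherwise ask it to prove itself.
        return "throttle" if actor.requests_per_minute > NOISY_RPM else "challenge"
    if actor.purpose == "search_index":
        return "allow"
    if actor.purpose == "ai_training":
        return "charge"
    if actor.purpose == "live_answer":
        return "allow" if actor.content_licensed else "block"
    if actor.purpose == "customer_agent":
        return "allow" if actor.has_delegation else "challenge"
    return "challenge"


print(decide(MachineActor("search_index", identity_verified=True)))
print(decide(MachineActor("ai_training", identity_verified=True)))
print(decide(MachineActor("unknown", identity_verified=False, requests_per_minute=5_000)))
```

The ordering matters: risk is checked before purpose, so a high-risk actor is blocked even if it claims a purpose the site would otherwise welcome.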
This pushes bot management closer to identity and policy. IP address and user-agent string still matter, but they are insufficient. Attackers spoof user agents. Cloud infrastructure rotates IPs. Headless browsers mimic real sessions. Agents may use full browsers. Some legitimate tools run from consumer devices. Some malicious flows include real humans in the loop.
Research is already moving toward behavioral and fingerprint-based detection. A 2026 paper on AI browsing agents found that browser fingerprints alone had limited discriminative power when agents shared browser stacks, while behavioral signals such as typing, scrolling and mouse patterns were more useful for distinguishing AI browsing agents from humans and from one another.
The future of bot policy will look less like a simple allowlist and more like airport security for traffic: identity checks, behavior screening, purpose-specific lanes, risk scoring, rate limits and commercial access rules. A machine visitor will need to prove not only who it is, but why it is there.
AI agents turn one human request into thousands of web actions
The core technical multiplier is simple: agents decompose tasks. A human sees a goal. The agent turns it into steps. Each step can become searches, page loads, API calls, form fills, comparisons and follow-up reads. This is why agentic traffic grows faster than ordinary browsing.
A user planning a holiday might ask for “a quiet beach hotel in Greece for two adults and one child in August, under €250 a night, near restaurants, with refundable booking.” A human might search Google, open booking platforms, read a few reviews and give up after an hour. An agent can query dozens of hotels, compare cancellation policies, check maps, parse reviews, test availability and build a ranked answer.
A user buying a laptop might ask for “the best lightweight machine for development, long battery life, 32GB memory, under €1,800, available in Slovakia.” An agent can inspect retailers, compare specs, scrape reviews, read warranty pages, check stock and monitor price drops. A merchant sees product-page requests. The human sees a shortlist.
A developer might ask an AI coding assistant why an API returns a certain error. The assistant may retrieve official docs, GitHub issues, Stack Overflow threads, package changelogs and vendor examples. The documentation sites see bot traffic. The developer never leaves the IDE.
This is not speculative behavior. HUMAN Security says AI agents can conduct human-like activities from product discovery to checkout, and its research found that 2.3% of agentic activity occurred on checkout pages. That means the machine visitor is no longer confined to public reading. It is approaching the transactional layer of the web.
The difference between a crawler and an agent is agency. A crawler gathers. An agent pursues a goal. It may decide what to click next. It may keep state. It may compare alternatives. It may log in if authorized. It may ask the user for confirmation. It may execute a transaction. A crawler reads the web as a library. An agent uses the web as a workplace.
That shift raises server-load questions. Agents often revisit pages because they need confirmation. They may retry failed actions. They may fan out across many sites. They may run in parallel. They may trigger dynamic content rather than static pages. They may open pages not because a human is interested in reading them, but because a model needs evidence.
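A rough back-of-envelope model shows how these behaviors compound. Every number below is a hypothetical input chosen for illustration, not a measurement from any of the reports; the point is only that fan-out, revisits and retries multiply rather than add.

```python
def estimated_requests(sites: int, pages_per_site: int,
                       revisit_rate: float = 0.0, retry_rate: float = 0.0) -> int:
    """Estimate page requests for one task.

    revisit_rate: extra fetches made to re-confirm pages, as a fraction of base fetches.
    retry_rate:   extra fetches caused by failed actions, as a fraction of all fetches.
    """
    base = sites * pages_per_site
    return round(base * (1 + revisit_rate) * (1 + retry_rate))


# Hypothetical human session: one site, a handful of pages.
human = estimated_requests(sites=1, pages_per_site=8)

# Hypothetical agent run for the same goal: wide fan-out plus confirmation and retries.
agent = estimated_requests(sites=40, pages_per_site=20, revisit_rate=0.25, retry_rate=0.10)

print(f"human: {human} requests, agent: {agent} requests")
print(f"agent-to-human ratio: {agent / human:.1f}x")
```

Because the terms multiply, trimming any one of them, for example by giving agents a single structured endpoint instead of twenty pages per site, reduces the total proportionally.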
It also raises design questions. Many sites are not built for machine delegation. Product data may be inconsistent. Terms may be hidden in modals. Prices may be rendered through scripts. Reviews may be paginated or blocked. Accessibility labels may be poor. The agent then uses more requests to reconstruct what a clean feed or API could have provided. Bad structure becomes traffic waste.
The practical answer is not to make every site easier to scrape. It is to create controlled machine surfaces. Sites need human pages, search surfaces, structured data, partner APIs, licensed AI feeds and agent-specific paths. If agents are coming anyway, a well-governed machine path may be cheaper and safer than forcing them through human pages at scale.
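Structured data is one of the cheaper machine surfaces to add. The sketch below renders a product record as JSON-LD using schema.org's `Product` and `Offer` types; the choice of schema.org, the field selection, the `product_jsonld` helper and the sample values are assumptions made for illustration, not something the reports above prescribe.

```python
import json


def product_jsonld(name: str, sku: str, price: float, currency: str,
                   in_stock: bool, url: str) -> dict:
    """Build a minimal schema.org Product description with a single Offer."""
    availability = (
        "https://schema.org/InStock" if in_stock else "https://schema.org/OutOfStock"
    )
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "sku": sku,
        "url": url,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",       # fixed two-decimal string
            "priceCurrency": currency,     # ISO 4217 code
            "availability": availability,
        },
    }


# Sample values are invented for the example.
rendered = json.dumps(
    product_jsonld(
        name="Example Ultralight 14",
        sku="EX-UL14-32",
        price=1499,
        currency="EUR",
        in_stock=True,
        url="https://shop.example.com/laptops/ex-ul14-32",
    ),
    indent=2,
)
print(rendered)
```

An agent that can read price, currency and availability from one block has less reason to load scripts, open modals or page through listings to reconstruct the same facts.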
Publishers face the sharpest economic break
News, analysis, reference sites, forums, review sites and knowledge publishers sit at the exposed edge of the bot-majority web. Their product is information. AI systems need information. Their revenue often depends on human visits. AI systems can reduce human visits by producing answers elsewhere.
Cloudflare’s crawl-to-click analysis described a growing imbalance between crawling and referral traffic. It said training drove nearly 80% of AI bot activity, up from 72% a year earlier, while Google referrals to news sites fell in March 2025 compared with January. That combination is dangerous for publishers: more extraction, fewer visits.
A publisher can survive lower traffic if it gets paid for licensing, subscriptions, events, commerce, data products or direct audience relationships. It cannot easily survive a world where its reporting is consumed by machine systems that neither pay nor send readers. The traffic loss is not only a distribution problem. It is a labor-market problem for original information.
AI companies need high-quality content to train models and ground answers. The better the source, the more valuable it is. Yet high-quality sources are often the ones most likely to block crawlers or demand payment. Research on crawler restrictions found growing blocking by content creators and warned that uneven restrictions could shape training datasets. A separate study found reputable news sites were far more likely than misinformation sites to disallow at least one AI crawler, raising concerns about asymmetric access to quality information.
That creates a perverse risk. If serious publishers block and low-quality sites remain open, models may become more dependent on weaker sources. AI answers could then degrade or lean toward material with fewer rights barriers. The web’s information market would reward openness among low-quality actors and restriction among high-quality actors. That is bad for readers, publishers and AI systems.
Licensing can address part of this. Large publishers can negotiate with AI companies. Smaller publishers usually cannot. Pay-per-crawl systems, collective licensing and standardized usage permissions may give smaller sites some leverage. But pricing information access is hard. A local news article, a niche technical guide, a recipe, a forum post and a scientific explanation do not have the same value or rights structure.
The strategic path for publishers is likely mixed. They will need crawler policies by purpose, not blanket policies. They may allow search indexing, permit limited citation retrieval, block training crawlers without license, expose structured summaries for paid partners, and build direct audience products that reduce dependence on platform referrals.
The editorial path is also changing. Content that merely restates commodity facts will be easiest for AI systems to absorb and least likely to earn direct visits. Content with original reporting, proprietary data, expert analysis, local knowledge, strong voice, useful tools and community trust will be harder to replace. AI makes generic publishing less defensible and original publishing more important, even as it makes monetization harder.
E-commerce will meet the agent as both customer and threat
Retailers may experience AI agents more ambivalently than publishers. A publisher often sees AI as a traffic thief. A retailer may see AI as a new buyer interface. If an agent brings a qualified customer ready to purchase, the retailer should want that traffic. If the same behavior is used for scraping, arbitrage, scalping or fraud, the retailer should block it.
HUMAN Security says more than 95% of AI-driven traffic in 2025 was concentrated in retail and e-commerce, streaming and media, and travel and hospitality. That concentration makes sense. These sectors have structured choices, prices, availability, reviews and transactions. They are exactly the kinds of domains where agents can save users time.
For e-commerce, AI agents will change product discovery. A human shopper may search, filter, compare and read reviews. An agent may read product pages and reviews to justify a recommendation. It may prefer retailers with clean structured data, transparent shipping, clear returns and machine-readable availability. It may avoid sites that are hard to parse or untrustworthy. The new merchandising question is not only “does this page persuade a person?” It is “does this page give a trusted agent enough evidence to recommend us?”
That does not mean retailers should optimize only for machines. Human brand trust still matters. Product images, reviews, policies and service experience still shape decisions. But the agent may pre-filter options before a person sees them. If a brand is invisible to agents, it may never enter the human shortlist.
The threat side is just as real. Retail has long fought bots that buy limited inventory, test stolen cards, scrape prices, create fake accounts, abuse promotions, hoard stock and distort reviews. AI lowers the skill needed to build and coordinate such automation. Imperva’s 2025 report says AI is being used to create more advanced evasive bots targeting APIs, business logic and fraud, while also lowering the barrier for simple bot attacks.
Agentic browsing also blurs signals. A legitimate agent and a malicious bot may both move quickly, compare products, add items to cart and test checkout. A fraud bot may present as a personal shopping assistant. A customer agent may look like scraping. Retailers need to know whether a machine actor has authority from a real customer and whether its actions fit a permitted purpose.
This points toward delegated identity. A customer should be able to authorize an agent to act on their behalf, but the site should see that authorization in a secure way. The agent should not need to fake a human browser. The retailer should not need to guess. Permissions should be narrow: compare prices, check stock, build cart, but do not purchase without confirmation; access order history, but do not change payment method; apply coupon, but do not create bulk accounts.
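The narrow permissions described above can be sketched as a default-deny scope check with three outcomes: allow, ask the human to confirm, or deny. The scope names are hypothetical and no existing standard vocabulary is implied.

```python
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"  # pause and ask the human before proceeding
    DENY = "deny"

# Hypothetical scopes a customer has granted to a shopping agent.
GRANTED = frozenset({
    "compare_prices", "check_stock", "build_cart",
    "read_order_history", "apply_coupon",
})

# Actions the agent may prepare but not complete without the customer.
NEEDS_CONFIRMATION = frozenset({"purchase"})

def decide(action: str,
           granted: frozenset = GRANTED,
           confirm: frozenset = NEEDS_CONFIRMATION) -> Decision:
    """Return what the site should do when an agent requests `action`."""
    if action in granted:
        return Decision.ALLOW
    if action in confirm:
        return Decision.CONFIRM
    # Anything not explicitly granted is refused (default-deny).
    return Decision.DENY
```

With these sets, `decide("build_cart")` is allowed, `decide("purchase")` waits for confirmation, and `decide("change_payment_method")` or `decide("create_account")` is refused because it was never granted.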
Until such standards mature, retailers will rely on bot scoring, behavioral analytics, device fingerprinting, rate limits, CAPTCHAs, account risk signals and API controls. That may protect systems, but it can also frustrate legitimate agent use. The winners will build machine access that is safe enough to trust and useful enough to convert.
Travel and hospitality expose the cost of machine comparison
Travel is one of the purest agentic use cases. The user’s goal is complex, prices shift constantly, inventory is fragmented, policies matter, reviews are noisy, and trade-offs are personal. Planning a trip is painful for humans and natural for agents. That makes travel and hospitality an early stress test for the bot-majority web.
An AI travel agent can compare flights, hotels, cancellation terms, baggage fees, airport transfers, loyalty programs, weather, local events, visa rules and reviews. Each comparison may require many requests. Human traffic is slow and selective. Agent traffic is broad and repetitive. A person may compare three hotels. An agent may compare three hundred.
For travel businesses, this can be useful if it drives bookings. It can be destructive if it drives only rate scraping, cache misses and infrastructure load. Travel sites already fight fare scraping, seat inventory abuse, loyalty account takeover and payment fraud. Agentic traffic adds legitimate-looking automation to an already contested environment.
The economics are sensitive because travel search can be expensive. Availability checks may hit dynamic systems. Prices can depend on dates, rooms, routes, taxes, loyalty status and inventory. A crawler reading a static article is one thing. An agent repeatedly querying live prices is another. The cost of serving a machine comparison may be far higher than the cost of serving a human pageview.
This will pressure travel companies to expose controlled data products. Airlines, hotels and agencies may prefer authenticated APIs or partner feeds over uncontrolled agent scraping. They may charge for high-volume availability access. They may permit customer-delegated agents through account-based flows. They may block anonymous traffic that looks like fare harvesting.
There is also a consumer-protection angle. If agents become travel intermediaries, they will influence where people stay and what they pay. The agent’s ranking logic matters. Does it favor partners? Does it understand refundable terms? Does it price baggage correctly? Does it distinguish sponsored inventory from organic recommendations? Does it handle accessibility needs? Does it show hidden fees?
The same questions once applied to search engines and booking platforms. Agents make them more personal and less visible. A user may trust a model’s answer without seeing the comparison path. Travel companies will want to be represented accurately. Regulators may eventually ask how agentic recommendations are generated, especially if they affect pricing, availability or consumer choice.
Travel also shows why blocking all agents is not realistic. Many consumers will want agents to plan trips. Brands that refuse all agent access may lose discovery. Brands that allow uncontrolled access may pay heavy infrastructure and data costs. The practical middle is verified, permissioned, rate-limited, auditable access. Travel will not escape the agentic web. It will force the agentic web to grow up faster.
Search crawlers, AI crawlers and user agents now need separate rules
A decade ago, many site owners thought about bots mainly in terms of Googlebot and bad bots. Today they need a taxonomy. Search indexing, AI model training, AI answer retrieval, user-triggered browsing, agentic action, monitoring, research, archiving and abuse are different activities. Treating them the same is a strategic mistake.
OpenAI’s documentation reflects this split by describing crawlers and user agents that support products automatically or through user-triggered actions, and by separating OAI-SearchBot and GPTBot controls. Anthropic lists methods for limiting or blocking crawling, including robots.txt directives for ClaudeBot. Google’s crawler documentation distinguishes Googlebot, GoogleOther, Google-Extended and other crawler tokens, with Google-Extended framed as a control for Gemini model training and grounding rather than a separate HTTP user agent string.
For site owners, the practical policy matrix now looks like this:
Search indexing may be allowed because it brings visibility.
AI model training may be blocked or licensed because it uses content to improve models without necessarily sending traffic.
AI search retrieval may be allowed selectively because it can produce citations or presence in answer engines.
User-triggered agents may be allowed if they represent real user intent and respect rate limits.
Commercial scraping may be charged or blocked.
Security scanning may be allowed only for authorized partners.
Unknown automation may be challenged, throttled or blocked.
Malicious automation should be stopped.
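The matrix above can be written down as a simple lookup from declared or inferred purpose to a default action. This is only a sketch: the purpose labels and action names are illustrative, and deciding which purpose a given request really has is the hard part that the code does not attempt.

```python
# Purpose labels and action names are illustrative, not a standard vocabulary.
POLICY = {
    "search_indexing": "allow",
    "ai_training": "block_unless_licensed",
    "ai_search_retrieval": "allow_selectively",
    "user_triggered_agent": "allow_with_rate_limit",
    "commercial_scraping": "charge_or_block",
    "security_scanning": "authorized_partners_only",
    "unknown_automation": "challenge",
    "malicious_automation": "block",
}

def policy_for(purpose: str) -> str:
    """Look up the default action for a traffic purpose."""
    # Anything not explicitly listed is handled like unknown automation.
    return POLICY.get(purpose, POLICY["unknown_automation"])
```

Keeping the table separate from the enforcement code makes it easier to review with legal and editorial teams, who usually care about the rows more than the mechanism.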
The problem is that current mechanisms do not fully express this matrix. Robots.txt can tell crawlers what not to fetch, but Google’s own documentation warns that robots.txt rules cannot enforce crawler behavior and that some crawlers may not obey them. A 2025 empirical study on scraper compliance found that bots were less likely to comply with stricter robots.txt directives and that some AI search crawlers rarely checked robots.txt at all.
That leaves a gap between polite signaling and hard enforcement. Polite bots may respect stated preferences. Bad actors may not. Large platforms may define their own controls. Smaller AI companies may be inconsistent. Unknown agents may hide behind browsers. The more money at stake, the less a voluntary file can carry the whole burden.
This is why infrastructure providers are adding network-level controls, verified bot programs, managed robots policies, signatures and monetization systems. Cloudflare’s Pay Per Crawl is one example. It turns access control into a commercial decision at the edge, not only a text instruction at the site root.
The likely direction is layered policy. Robots.txt remains a public preference signal. HTTP headers and content signals express usage rights. Verified bot identity proves operator claims. Edge enforcement applies blocks or charges. APIs provide cleaner access for trusted agents. Contracts define paid uses. Logs and audits detect violations.
The era of one crawler rule is over. The web now needs purpose-specific machine governance.
Robots.txt is no longer enough
Robots.txt was created for a more cooperative web. It helps crawlers avoid overloading servers and tells them which URLs they may access. It was never a security boundary. Google’s documentation says directly that robots.txt is mainly used to manage crawler traffic and is not a mechanism to keep a page out of Google; it also says robots.txt instructions cannot enforce crawler behavior because compliance is up to the crawler.
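To make the voluntary nature concrete, the sketch below uses Python's standard `urllib.robotparser` to show what a compliant crawler would conclude from a small robots.txt. The rules are illustrative and the host `site.example` is a placeholder. GPTBot and OAI-SearchBot are the tokens named earlier in this article. Nothing in the file stops a crawler that never runs a check like this.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: block a training crawler, allow a search crawler,
# and keep everyone else out of /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_fetch(agent: str, path: str) -> bool:
    """What a well-behaved crawler identifying as `agent` should conclude."""
    return parser.can_fetch(agent, "https://site.example" + path)
```

Here `may_fetch("GPTBot", "/article")` is false and `may_fetch("OAI-SearchBot", "/article")` is true, but only because the caller chose to ask. Enforcement has to happen somewhere else.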
That limitation matters more when AI raises the value of scraping. A crawler that wants training data, pricing data, product data or proprietary text may have incentives to ignore preferences, rotate infrastructure, spoof user agents or use headless browsers. A voluntary rule works best when the crawler benefits from being seen as legitimate. It works poorly against anonymous extraction.
Academic work supports that concern. The 2025 study “Scrapers selectively respect robots.txt directives” used controlled robots.txt experiments and found uneven compliance, including categories of bots that rarely checked robots.txt. Another study on content creators and AI crawlers found strong demand for crawler-blocking tools but practical barriers around technical awareness, agency and limited efficacy against unresponsive crawlers.
The result is a two-tier web. Large publishers and platforms can deploy edge blocking, legal teams, licensing deals and direct contacts at AI companies. Smaller sites may only have robots.txt and hope. That asymmetry is bad for small creators, niche publishers and independent developers, whose work may be valuable but whose enforcement power is weak.
Robots.txt still has value. It is public, simple, widely understood and supported by many legitimate crawlers. It gives compliant operators a clear instruction. It can reduce accidental overload. It creates evidence of stated preferences. It remains part of search and AI crawler management.
But it cannot answer the harder questions: Is the agent who it claims to be? Is the traffic authorized by a user? Is the request for training, search, grounding or transaction? Is the operator willing to pay? Is the crawler respecting rate limits? Is the content being reused within agreed boundaries? Is the bot hiding behind residential proxies? Is a human account being abused by automation?
Those questions require stronger controls. Verified identities can reduce spoofing. Message signatures can authenticate bots. Token-based access can bind permissions to purpose. Structured APIs can reduce waste. Edge systems can enforce rate limits and payment. Contracts can define use. Monitoring can detect anomalies. Legal standards can set penalties for deception.
Robots.txt remains a sign on the door. The agentic web needs locks, badges, tolls, contracts and cameras.
The rise of verified bots and signed agents
Bot identity has always been fragile. A user-agent string is easy to fake. IP ranges change. Reverse DNS checks help for established crawlers but do not solve agentic delegation. When traffic grows and money follows, identity has to become harder to spoof.
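The reverse DNS check mentioned above is usually done as a forward-confirmed lookup: resolve the IP to a hostname, check the hostname against the operator's published domains, then resolve the hostname back and confirm it returns the same IP. A minimal sketch follows. The allowed suffix list is an assumption that should be taken from each operator's own documentation, and the default forward lookup used here only handles IPv4.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List

def _reverse(ip: str) -> str:
    # PTR lookup: IP address -> hostname.
    return socket.gethostbyaddr(ip)[0]

def _forward(host: str) -> List[str]:
    # A-record lookup: hostname -> IPv4 addresses (IPv4 only).
    return socket.gethostbyname_ex(host)[2]

def verify_crawler_ip(ip: str,
                      allowed_suffixes: Iterable[str],
                      reverse: Callable[[str], str] = _reverse,
                      forward: Callable[[str], List[str]] = _forward) -> bool:
    """Forward-confirmed reverse DNS check for a claimed crawler IP."""
    ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record or lookup failure
    # The hostname must be the operator domain itself or a subdomain of it.
    if not any(host == s or host.endswith("." + s) for s in allowed_suffixes):
        return False
    try:
        # The hostname must resolve back to the address we started with.
        return ip in forward(host)
    except OSError:
        return False
```

The resolver functions are injectable so the logic can be exercised without network access. The check says nothing about who a delegated agent is acting for, which is the gap the paragraph above points to.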
Verified bot programs are one answer. They allow known operators to register, authenticate and be treated differently from unknown automation. A verified search bot may bypass some defenses. A verified monitoring tool may get stable access. A verified AI agent may be allowed to retrieve content under defined policies. The value is not only security. It is predictability.
Signed agents take the idea further. A machine visitor can cryptographically prove the operator or delegation chain behind a request. Instead of relying on a string that says “I am ExampleBot,” the request carries a signature that can be checked. That gives site owners a stronger basis for policy decisions. It also gives legitimate AI companies a way to distinguish themselves from scrapers pretending to be them.
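The idea of a request that carries a checkable signature can be sketched with the standard library. This example deliberately swaps in a shared-secret HMAC for simplicity. Real proposals in this area build on asymmetric keys and HTTP Message Signatures (RFC 9421), so treat the key registry, the signed fields and the five-minute freshness window below as assumptions for illustration only.

```python
import hashlib
import hmac

# Hypothetical registry mapping a key id to a shared secret.
REGISTERED_KEYS = {"examplebot-2026": b"demo-secret-not-for-production"}

def _canonical(method: str, path: str, timestamp: int, key_id: str) -> bytes:
    # The exact bytes both sides agree to sign.
    return "\n".join([method.upper(), path, str(timestamp), key_id]).encode()

def sign(method: str, path: str, timestamp: int, key_id: str, secret: bytes) -> str:
    """Produce a hex signature over the request fields (bot side)."""
    msg = _canonical(method, path, timestamp, key_id)
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()

def verify(method: str, path: str, timestamp: int, key_id: str,
           signature: str, now: int, max_skew: int = 300) -> bool:
    """Check the signature and freshness of a request (site side)."""
    secret = REGISTERED_KEYS.get(key_id)
    if secret is None:
        return False  # unknown operator key
    if abs(now - timestamp) > max_skew:
        return False  # too old or too far in the future: possible replay
    expected = sign(method, path, timestamp, key_id, secret)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)
```

A scraper that only copies the string "ExampleBot" into its user-agent header cannot produce a valid signature without the key, which is the property the paragraph above describes.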
The challenge is adoption. Verification only works if enough major agents support it and enough sites enforce it. If every AI company builds its own identity method, site owners face another fragmented system. If identity is optional, malicious actors will ignore it. If identity is too burdensome, useful smaller tools may be excluded.
The best version would separate layers. A standard way to authenticate machine actors. A standard way to declare purpose. A standard way to express site policy. A standard way to authorize user delegation. A standard way to audit compliance. Commercial terms could vary, but the protocol foundations should not be reinvented by every platform.
This matters for consumer agents too. A user may want an AI assistant to book a table, buy a ticket, file a support request or compare insurance. The site needs to know whether the agent is acting for that user, what permissions it has and whether the user approved the final action. Without verified delegation, agents may be forced to behave like brittle browser automation, logging in with user credentials and clicking like a person. That is unsafe.
Verified agents could support narrower access. A user grants permission for an agent to read order status but not change address. The request carries proof of permission. The retailer allows the read action through an agent API. The action is logged. The user can revoke access. That looks more like OAuth for the agentic web than bot scraping.
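The sequence above (grant, scoped request, log, revoke) can be sketched as a small in-memory gateway. Class names, scope strings and the token format are hypothetical. A production system would use unguessable tokens, persistent storage and an established authorization framework.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

@dataclass
class Grant:
    user_id: str
    agent_id: str
    scopes: FrozenSet[str]
    revoked: bool = False

@dataclass
class AgentGateway:
    grants: Dict[str, Grant] = field(default_factory=dict)
    # Each entry records (token, action, allowed) for later audit.
    audit_log: List[Tuple[str, str, bool]] = field(default_factory=list)

    def grant(self, token: str, user_id: str, agent_id: str, scopes: set) -> None:
        self.grants[token] = Grant(user_id, agent_id, frozenset(scopes))

    def revoke(self, token: str) -> None:
        if token in self.grants:
            self.grants[token].revoked = True

    def authorize(self, token: str, action: str) -> bool:
        g = self.grants.get(token)
        allowed = g is not None and not g.revoked and action in g.scopes
        # Denied attempts are logged too; they are often the interesting ones.
        self.audit_log.append((token, action, allowed))
        return allowed

# Hypothetical usage: a user lets an agent read order status only.
gateway = AgentGateway()
gateway.grant("tok-123", "user-42", "shopping-agent", {"read_order_status"})
```

With this grant, `gateway.authorize("tok-123", "read_order_status")` succeeds, a request to change the address fails, and after `gateway.revoke("tok-123")` every request fails while the log keeps the full history.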
The web already solved parts of this for apps and APIs. Agents bring the same need to public websites and mixed human-machine experiences. The future of bot management is not only detection. It is authenticated delegation.
Analytics will lie unless traffic is reclassified
The bot-majority web breaks basic analytics assumptions. Pageviews, sessions, bounce rates, conversion rates, referrers, engagement time and funnel paths were built around human browsing. When bots and agents dominate request volume, those metrics can become misleading.
A rise in pageviews may mean stronger audience demand. It may also mean a crawler loop. A drop in engagement time may mean readers dislike the content. It may also mean AI retrieval bots extract the page instantly. A high bounce rate may mean poor landing-page fit. It may also mean an agent only needed one fact. A conversion-rate decline may mean weaker merchandising. It may also mean traffic volume is inflated by non-buying machines.
Ad measurement faces the same problem. If traffic is not human, impressions lose value. If bots load pages but do not see ads as people do, inventory quality falls. If agentic traffic reads content without rendering ads, publisher revenue falls even when content usage rises. If malicious bots generate fake impressions, advertisers pay for nothing.
SEO measurement also changes. Search referrals may decline while brand influence through AI answers grows. A page may become a source for summaries but receive fewer clicks. A product may be recommended by agents but not show a normal acquisition path. Ranking reports may miss AI answer inclusion. Attribution models may undervalue the content that feeds decisions.
Businesses need a new analytics layer with at least six classes of traffic:
Human sessions.
Verified search and discovery bots.
AI training and bulk crawlers.
AI retrieval and answer bots.
User-delegated agents.
Unknown or suspicious automation.
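A first pass at assigning requests to these classes might look like the sketch below. Every input signal is hypothetical: how identity is verified, how purpose is declared and how an automation score is produced all depend on the site's tooling. The sketch also makes one deliberate choice, which is to trust a declared purpose only when the sender's identity has been verified.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RequestSignals:
    verified_identity: bool          # operator identity was cryptographically or DNS verified
    declared_purpose: Optional[str]  # "search", "training", "retrieval" or None
    user_delegation: bool            # request carries proof of a user's grant
    automation_score: float          # 0.0 looks human, 1.0 looks automated (hypothetical)

def classify(s: RequestSignals, threshold: float = 0.5) -> str:
    """Map request signals to one of the six traffic classes."""
    if s.verified_identity:
        if s.user_delegation:
            return "user_delegated_agent"
        if s.declared_purpose == "search":
            return "search_discovery_bot"
        if s.declared_purpose == "training":
            return "ai_training_crawler"
        if s.declared_purpose == "retrieval":
            return "ai_retrieval_bot"
        return "unknown_or_suspicious"
    # Unverified traffic is either treated as human or as unknown automation;
    # an unverified purpose claim is ignored.
    if s.automation_score < threshold:
        return "human"
    return "unknown_or_suspicious"
```

Real classification is probabilistic and adversarial, so a rule set like this is a starting taxonomy for reporting, not a detection system.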
Some companies will need more detail: good bots, partner bots, fraud bots, testing tools, internal automation, synthetic monitors, API clients and malicious scripts. The point is that “traffic” is no longer a single thing.
Once traffic is reclassified, reporting should change. Human engagement should be separated from machine retrieval. Infrastructure cost should be allocated by traffic type. Conversion should be measured for both direct human paths and agent-mediated paths. Content usage should include citations, answer appearances and licensed retrieval, not only pageviews. Security dashboards should show agent activity and bot intent, not only blocked requests.
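A small worked example shows why the split matters. All figures below are invented for illustration and conversion is computed per request for simplicity. The blended conversion rate looks weak, while the human rate is more than twice as high and the largest share of serving cost sits with traffic that never buys.

```python
# Hypothetical daily summary: (traffic_class, requests, orders, serving_cost_usd)
ROWS = [
    ("human", 40_000, 800, 20.0),
    ("ai_retrieval_bot", 35_000, 0, 30.0),
    ("user_delegated_agent", 5_000, 100, 10.0),
    ("unknown_or_suspicious", 20_000, 0, 40.0),
]

def report(rows):
    """Compute blended and per-class conversion, request share and cost share."""
    total_requests = sum(r[1] for r in rows)
    total_orders = sum(r[2] for r in rows)
    total_cost = sum(r[3] for r in rows)
    by_class = {}
    for cls, requests, orders, cost in rows:
        by_class[cls] = {
            "request_share": requests / total_requests,
            "conversion_rate": orders / requests if requests else 0.0,
            "cost_share": cost / total_cost if total_cost else 0.0,
        }
    return {
        "blended_conversion_rate": total_orders / total_requests,
        "by_class": by_class,
    }

summary = report(ROWS)
```

With these invented numbers the blended rate is 0.9%, the human rate is 2%, and unknown automation accounts for 40% of serving cost while producing no orders. A dashboard showing only the blended figure would hide all three facts.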
This is not only a data-cleaning exercise. It changes management decisions. A marketing team may stop celebrating traffic growth if most of it is unpaid AI extraction. A publisher may price licensing based on crawl volume. A retailer may build an agent feed after seeing legitimate agent demand. A security team may prioritize checkout-agent fraud after seeing autonomous activity in payment flows.
The first companies to rebuild analytics around machine traffic will make better decisions than competitors still treating every request as a person. The web’s measurement stack has to catch up with the web’s visitors.
Infrastructure costs will move from audience growth to machine load
For years, more traffic was usually good news. More users meant more ad impressions, more leads, more sales, more signups or more influence. Infrastructure teams still worried about scaling, but business teams saw traffic as demand.
AI traffic makes the relationship less direct. Machine requests can increase cost without increasing revenue. They can hit expensive endpoints, bypass ad monetization, scrape high-value content, trigger dynamic rendering, inflate logs and force security processing. A business can pay more to serve visitors that never become customers or readers.
The cost is not evenly distributed. Static pages behind a CDN are cheaper to serve than dynamic product searches. Cached documentation is cheaper than personalized account pages. A crawl that hits a sitemap politely is cheaper than an agent clicking through faceted navigation. A bot that respects rate limits is cheaper than one that fans out aggressively.
Cloudflare’s AI Insights and crawler analysis are partly responses to this cost problem. Site owners need to see which bots are responsible for traffic, what purpose categories dominate and how those patterns change over time. Without that visibility, the cloud bill becomes a mystery tax.
The economics will push sites toward differentiated access. High-value, high-cost endpoints may require authentication. Bulk access may be charged. Free access may be limited to low-cost pages or clear public-interest purposes. Agents may be routed to structured data endpoints. Unknown bots may get cached summaries or blocked responses. Expensive pages may be protected with stronger challenges.
This will also affect AI companies. If every site starts charging, blocking or throttling indiscriminately, retrieval quality suffers. Agents become slower, less reliable and less useful. AI companies therefore have an incentive to support respectful access: identity, rate limits, licensing, caching, citations and purpose declarations. The cheaper path for the ecosystem is cooperation. The current path is too often extraction followed by blocking.
A healthy machine-access model would reduce waste. Instead of agents scraping human pages repeatedly, sites could expose licensed, structured, cacheable feeds. Instead of every AI company crawling the same pages separately, shared indexes or paid data partnerships could reduce duplication. Instead of unknown crawlers hammering origin servers, verified bots could use predictable rates.
There is a climate and energy dimension as well, though it should be handled carefully. Web requests are a minor cost of AI compared with model training and inference, but unnecessary crawling still consumes compute, bandwidth and storage. Waste at machine scale becomes visible. The bot-majority web turns inefficient information access into an infrastructure cost that someone has to pay.
Security teams now defend against both bots and delegated users
The old security model assumed a meaningful divide between humans and automation. Humans used browsers, behaved slowly and made mistakes. Bots moved fast, repeated patterns and attacked at scale. Defenses used CAPTCHAs, rate limits, IP reputation, device fingerprints, behavioral signals and fraud models to enforce that divide.
Agentic AI weakens it. Agents can use real browsers. They can act slowly if needed. They can solve multi-step tasks. They can operate inside authenticated sessions. They can be directed by legitimate users or attackers. They can read pages, interpret forms, follow instructions and adapt.
OWASP’s Agentic AI threat guidance says agentic AI systems, increasingly enabled by large language models, have expanded scale, capabilities and risks. OWASP’s Top 10 for Agentic Applications 2026 frames autonomous and agentic systems as a distinct security problem because they plan, act and make decisions across workflows.
For web security, this creates several new risk classes.
Prompt injection can manipulate agents that read untrusted pages. A malicious page might instruct an agent to ignore prior rules, leak data, click a dangerous link or perform an unauthorized action. This is not classic SQL injection or cross-site scripting. It is instruction-level compromise inside the agent’s reasoning context.
Tool misuse can occur when an agent has access to browsers, APIs, email, payments or files. If the agent misinterprets a page or obeys malicious content, it may use tools in ways the user did not intend.
Identity abuse becomes harder to spot when an agent uses a real user’s account. Was the login legitimate? Did the user authorize the action? Did malware or a malicious prompt hijack the agent?
Fraud automation becomes more flexible. Attackers can use AI browsers to test stolen cards, manage fake accounts or coordinate scraping with less custom code. HUMAN Security’s blog says its Satori team observed AI agents being used in carding-like attack patterns.
Defenses need to adapt. CAPTCHAs are less attractive if legitimate agents need access and attackers can route around challenges. Blocking headless browsers is insufficient if agents use full browsers. Rate limits help but may block useful agent workflows. Behavioral detection helps but may become more contested as agents imitate people better.
The security path is policy plus provenance. Sites need to know which agents are allowed, what they can do, what user delegated them, and whether their behavior matches the delegated purpose. Sensitive actions should require confirmation. High-risk workflows should use step-up authentication. Agents should receive least-privilege access. Logs should distinguish agent actions from direct human actions.
The next web-security problem is not only stopping bots. It is governing machines that may be acting for real people.
The agentic web creates a new trust layer
Trust on the web used to revolve around domains, certificates, accounts, cookies, passwords, OAuth tokens, payment credentials and platform reputation. Agentic traffic adds a new trust question: can a site trust a machine actor that says it is working for a person, a company or an AI platform?
A trust layer for agents needs several pieces.
First, identity. The agent or operator must be verifiable. A site should not rely on a user-agent string alone. Cryptographic signatures, registered keys, verified bot programs and transparent operator records can reduce impersonation.
Second, delegation. If an agent acts for a user, the site needs proof of user consent and scope. Consent should not mean sharing the user’s password with a bot. It should mean a revocable permission grant with defined actions.
Third, purpose. The request should declare whether it is for search indexing, model training, live answer retrieval, user comparison, transaction, monitoring or another permitted use. Purpose declarations need enforcement and audit, not just words.
Fourth, rate and cost. Even trusted agents need limits. A real user’s shopping agent should not create unlimited load. A paid crawler should stay within terms. A partner feed should have quotas.
Fifth, accountability. When an agent causes harm, the site needs to know who operated it and what agreement governs the incident. Anonymous machine traffic cannot carry high-trust privileges.
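The fourth piece, rate and cost limits, is commonly implemented with a token bucket kept per agent or per operator. The sketch below is a generic version, not any vendor's implementation. The `cost` parameter is an added assumption that lets an expensive endpoint, such as a live availability query, consume more of the budget than a cached page.

```python
import time
from typing import Callable

class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock          # injectable so the logic can be tested
        self.tokens = capacity      # start with a full bucket
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return whether the request may proceed."""
        now = self.clock()
        # Refill for the elapsed time, never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A site would keep one bucket per verified agent identity, so a trusted shopping agent gets predictable throughput while still being unable to create unlimited load.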
This looks like a new institutional layer of the web. It will involve infrastructure companies, browser vendors, AI labs, standards bodies, payment networks, publishers, retailers, regulators and security vendors. It may evolve through messy competition before standards settle.
Cloudflare, HUMAN Security and others are already positioning around this trust problem. Cloudflare frames pay-per-crawl and crawler controls as ways to restore content-owner choice. HUMAN frames agentic visibility and trust as necessary because traditional analytics tools were not built to distinguish humans, bots and AI agents.
The strategic stakes are high. If trust mechanisms work, the web can support useful agents without surrendering to scraping chaos. If they fail, websites may retreat behind logins, apps, paywalls, legal threats and API gates. That would make the open web less open, less searchable and less useful.
The open web needs machines it can recognize, price and constrain. Without that, openness becomes a liability.
Google sits at the center of the crawler dilemma
Google is uniquely hard for publishers and site owners to manage because its crawling supports search visibility, discovery, snippets, shopping, images, news and AI features. Blocking Googlebot can damage search presence. Allowing Googlebot may also support AI-related uses depending on product and policy boundaries.
Google introduced Google-Extended as a publisher control for use in Gemini models and related products, saying it does not affect Google Search inclusion or ranking. That distinction matters because publishers want to separate search indexing from AI training or grounding. They may want Google Search traffic but not unrestricted AI use.
Cloudflare’s own crawler analysis shows Googlebot remained dominant in 2025. It said Googlebot grew 96% from May 2024 to May 2025 and accounted for 50% of the crawler share in its tracked cohort by May 2025. Cloudflare also noted the broader context of AI Overviews and AI Mode rollouts during the period.
For publishers, the strategic bind is clear. Google Search remains a major discovery channel. Google’s AI answer surfaces can reduce clicks. Googlebot is required for search crawling. Google-Extended offers some control over AI-related uses, but many publishers still worry about how search, snippets, AI Overviews, grounding and model development intersect in practice.
This is not only a Google problem. OpenAI, Anthropic, Perplexity, Meta, Microsoft and others all need web access for different AI products. But Google’s position is different because search dependency gives it structural leverage. A publisher can block a small AI crawler and lose little. Blocking Google risks visibility across a massive distribution system.
The policy future may require stronger separation between crawl purposes. Search indexing, ranking, snippets, AI answer grounding, AI training and user-agent retrieval should be distinguishable in practice, not only in documentation. Publishers need confidence that a permission for one use does not quietly become a permission for another.
Regulators may eventually look at this through competition and market-power lenses. If a dominant search crawler can bundle search visibility with AI access, publishers may argue they lack meaningful choice. If controls truly separate use without ranking penalty, the concern weakens. The details matter.
For SEO and GEO strategy, the practical lesson is not to panic-block. It is to map each crawler’s role, understand trade-offs and monitor outcomes. Visibility in AI-mediated search may require access, but access without terms can weaken the economics of publishing.
AI companies are becoming infrastructure consumers of the web
AI companies do not only run models. They consume the web as infrastructure. They crawl it for training data, retrieve it for answers, use it to ground model outputs, and send agents through it to perform tasks. That makes them heavy users of a commons they did not build alone.
The web is valuable to AI because it contains fresh, diverse, human-created and institutionally maintained information. Newsrooms report facts. Developers write documentation. Retailers list products. Governments publish rules. Forums solve problems. Researchers post papers. Reviewers test products. Local businesses update hours. Communities produce context. AI systems draw from this informational substrate.
The cost of maintaining that substrate falls on millions of actors. They pay writers, editors, engineers, moderators, photographers, fact-checkers, support teams, hosting providers and security services. If AI companies use the substrate without returning traffic or payment, they change the incentive to maintain it.
This is the “content fuel” argument behind Cloudflare’s Content Independence Day. Cloudflare stated that content powers AI engines and that creators should be compensated directly. The argument is strongest for original, costly content and weaker for generic public facts. But the broad economic problem is real: AI value chains depend on inputs that may be undercompensated.
AI companies may respond in several ways. They can sign licensing deals. They can honor robots.txt and usage signals. They can provide opt-in controls. They can send referral traffic. They can show citations prominently. They can pay per crawl. They can reduce duplicate crawling. They can use structured feeds. They can share revenue for high-value answer use.
They also face technical incentives to behave well. If high-quality sites block them, their products degrade. If infrastructure providers throttle them, agents slow down. If publishers sue or regulators intervene, uncertainty rises. If users distrust AI answers because sources disappear, product value falls.
At the same time, AI companies face competitive pressure to gather more data and answer more queries. The temptation to crawl aggressively is strong. A model with fresher data may perform better. An agent with broader access may satisfy users. A search answer with more sources may look more complete. The business incentive to consume the web is immediate; the ecosystem incentive to sustain the web is slower.
That gap is where policy and infrastructure will intervene. The bot-majority milestone makes the gap visible. When machine traffic was a minority cost, websites absorbed it. When machine traffic becomes the majority in key categories, the subsidy becomes too large to ignore.
“Dead internet” is the wrong frame but a useful warning
The bot-majority news will inevitably feed “dead internet” language: the idea that the web is mostly machines talking to machines and that authentic human activity has been buried under automation. The phrase captures a real anxiety. It also blurs too many things.
The internet is not dead. People are still watching, buying, arguing, learning, building, publishing, gaming, dating, organizing and working online. Human demand is enormous. The open web still contains human work and human need. What is dying is the assumption that visible traffic equals human presence.
A better frame is the intermediated internet. Humans still drive purpose, but intermediaries increasingly perform discovery, retrieval and decision support. Search engines were the first major intermediaries. Social feeds became another. AI assistants and agents are the next. They do not eliminate people. They change where people appear in the system.
The dead-internet frame is useful when it warns about synthetic content, spam, bot engagement, fake accounts and machine-amplified manipulation. Those threats are real. AI-generated pages can flood search. Bot engagement can distort popularity. Fake reviews can mislead shoppers. Automated accounts can influence discourse. Synthetic content can train future models and degrade quality.
But agentic traffic is not identical to fake traffic. A personal AI assistant checking product pages for a real buyer is machine traffic attached to human intent. A search crawler indexing a public page is machine traffic that may support discovery. A monitoring bot checking uptime is useful. Treating all automation as dead matter leads to bad policy.
The important distinction is whether machine traffic is accountable, authorized, useful and fairly priced. A bot that respects rules, carries a real user’s delegation and creates value is different from a scraper that steals content or a fraud bot that tests stolen cards. The web does not need to remove machines. It needs to govern them.
The cultural concern remains. If humans encounter more AI summaries than original sources, more synthetic posts than real voices, more bot reviews than customer experience, trust erodes. People may retreat to closed communities, verified networks, private newsletters, paid apps or real-world relationships. The open web could become a machine-readable backend rather than a human public square.
That outcome is not guaranteed. It depends on incentives. If original sources are paid, cited and discoverable, they can survive. If agents are transparent and accountable, users can benefit. If spam is penalized, quality can remain visible. If platforms reward real human expertise, the web can adapt.
The danger is not that bots exist. The danger is that the web’s economic and trust systems still behave as if bots were marginal.
The SEO playbook changes when bots become the reader
Search engine optimization once focused on making pages understandable and attractive to search crawlers and human searchers. Technical SEO ensured crawlability. Content SEO matched search intent. Authority signals helped ranking. User experience supported engagement and conversion. That playbook still matters, but AI traffic changes the audience.
The page now has at least three readers: the human, the search crawler and the AI system. The AI system may be a training crawler, search-answer retriever, agentic browser or assistant tool. It may not care about the same signals a human does. It needs clear facts, structured relationships, source credibility, freshness, entity clarity, policy information and accessible page structure.
This does not mean writing for bots instead of people. Thin, generic, over-structured pages are easy to summarize and easy to replace. The stronger strategy is to make genuinely useful content easier for both humans and machines to understand. Clarity, originality and structure now serve discovery across search, AI answers and agents.
For publishers, this means concise factual passages, strong bylines, dates, source transparency, schema markup, internal context, topic depth and original reporting. For e-commerce, it means clean product data, accurate availability, transparent shipping, returns, reviews, compatibility and comparison-ready specifications. For SaaS companies, it means complete documentation, changelogs, API references, use-case pages and troubleshooting paths. For local businesses, it means consistent hours, services, location data, pricing cues and trust signals.
GEO, or generative engine optimization, adds another layer. AI systems need evidence to include a brand, source or product in an answer. They may rely on structured data, third-party mentions, authoritative citations, reviews, public documentation and entity consistency. A page that ranks well but lacks extractable facts may be less useful to an AI answer engine. A page with clear facts but weak authority may be ignored.
The risk is over-optimization. Some sites will stuff pages with answer snippets, fake FAQ blocks, synthetic comparisons and repetitive entity language. That may produce short-term visibility but long-term distrust. Search engines and AI systems will likely discount low-value AI-written filler. The web already has too much content that says little.
The better SEO response to bot-majority traffic is strategic selectivity. Decide what should be crawlable by search. Decide what should be visible to AI answers. Decide what requires licensing. Decide what should be behind account walls. Build machine-readable content where it supports business goals. Protect content where extraction harms the business.
SEO is no longer only about earning clicks. It is about shaping machine-mediated decisions while preserving enough direct value to sustain the site. Being cited without being visited is not a win unless the business model captures that influence.
Brands will compete for agent recommendations
AI agents may become the new comparison layer for many purchases. A user will not always browse a category page or search results. They may ask an assistant to choose. That makes the agent’s recommendation logic commercially important.
For brands, this creates a new form of shelf space. The old shelf was a store aisle. The digital shelf was search results, marketplace rankings and social feeds. The agentic shelf is a shortlist generated by an AI system. A brand may win or lose before the user sees any website.
Agent recommendations will likely depend on several inputs: product data, reviews, price, availability, delivery speed, return policy, warranty, brand reputation, source credibility, past user preferences, third-party comparisons and the model’s learned associations. Some of those inputs are under a brand’s control. Many are not.
The temptation will be to manipulate agents. Brands may produce pages designed to be scraped favorably, flood the web with synthetic reviews, create comparison sites, sponsor AI-answer placements or seek private deals with assistant platforms. Some of this will resemble old SEO and affiliate marketing. Some will become a new gray market.
Trust will matter. If users believe agents recommend based on hidden commercial arrangements, adoption may suffer. If agents hide sources or fail to distinguish ads, regulators may intervene. If brands cannot verify why they were excluded, disputes will grow. The agentic shelf will need transparency norms, especially in finance, health, travel and expensive consumer goods.
Brands should start with fundamentals. Accurate, current, structured product information. Clear policies. Strong reputation across independent sources. Helpful documentation. Real reviews. Consistent entity data. Public comparisons that are fair and specific. Fast pages. Accessible content. Licensed feeds where appropriate. Agent-friendly APIs where commercially useful.
They should also monitor AI answer surfaces. Which products are recommended? Which sources are cited? Which facts are wrong? Which competitors appear? Which agents can access the site? Which bots are consuming content? Which pages are being hit by AI systems? This is not vanity monitoring. It is market intelligence.
The agent does not replace the brand relationship. It increasingly decides whether the brand gets a chance to form one.
APIs may become the cleaner alternative to scraping
Scraping human pages is often wasteful. Human pages include layout, scripts, ads, trackers, navigation, personalization, images and interaction elements that agents may not need. Agents scrape because the web’s useful data is exposed through pages, not always through clean machine interfaces.
APIs can reduce that waste. A retailer can expose product, price, stock and policy data through a controlled endpoint. A publisher can expose licensed article metadata, summaries or full text under terms. A travel provider can expose availability and fares with quotas. A documentation site can expose versioned content for AI retrieval. A local business platform can expose hours and services.
The benefit is control. APIs can authenticate clients, enforce rate limits, log usage, apply pricing, return structured data and separate public from restricted information. They can reduce origin load and parsing errors. They can support agents without inviting uncontrolled scraping.
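The controls listed above (authenticating clients, enforcing rate limits and logging usage) can be illustrated with a minimal sketch. It is not any vendor's implementation: the class names `TokenBucket` and `AgentApiGate`, the key `demo-key`, the path `/v1/products/42` and the limits are all hypothetical, and a token bucket is only one of several common rate-limiting techniques.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_rate` tokens per second."""
    capacity: float
    refill_rate: float
    last_seen: float = 0.0
    tokens: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start with a full bucket

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last_seen)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_seen = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class AgentApiGate:
    """Authenticates by API key, enforces a per-key rate limit, logs every call."""

    def __init__(self, buckets: dict[str, TokenBucket]) -> None:
        self.buckets = buckets
        self.usage_log: list[tuple[float, str, str, int]] = []

    def handle(self, api_key: str, path: str, now: float) -> int:
        bucket = self.buckets.get(api_key)
        if bucket is None:
            status = 401  # unknown client
        elif bucket.allow(now):
            status = 200  # within quota
        else:
            status = 429  # too many requests
        self.usage_log.append((now, api_key, path, status))
        return status


# Hypothetical client allowed a burst of 2 requests, refilled at 1 per second.
gate = AgentApiGate({"demo-key": TokenBucket(capacity=2, refill_rate=1.0)})
statuses = [gate.handle("demo-key", "/v1/products/42", now=t) for t in (0.0, 0.0, 0.0, 1.0)]
print(statuses)
```

The usage log is what later makes pricing, quota disputes and abuse investigations possible, so even this toy version records rejected calls as well as successful ones.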
The risk is enclosure. If valuable web information moves behind APIs and contracts, the open web becomes thinner. Small developers, researchers, search engines and public-interest tools may face barriers. Large AI companies may afford access while smaller players cannot. The web could become a set of negotiated data corridors.
A balanced approach is needed. Public pages should remain available for human access and basic discovery. Structured data should support search and accessibility. High-volume commercial machine use can move to APIs. Sensitive or costly functions can require authentication. Public-interest crawlers may receive special terms. Abuse should be blocked.
The API path also requires standards for agents. If every site creates a different agent API, integration becomes difficult. Schema.org helped create shared vocabulary for web entities. Something similar may emerge for agent-access policies, content licensing, product comparison, booking actions and delegated permissions.
Until then, agents will keep scraping because scraping is universal. It works on any page, badly or well. APIs are cleaner but require coordination. Scraping is the default because the web is readable. APIs become the answer when readability turns into unpriced machine load.
Legal pressure will grow around consent, copyright and competition
Bot-majority traffic will intensify legal disputes that were already underway. AI companies use web content for training, grounding, search and agentic tasks. Content owners argue that some of that use violates copyright, contract terms, database rights or unfair competition norms. AI companies often argue that training and indexing involve lawful use, public information or transformative processing, depending on jurisdiction and case.
The crawler layer adds evidence. Logs can show who accessed what, how often, under which user agent and after which permissions were declared. Robots.txt files can show stated preferences. Content signals may show licensing intent. Pay-per-crawl systems can show offered terms. Verified bot identity can reduce disputes about attribution. The more structured the machine-access market becomes, the easier it is to ask whether a crawler complied.
Consent will be harder with agents. If a user authorizes an agent to access a site, does that override a site’s anti-bot terms? Can a site block a user’s chosen assistant? Does the agent’s operator get to store retrieved content? Can the agent summarize paywalled material? Can it compare prices behind a login? Can it take actions that bind the user? These questions will not be settled by technical design alone.
Competition law may enter when dominant platforms combine discovery with AI use. Publishers may argue that they cannot refuse certain crawlers without losing essential visibility. Platforms may argue that controls are available and that AI features improve user experience. Regulators will look at market power, tying, transparency and harm.
Consumer law may enter when agents recommend products, display prices, book travel or handle financial tasks. If an agent omits fees, misreads terms or favors sponsored results without disclosure, users may be harmed. If a malicious page manipulates an agent into a purchase, liability becomes complex.
Privacy law may enter when agents browse with personal context. A shopping agent may know a user’s preferences, budget, location, family status and health needs. A travel agent may know passport details. A finance agent may know accounts. Sites and AI operators will need clear data minimization, consent and retention practices.
The machine web is becoming a legal interface, not just a technical phenomenon. The companies that build clean permission, logging and control systems now will be better prepared for regulatory scrutiny later.
Regulators will care because bots now affect markets
Bot traffic is not only an IT issue. It affects advertising markets, media sustainability, consumer choice, retail pricing, cybersecurity, privacy, competition and infrastructure resilience. Once bots become a majority traffic category, regulators have reason to pay attention.
Advertising regulators may examine whether impressions are human, viewable and fairly measured. Securities regulators may care when public-company metrics depend on traffic that includes growing machine shares. Consumer-protection agencies may care when bots create fake scarcity, fake reviews or unfair pricing. Competition authorities may care when dominant AI or search platforms use crawling power to extract content. Data-protection authorities may care when agents process personal data across sites.
Cybersecurity agencies already care about automated traffic because bots are tied to credential attacks, DDoS, scanning and fraud. OWASP’s agentic security work reflects the technical risk of systems that can plan and act across tools. Critical infrastructure operators face an additional concern: AI-assisted scanning and reconnaissance may change background traffic assumptions. A 2026 paper on AI-assisted bot traffic in darknet data found changes in automated reconnaissance patterns relevant to industrial systems, including micro-pacing behaviors that can bypass simple volumetric thresholds.
Policy responses could take several forms. Governments may require clearer bot identification. They may strengthen rules around scraping and terms-of-service circumvention. They may require disclosure when AI agents act commercially. They may regulate synthetic reviews and AI-generated spam. They may support standard-setting for agent identity and delegation. They may apply existing privacy and consumer rules to agent workflows.
The challenge is avoiding overreach. Crawling is essential for search, research, archiving, security and accessibility. Not all automated access is harmful. Strict anti-bot laws can protect incumbents and harm public-interest work. A good policy framework should distinguish malicious automation, commercial extraction, public-interest crawling, user-delegated agents and ordinary search indexing.
The best regulatory contribution may be standardization pressure. If industry cannot agree on agent identity, purpose declarations and permission semantics, governments may push harder. Standards reduce compliance cost and make enforcement more predictable. They also help smaller sites avoid negotiating separately with every AI operator.
The bot-majority moment gives regulators a simple reason to act: machine traffic now shapes human markets. That makes transparency and accountability public issues, not only private engineering choices.
Cyberfraud will exploit the same tools as legitimate agents
Every useful automation pattern can be abused. Agents that compare products can scrape competitors. Agents that fill checkout forms can test stolen cards. Agents that manage accounts can support takeover. Agents that read pages can follow malicious instructions. Agents that book travel can hoard inventory. Agents that generate messages can spam support channels.
HUMAN Security’s report connects AI-driven traffic growth with cyberthreat benchmarks, including scraping attacks, post-login account compromise attempts and carding growth. Its blog says the median percentage of traffic attempting scraping attacks is approaching 20% globally and that carding volume has surged 250% since 2022.
Attackers like agents because agents reduce custom engineering. Instead of building a brittle script for each target, an attacker may prompt or configure an AI browser to adapt. Instead of manually inspecting forms, the agent can parse labels. Instead of hardcoding flows, it can recover from minor layout changes. That flexibility lowers the barrier to abuse.
Defenders should expect hybrid attacks. A human attacker sets goals. AI agents execute reconnaissance, scrape data, test credentials, generate lures, interact with forms and adapt to defenses. Traditional botnets provide scale. Residential proxies provide distribution. Stolen accounts provide legitimacy. AI provides flexibility.
The defensive focus should shift from only “is this automated?” to “is this action authorized and expected?” A fast checkout is not always bad. A slow checkout is not always good. A real browser is not always human. A human account is not always safe. A trusted IP is not always trustworthy.
Signals need to be combined: identity, device, session history, behavior, account reputation, payment risk, content of actions, velocity, tool fingerprints, known agent signatures, user consent, and business context. High-risk actions should require stronger checks. Low-risk reads can be more permissive. Machine traffic should be logged in ways that support investigation.
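As a rough illustration of combining signals, the sketch below scores a request from a handful of the inputs named above and maps the score to a decision. The field names, weights and thresholds are invented for the example and would need tuning against real fraud data; production systems typically use far richer models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    action: str                # e.g. "read", "search", "login", "checkout"
    verified_agent: bool       # agent identity was verified by some mechanism
    user_delegation: bool      # the user explicitly authorized this agent
    account_age_days: int
    requests_last_minute: int  # velocity signal
    payment_risk: float        # 0.0 (low) to 1.0 (high), from a payment-risk system


# Hypothetical base risk per action: reads are cheap, checkout is sensitive.
ACTION_BASE_RISK = {"read": 0, "search": 1, "login": 3, "checkout": 4}


def risk_score(ctx: RequestContext) -> int:
    score = ACTION_BASE_RISK.get(ctx.action, 2)  # unknown actions get a middle value
    if not ctx.verified_agent:
        score += 2
    if not ctx.user_delegation:
        score += 1
    if ctx.account_age_days < 7:
        score += 1
    if ctx.requests_last_minute > 60:
        score += 2
    if ctx.payment_risk >= 0.7:
        score += 3
    return score


def decide(ctx: RequestContext) -> str:
    """Map the combined score to allow, step-up verification, or block."""
    score = risk_score(ctx)
    if score <= 3:
        return "allow"
    if score <= 6:
        return "step_up"
    return "block"


anonymous_read = RequestContext("read", False, False, 100, 10, 0.0)
delegated_checkout = RequestContext("checkout", True, True, 400, 3, 0.1)
suspicious_checkout = RequestContext("checkout", False, False, 1, 120, 0.9)
for ctx in (anonymous_read, delegated_checkout, suspicious_checkout):
    print(ctx.action, risk_score(ctx), decide(ctx))
```

Note that the verified, delegated checkout still triggers a step-up check: the decision depends on the sensitivity of the action, not only on whether the visitor is automated.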
Security teams also need to defend their own AI agents. An internal support agent browsing customer tickets can be manipulated. A procurement agent can be tricked by malicious supplier pages. A coding agent can ingest poisoned documentation. OWASP’s LLM and agentic risk frameworks are relevant because the agent becomes both a browser and an attack surface.
The web is moving from bot detection to agent risk management. The organizations that treat agents as identities with privileges, not just traffic patterns, will be safer.
Content quality may suffer if high-trust sources close off
The AI crawler fight creates a quality paradox. AI systems need reliable web content. Reliable content is expensive to produce. Expensive content is more likely to demand payment or block crawlers. Low-quality content is often open, cheap and abundant. If high-trust sources close while low-trust sources remain accessible, models and answer engines may draw from worse material.
Academic work has begun to examine this risk. Research on AI training datasets and crawler restrictions found that content creators increasingly block AI crawlers and that restrictions vary by popularity and content type. Research on robots.txt gatekeeping found reputable news websites were far more likely than misinformation sites to disallow AI crawlers, with the gap widening over time.
For AI companies, this is not a minor sourcing issue. If models lose access to high-quality sources, answers can become stale, shallow or biased toward material that remains open. If AI answers degrade, users suffer. If users suffer, AI products lose trust. If AI products lose trust, the economic value of aggressive crawling falls.
For publishers, blocking everything has a cost too. It may reduce AI visibility and weaken influence in answer engines. A medical publisher, technical documentation site or news outlet may want its facts represented accurately in AI systems. Full exclusion can protect rights while also reducing public presence. The strategic question is not access or no access. It is access under which terms.
This is where licensing and content signals could improve the market. High-quality sources could allow retrieval for citation and freshness while charging for training or bulk use. AI systems could show citations more prominently and share value. Publishers could provide authoritative feeds. Standards could distinguish fact extraction, quotation, summarization, training and transactional use.
The quality problem also affects SEO spam. If AI-generated low-cost pages flood the open web and are easy for crawlers to access, models may ingest synthetic material that repeats errors or generic phrasing. Model-generated content can then train future models, weakening information diversity. This “model collapse” concern is broader than web crawling, but bot-majority traffic increases exposure.
The web needs incentives for original work to remain visible. If the agentic web rewards only what is free to scrape, it will gradually scrape a cheaper and less trustworthy web.
Small websites face the hardest trade-offs
Large publishers and platforms can negotiate. They have traffic, brand power, legal teams and technical staff. Small websites often have none of that. A niche blog, local news outlet, independent research site, small e-commerce shop or community forum may be valuable to users and AI systems but poorly equipped to manage bot policy.
The small-site problem has three parts.
First, visibility. Many small sites depend on search and referrals. Blocking major crawlers can erase discovery. Allowing every crawler can expose content to extraction. The site owner may not know which user agent does what or what trade-off each block creates.
Second, cost. A small site can be hurt by bot spikes that a large platform would barely notice. AI crawlers can hit expensive pages, overload cheap hosting or inflate bandwidth bills. Security tools may cost money. Bot-management services may be too complex or expensive.
Third, bargaining power. Large publishers may license content. Small creators usually face take-it-or-leave-it terms. Their content may be scraped without meaningful recourse. Even if a pay-per-crawl tool exists, AI companies may prioritize deals with larger sources.
Research on content creators found that artists and creators want protective tools but face technical awareness and deployment barriers. That applies across the small web. The people producing valuable content are not always server administrators. A writer should not need to become a bot-policy engineer to protect their work.
Infrastructure providers can help by making controls simpler. One-click AI crawler policies, managed robots.txt, verified bot dashboards, default rate limits, traffic explanations and plain-language trade-off warnings can reduce the burden. But defaults matter. If the default is open, small sites may be extracted. If the default is closed, they may disappear from useful AI discovery.
A reasonable default might allow search indexing, limit known AI training crawlers unless opted in, permit verified user-triggered retrieval at safe rates, block obvious abuse and provide clear logs. Different sites will want different settings, but the baseline should not assume that every machine visitor deserves full access.
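That baseline can be written down as a small policy table. The sketch below is one possible reading of it, not a product default: the visitor categories, the `default_policy` function and every rate number are assumptions made for illustration, and "limit" is interpreted here as a very low request rate rather than a hard block.

```python
from enum import Enum


class Visitor(Enum):
    SEARCH_INDEXER = "search_indexer"
    AI_TRAINING = "ai_training"
    USER_TRIGGERED = "user_triggered"  # retrieval on behalf of a live user request
    ABUSE = "abuse"
    UNKNOWN = "unknown"


def default_policy(visitor: Visitor, verified: bool, training_opt_in: bool = False) -> dict:
    """Return an action and a requests-per-minute cap; every decision is logged."""
    if visitor is Visitor.ABUSE:
        decision = {"action": "block", "max_rpm": 0}
    elif visitor is Visitor.SEARCH_INDEXER and verified:
        decision = {"action": "allow", "max_rpm": 120}
    elif visitor is Visitor.AI_TRAINING and verified and training_opt_in:
        decision = {"action": "allow", "max_rpm": 60}
    elif visitor is Visitor.AI_TRAINING:
        # Not opted in: keep access minimal. A stricter site might block outright.
        decision = {"action": "limit", "max_rpm": 2}
    elif visitor is Visitor.USER_TRIGGERED and verified:
        decision = {"action": "allow", "max_rpm": 30}
    else:
        # Unverified or unknown machine visitors do not get full access by default.
        decision = {"action": "limit", "max_rpm": 10}
    decision["log"] = True
    return decision


print(default_policy(Visitor.SEARCH_INDEXER, verified=True))
print(default_policy(Visitor.AI_TRAINING, verified=True))
print(default_policy(Visitor.USER_TRIGGERED, verified=False))
```

A site owner would mostly change the opt-in flag and the rate caps; the shape of the table stays the same.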
Small sites also need collective options. Cooperatives, publisher alliances, rights organizations, marketplaces and standard licenses could help aggregate bargaining power. Without aggregation, the agentic web may concentrate value among platforms that can negotiate and AI companies that can crawl.
The open web’s diversity depends on making bot governance usable for people who do not have security teams.
AI answer engines will change referral economics
Search referrals were already weakening for many publishers before the bot-majority headline. AI answer engines accelerate the change by satisfying user intent directly. The user asks; the answer appears; the source may be cited but not visited.
Cloudflare’s crawl-to-click analysis said Google referrals to news sites in its dataset declined in early 2025, with March down around 9% compared with January and April down 15%. It connected the shift with broader search changes including AI Overviews and AI Mode, while noting that the timing does not prove a single cause.
This shift changes the value of ranking. In classic search, a high ranking produced traffic. In AI search, a source may inform the answer without receiving a click. The value may come through brand mention, citation, authority, licensing, conversion later in the journey or no value at all. Publishers need to measure all of that.
Some content types are more vulnerable. Direct answers, definitions, simple how-to queries, commodity news summaries and basic comparisons are easy for AI systems to satisfy without a click. Original reporting, interactive tools, proprietary datasets, deep expert analysis, community discussion, multimedia experiences and high-stakes decision support may retain more direct value.
E-commerce referral economics will also shift. A user may ask an AI assistant which product to buy. The agent may send the user directly to checkout or to a marketplace, bypassing many comparison pages. Affiliate sites may lose traffic if agents summarize their work. Review sites may need licensing or direct trust relationships with AI platforms.
The new referral economy will include citations, agent actions, answer inclusion, direct transactions and data licensing. A publisher may receive fewer visits but more paid retrieval. A retailer may receive fewer browsing sessions but higher-intent agent carts. A SaaS company may receive fewer documentation visits but more developer adoption through AI coding assistants. Each business needs its own measurement.
The danger is that platforms keep the value while sources carry the cost. If AI answer engines reduce clicks but do not compensate sources, the source base weakens. If source quality weakens, answer engines degrade. The market may eventually correct through licensing, blocking or user distrust, but damage could occur first.
Referral traffic can no longer be counted on as repayment for being crawled. Unless another form of repayment appears, the old web economy will keep shrinking.
Machine-readable content becomes a business asset
As agents become common, machine-readable content moves from technical hygiene to business strategy. Structured data, clean HTML, accessible markup, canonical URLs, clear timestamps, author information, product schema, FAQ schema, review data, API documentation and feed quality all influence how machines understand a site.
For years, structured data helped search engines display rich results. Now it helps AI systems extract facts and agents make decisions. A product page with clear price, availability, model number, warranty, shipping and return policy is easier for agents to compare. A news article with clear date, author, update history and source links is easier to trust. A documentation page with version labels and examples is easier for coding assistants to use.
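For the product-page case, a minimal sketch of what "clear price, availability, model number and return policy" can look like in machine-readable form is a schema.org `Product` description serialized as JSON-LD. The helper `product_jsonld` and all product values are hypothetical; only a small subset of schema.org properties is shown, and which properties a given search engine or agent actually reads is not something this example can promise.

```python
import json


def product_jsonld(name: str, sku: str, mpn: str, price: str, currency: str,
                   in_stock: bool, return_days: int) -> str:
    """Build a schema.org Product description as a JSON-LD string."""
    availability = "https://schema.org/InStock" if in_stock else "https://schema.org/OutOfStock"
    doc = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "sku": sku,
        "mpn": mpn,  # manufacturer part number, useful for cross-site comparison
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
            "availability": availability,
            "hasMerchantReturnPolicy": {
                "@type": "MerchantReturnPolicy",
                "merchantReturnDays": return_days,
            },
        },
    }
    return json.dumps(doc, indent=2)


# Hypothetical product; the output would typically be embedded in the page
# inside a <script type="application/ld+json"> element.
markup = product_jsonld("Example Kettle 1.7L", "KET-17", "EK-1700-B",
                        "129.00", "EUR", in_stock=True, return_days=30)
print(markup)
```

The point of the structure is that an agent comparing several retailers does not have to infer the price or stock status from layout; it reads the same fields on every page.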
This does not mean every business should expose everything freely. Machine readability and access control are separate. A site can make permitted content easy to parse while blocking or charging for restricted uses. In fact, controlled machine-readable feeds may reduce unauthorized scraping by offering a cleaner licensed path.
The risk is that machine-readable content can be copied. A clear product database is useful to customers and competitors. A well-structured article is useful to readers and models. A public API is useful to partners and attackers. That is why machine readability must be paired with policy, rate limits and licensing.
Content teams should work with technical teams. Editorial structure is no longer only about reader experience. Product data quality is no longer only about site search. Documentation clarity is no longer only about developers. Every public fact is now a possible input into machine-mediated decisions.
This creates a new kind of content audit. Which facts should be public? Which should be structured? Which should be licensed? Which should be hidden behind accounts? Which are outdated and likely to mislead AI systems? Which pages generate machine traffic without business value? Which pages are cited by AI answers? Which pages are scraped heavily? Which pages should have explicit usage terms?
Businesses that treat content as structured infrastructure will be easier for trustworthy agents to use. Businesses that leave content messy may still be scraped, but less accurately and with more server load. In the agentic web, clarity is a competitive advantage only when paired with control.
The bot-majority web will change web design
Human-centered design is not going away. People still need usable pages. But web design now has to account for machine visitors. That affects navigation, rendering, performance, structured data, forms, rate limiting and interaction patterns.
Many modern sites are difficult for agents and crawlers because content depends heavily on client-side rendering, scripts, infinite scroll, modal dialogs, cookie banners, personalization and anti-bot challenges. Humans may tolerate some of this. Machines may misread it, retry it or generate excessive requests. The result is worse accessibility, worse SEO, worse agent performance and more load.
Designing for both humans and machines means separating content from presentation more cleanly. Server-rendered or easily discoverable core content. Structured metadata. Clear canonical pages. Accessible forms. Stable URLs. Proper status codes. Logical pagination. Clear policy pages. Machine-readable terms where appropriate.
Forms need special attention. Agents may fill forms for users. Sites need to know when that is allowed. Sensitive forms should require explicit user confirmation or authenticated delegation. Low-risk forms may allow verified agents. Hidden traps and CAPTCHAs may block both abuse and useful accessibility tools. Better identity may reduce reliance on hostile design patterns.
Performance also changes. Bots can stress pages differently than humans. They may hit many URLs quickly, ignore assets, request only HTML, or trigger dynamic search. Sites should cache intelligently, protect expensive endpoints and provide cheaper machine paths. A page that is fast for a human may still be expensive under agent fan-out.
Content licensing may become visible in design. Pages could expose usage signals in headers or metadata. AI systems could read permissions without scraping legal pages. Sites could show different content tiers for humans, search, AI retrieval and licensed training. This will be controversial, but some differentiation is likely.
The design principle is simple: make legitimate machine use clear and cheap, and make unauthorized machine use identifiable and costly. That is different from trying to make every bot look human or every human prove they are not a bot. The web needs less deception, not more.
Server logs become strategic evidence
In the bot-majority web, logs are no longer just debugging records. They are evidence of content use, security events, commercial demand, crawler compliance and infrastructure cost. A company that cannot analyze logs cannot govern machine traffic.
Useful log analysis should answer concrete questions. Which bots visit the site? Which claim known user agents? Which verify through reverse DNS or signatures? Which pages do they hit? How often? At what times? From which networks? Do they respect robots.txt? Do they hit disallowed paths? Do they trigger dynamic endpoints? Do they reach checkout, login or account pages? Do they send referrals? Do they convert? Do they correlate with fraud?
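A few of these questions can be answered with very little code. The sketch below parses access-log lines in the common "combined" format, groups hits by claimed bot and uses Python's standard `urllib.robotparser` to flag requests to disallowed paths. The log lines, the robots.txt rules and the short `KNOWN_BOT_TOKENS` list are made up for the example. User-agent strings are only claims: this sketch does not perform the reverse-DNS or signature verification mentioned above.

```python
import re
from collections import defaultdict
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the site being analyzed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /premium/

User-agent: *
Disallow: /checkout/
"""

# Hypothetical access-log lines in combined log format.
LOG_LINES = [
    '203.0.113.5 - - [03/Jun/2026:10:00:00 +0000] "GET /premium/report HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '203.0.113.5 - - [03/Jun/2026:10:00:01 +0000] "GET /blog/post HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '198.51.100.7 - - [03/Jun/2026:10:00:02 +0000] "GET /checkout/cart HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.9 - - [03/Jun/2026:10:00:03 +0000] "GET /blog/post HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"',
]

LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"$'
)

KNOWN_BOT_TOKENS = ("GPTBot", "Googlebot")  # extend with the bots you care about


def bot_token(user_agent: str):
    """Return the claimed bot token found in the user-agent string, if any."""
    lowered = user_agent.lower()
    for token in KNOWN_BOT_TOKENS:
        if token.lower() in lowered:
            return token
    return None


def summarize(lines, robots_txt):
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    report = defaultdict(lambda: {"hits": 0, "disallowed": []})
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        token = bot_token(match["ua"])
        entry = report[token or "other"]
        entry["hits"] += 1
        # Only claimed bots are checked against robots.txt rules.
        if token and not robots.can_fetch(token, match["path"]):
            entry["disallowed"].append(match["path"])
    return dict(report)


report = summarize(LOG_LINES, ROBOTS_TXT)
for agent, stats in report.items():
    print(agent, stats)
```

Joining a report like this with conversion, referral and fraud data is what turns it from a crawl count into the kind of evidence the questions above call for.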
Logs also support negotiations. A publisher asking for compensation can show crawl volume. A retailer can show agent-driven product demand. A site can prove a crawler ignored disallow rules. A security team can demonstrate credential attack patterns. A legal team can connect access to terms.
The problem is that many organizations do not keep logs in a usable form. Privacy rules limit retention. Cloud costs discourage storage. Analytics tools sample or aggregate data. CDNs may hold edge logs separately from application logs. Bot classifications may not be joined with business outcomes. Marketing teams may never see raw traffic composition.
A better stack links edge logs, bot classification, application events, conversion data, security events and content metadata. It should allow teams to see machine traffic by purpose and business impact. A content owner should know not only that GPTBot visited, but which topics it consumed and whether the site permits that use. A retailer should know whether agent traffic to product pages later produces purchases. A security team should know whether AI browser patterns correlate with card testing.
Privacy must be handled carefully. Logs can contain personal data. Agent delegation can expose sensitive flows. Retention should be proportionate. Access should be controlled. But avoiding log analysis entirely leaves companies blind.
The next licensing deal, bot dispute or fraud investigation may be won by the organization with the cleanest traffic evidence.
AI traffic will force a new pricing model for access
The web historically priced access indirectly. Users paid with attention, data, subscriptions, purchases or exposure to ads. Search engines paid with referrals. Crawlers were tolerated because they supported discovery. AI traffic strains that model because it can consume content without delivering attention or immediate revenue.
Pay-per-crawl is one possible pricing model. Cloudflare’s version lets publishers set a per-request price and choose allow, charge or block. This is simple enough to understand, but the right price is hard to set. Requests to a homepage, a deeply reported article, a product page and a live pricing endpoint do not have equal value. A training request and a user-triggered answer request may not deserve the same fee.
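To make the allow, charge or block choice concrete, here is a minimal sketch of a pricing rule that varies by content class and declared purpose. It is not Cloudflare's implementation; the content classes, purposes, prices and multipliers are all hypothetical values chosen for illustration.

```python
from decimal import Decimal
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    CHARGE = "charge"
    BLOCK = "block"


# Hypothetical per-request base prices in US dollars, by content class.
BASE_PRICE = {
    "homepage": Decimal("0"),
    "article": Decimal("0.01"),
    "product": Decimal("0.002"),
    "live_pricing": Decimal("0.05"),
}

# Hypothetical multipliers by declared purpose; None means "do not serve".
PURPOSE_MULTIPLIER = {
    "search_index": Decimal("0"),
    "answer_retrieval": Decimal("1"),
    "training": Decimal("2"),
    "unknown": None,
}


def decide(content_class, purpose):
    """Return (Action, price) for one request; price is None when blocked."""
    multiplier = PURPOSE_MULTIPLIER.get(purpose)
    if multiplier is None or content_class not in BASE_PRICE:
        # Undeclared purposes and unclassified content are blocked by default.
        return Action.BLOCK, None
    price = BASE_PRICE[content_class] * multiplier
    if price == 0:
        return Action.ALLOW, price
    return Action.CHARGE, price


print(decide("article", "training"))
print(decide("article", "search_index"))
print(decide("live_pricing", "unknown"))
```

Even this toy version shows why one flat price is a poor fit: the same article can be free for search indexing, priced for answer retrieval and priced higher for training.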
Licensing is another model. AI companies pay for access to a corpus or feed. This can support high-quality sources and reduce scraping disputes. It may favor large publishers and leave smaller sites out unless aggregated.
Revenue sharing is a third model. If an AI answer uses a source and generates subscription revenue, ad revenue or transaction revenue, the source receives a share. This is hard to track and negotiate but aligns incentives better than raw crawl pricing.
Referral guarantees are a fourth model. AI systems could send meaningful traffic back through citations, source links, recommended reading or “continue on source” flows. This preserves some of the old bargain but may conflict with user preference for direct answers.
Transaction fees are a fifth model. If an agent buys a product, books a hotel or subscribes to a service, the destination business may accept the agent traffic as customer acquisition and pay a commission only on conversion.
Different content and commerce types will use different models. News may need licensing. Retail may prefer conversion-based fees. Documentation may allow retrieval because it supports product adoption. Government information should remain open. Academic content may require public-interest access. High-cost dynamic data may need paid APIs.
The central change is that machine access becomes a priced resource, not an assumed side effect of publishing. The market will be messy because value is uneven and measurement is immature. But the direction is clear: when bots become the majority, free unlimited machine access becomes harder to justify.
Website owners need a traffic-rights strategy
Every serious website now needs a traffic-rights strategy. This is not only a security policy or an SEO setting. It is a business decision about who may access which content, for what purpose, under what limits and with what payment or attribution.
The strategy should begin with classification. Which parts of the site are public marketing material? Which are original content? Which are product data? Which are user data? Which are expensive dynamic endpoints? Which are contractual assets? Which are sensitive? Which are meant for search discovery? Which are meant for partners?
Next comes bot mapping. Identify major crawlers, AI bots, agents, scrapers, monitors and unknown automation. Use official documentation for known operators. Verify identity where possible. Compare declared purpose with observed behavior. Watch for spoofing.
Then define policy by purpose. Allow search crawlers needed for discovery. Decide whether to allow AI training. Decide whether to allow AI answer retrieval. Decide whether user-triggered agents can access content. Decide whether commercial scrapers require payment. Decide rate limits. Decide what happens to unknown bots.
Then build enforcement. Robots.txt is the public signal. Edge rules are enforcement. Bot management handles detection. APIs provide clean access. Authentication protects sensitive flows. Logs support audit. Contracts define paid use. Legal terms clarify rights.
Then measure outcomes. Did human traffic change? Did bot traffic fall or shift? Did server cost improve? Did AI answer visibility decline? Did licensing revenue appear? Did abuse move to other paths? Did legitimate customers complain? Policy should be adjusted based on evidence.
This work cannot sit only with IT. SEO teams understand discovery trade-offs. Legal teams understand rights. Security teams understand abuse. Product teams understand user-delegated agents and customer value. Finance teams understand cost. Editorial or merchandising teams understand content value. Leadership must decide risk appetite.
A blanket block may feel strong but reduce visibility. Blanket openness may feel growth-friendly but subsidize extraction. The right answer depends on the business model. A website without a traffic-rights strategy is letting machine visitors define the economics by default.
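The purpose-based policy described above can be expressed as a robots.txt file and checked before it is published. The sketch below uses Python's standard `urllib.robotparser`. The policy itself is only an example of one possible stance (block training, allow search retrieval, keep account paths off limits), and the paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Example policy only. GPTBot, OAI-SearchBot and Google-Extended are the
# tokens named in this article; the paths are hypothetical.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow: /account/
Disallow: /checkout/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Report whether the parsed policy lets this agent token fetch the path."""
    return parser.can_fetch(agent, path)


for agent in ["GPTBot", "OAI-SearchBot", "Google-Extended", "SomeOtherBot"]:
    print(agent, allowed(agent, "/articles/x"), allowed(agent, "/account/settings"))
```

One detail this check makes visible: a crawler that matches a named group uses that group instead of the wildcard group, so in this example OAI-SearchBot is not bound by the `/account/` and `/checkout/` rules unless they are repeated in its own group. As the article notes, the file is still only a signal; compliant crawlers follow it and others need edge enforcement.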
AI platforms need to earn permission, not assume it
AI companies often speak about the web as public information. That is partly true. Much of the web is publicly accessible. Public accessibility is not the same as unlimited commercial permission. A shop window is visible from the street; that does not grant every use of what is displayed inside.
AI platforms will need to earn durable access by offering value to content owners and site operators. That value can be payment, referral traffic, citations, reduced crawling, useful analytics, user conversion, technical compliance, or strong respect for permissions. The exact mix will vary.
Earning permission also means reducing duplication. If multiple bots from the same company crawl the same content for training, search, grounding and agents, site owners pay the cost repeatedly. OpenAI’s documentation notes that if a site allows both OAI-SearchBot and GPTBot, results from one crawl may be used for both use cases to avoid duplicative crawling. That kind of efficiency matters, but it must be balanced with clear purpose control.
AI companies should also provide webmaster tools comparable to search consoles. Site owners need to see how often AI systems access content, which pages are used, what purpose applies, whether the content appears in answers, how to fix errors, how to license access and how to appeal misuse. Without that transparency, trust will erode.
Citations should become more useful. A tiny footnote that users rarely click may not sustain publishing. AI answers can surface source brands, link to original context, show dates and distinguish direct quotes from synthesis. They can avoid summarizing paywalled work beyond fair limits. They can direct users to source pages when the source experience matters.
Agents should respect user and site boundaries. A user’s agent should not bypass terms, overload services, ignore robots preferences or scrape restricted material under the excuse of user intent. Site owners should not use anti-bot rules to unfairly block user choice. Both sides need norms.
Permission is becoming a competitive advantage for AI platforms. The companies with trusted access to high-quality sources will build better products than companies forced to scrape the leftovers.
The open web could become more closed if the economics fail
A possible outcome of bot-majority traffic is web closure. Publishers put more content behind paywalls. Forums block crawlers. Retailers hide prices behind apps. Documentation requires login. APIs require contracts. Small sites install aggressive bot blockers. Search and AI systems lose access. Users encounter more walls.
Some closure is rational. If open access means unpriced extraction, sites will protect themselves. If bots impose real cost, blocks will rise. If AI systems reduce traffic, publishers will seek direct revenue. If scraping fuels competitors, businesses will hide data.
But too much closure harms the web. Search quality declines. New entrants struggle. Researchers lose visibility. Public knowledge fragments. Small sites become harder to find. AI systems come to depend on large licensed sources plus whatever low-quality material stays open. Users face more login friction. The web becomes less linkable and less public.
The better path is controlled openness. Public content remains available for humans and basic discovery. Machine access is governed by purpose, identity, rate and terms. High-volume commercial use pays. Public-interest access is protected. User-delegated agents have secure pathways. Abuse is blocked. Data rights are clearer.
This is hard because the web’s openness was originally simple: publish a URL and anyone can fetch it. Controlled openness adds complexity. But the alternative may be worse. The bot-majority web makes naive openness too expensive for many sites.
Infrastructure providers will shape the outcome. If they make controls too blunt, closure will rise. If they make controls nuanced and affordable, sites may stay open under conditions. AI companies will shape it too. Respectful access reduces defensive closure. Aggressive scraping accelerates it.
Governments and standards bodies may need to protect the public-interest layer. Libraries, archives, researchers, accessibility tools, emergency information and government data should not be trapped by commercial bot wars. A more governed machine web should not destroy legitimate non-commercial crawling.
The open web’s next form will not be unrestricted openness. It will be negotiated openness.
Two timelines are colliding
Two technology timelines are colliding. The first is the web’s slow adaptation to bots, which has been underway for decades. The second is AI’s rapid move from chatbots to agents, which is compressing change into months.
Search crawlers trained website owners to tolerate machine readers. SEO trained them to structure pages for machines. Fraud bots trained them to detect automation. APIs trained them to expose data to software. Cloud platforms trained them to scale traffic. The web already had many pieces needed for machine access.
AI agents combine those pieces and accelerate demand. A chatbot with browsing is not just a search user. A coding assistant with retrieval is not just a developer. A shopping agent is not just a comparison site. A browser agent is not just a bot. It can interpret, decide and act. That creates new traffic faster than institutions can adjust.
Prince’s shifted expectation captures the acceleration. In March 2026, reporting focused on his prediction that bot traffic could surpass human traffic by 2027. By early June 2026, he said the crossover had already happened. The timeline moved from future forecast to present condition in about a quarter.
Businesses usually adapt to platform shifts through planning cycles: annual budgets, roadmap quarters, vendor evaluations, legal review, procurement. Agentic traffic does not wait for that cadence. Bot policy, analytics, security and content licensing have to move faster.
The first phase is awareness. Many executives still see bots as a cybersecurity issue, not a business-model issue. The second phase is measurement. Companies need to know their own traffic mix. The third phase is policy. They decide what to allow, block, charge or route. The fourth phase is product adaptation. They build agent-friendly access where it makes sense. The fifth phase is market redesign. New licensing and referral models settle.
Most organizations are between phases one and two. AI traffic already shows phase-four behavior. The machines are acting before the institutions have rules.
Enterprise websites need agent access controls before customers demand them
Enterprise websites will soon face a customer-experience question: why can’t my AI assistant use your site? Consumers and business users will expect agents to compare plans, manage subscriptions, fill forms, retrieve invoices, schedule appointments, check delivery, update records and troubleshoot products. Blocking all automation may protect systems but frustrate users.
Banks, insurers, healthcare providers, telecoms, utilities, SaaS platforms and government services will face this pressure. Some tasks are low risk. Others are sensitive. The answer cannot be to let any agent log in with the user’s password and click around. That is insecure and hard to audit.
Enterprises need controlled delegation. A user should be able to grant an agent access to specific tasks. The organization should see that grant. The agent should not receive more permission than needed. High-risk actions should require confirmation. Logs should show that an agent acted. Revocation should be simple.
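Those requirements map onto a small data model. The sketch below is one hypothetical shape for a delegation grant, covering scoped actions, confirmation for high-risk steps, an audit trail and revocation. The class name, action names and decision strings are invented for illustration, and a real system would sit on top of an actual authorization protocol.

```python
from dataclasses import dataclass, field

# Assumption: actions this hypothetical service treats as high risk.
HIGH_RISK_ACTIONS = {"change_plan", "make_payment"}


@dataclass
class DelegationGrant:
    """A user's grant allowing one agent to perform specific actions."""

    user_id: str
    agent_id: str
    allowed_actions: frozenset
    revoked: bool = False
    audit_log: list = field(default_factory=list)

    def revoke(self):
        """Withdraw the grant; later requests are denied."""
        self.revoked = True

    def authorize(self, action, user_confirmed=False):
        """Decide on one requested action and record the decision."""
        if self.revoked:
            decision = "denied: grant revoked"
        elif action not in self.allowed_actions:
            decision = "denied: outside granted scope"
        elif action in HIGH_RISK_ACTIONS and not user_confirmed:
            decision = "pending: user confirmation required"
        else:
            decision = "allowed"
        # Every decision is logged so the organization can see agent activity.
        self.audit_log.append((self.agent_id, action, decision))
        return decision


grant = DelegationGrant(
    user_id="user-1",
    agent_id="agent-7",
    allowed_actions=frozenset({"retrieve_invoice", "change_plan"}),
)
print(grant.authorize("retrieve_invoice"))
print(grant.authorize("change_plan"))
print(grant.authorize("change_plan", user_confirmed=True))
print(grant.authorize("make_payment", user_confirmed=True))
grant.revoke()
print(grant.authorize("retrieve_invoice"))
```

The point of the sketch is the ordering of checks: revocation first, scope second, confirmation third. An agent never reaches the confirmation step for an action it was not granted.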
This resembles API authorization but must work across web experiences. Many consumer services have APIs for partners, not for personal agents. Many legacy systems cannot support fine-grained delegated access. Many compliance programs do not yet classify AI agents as a user type.
Security teams may resist agent access because it increases risk. Product teams may want it because users will demand it. Legal teams may worry about liability. Customer support teams may benefit if agents reduce calls. The tension will be real.
The companies that solve it well can create better customer experiences. Imagine a verified agent that can gather insurance quotes without exposing unnecessary personal data, or a telecom agent that can compare plans and request a change with user approval, or a SaaS agent that can retrieve invoices for accounting. These are useful workflows if governed properly.
The companies that solve it poorly will see shadow automation. Users will paste credentials into agents, use browser extensions, run scripts or authorize third-party tools with unclear security. Attackers will exploit the confusion. If enterprises do not provide safe agent paths, unsafe agent paths will appear.
The web’s future will be human intent and machine execution
The deepest change is not that bots outnumber humans in a traffic category. It is that the web is moving toward a split between intent and execution. Humans express goals. Machines execute web actions. That pattern will expand because it saves time.
People do not want to compare every insurance policy manually. They do not want to read every product review. They do not want to search ten help pages. They do not want to fill repetitive forms. They do not want to monitor prices. They do not want to reconcile invoices. They do not want to check every flight combination. Agents promise to absorb that labor.
This does not eliminate human judgment. It changes where judgment happens. The user chooses the agent, sets preferences, reviews options and approves actions. The agent handles retrieval and comparison. The site provides data and transaction paths. The model mediates. Trust moves from page design to agent behavior.
That future can be useful. It can reduce friction, improve accessibility and help people make better decisions. It can also concentrate power in AI intermediaries, weaken direct relationships, hide trade-offs, increase surveillance and make the open web harder to fund.
The outcome depends on choices made now. Will agents identify themselves? Will they respect permissions? Will users control delegation? Will sources be paid? Will citations matter? Will small sites have tools? Will analytics distinguish humans and machines? Will security teams govern agent actions? Will regulators protect competition and consumers without killing useful automation?
The bot-majority milestone is a measurement event, but it points to a design question. Should the web remain a public space for people, or become an unpriced backend for AI systems? The answer should be neither. The web should remain human in purpose while becoming more explicit about machine access.
The internet has survived platform shifts before: portals, search, social, mobile, apps, streaming, cloud. AI agents are another shift, but they strike closer to the web’s economic foundation because they consume pages without necessarily delivering visits. That is why this moment feels different.
Bots passing humans in HTML traffic is not the end of the web. It is the end of a comforting assumption. The visitor is no longer probably a person. The pageview is no longer probably attention. The crawl is no longer probably repaid by a click. Every serious web business now has to decide what machine traffic is worth.
A practical framework for publishers and site owners
The immediate response should be disciplined, not theatrical. Panic-blocking all AI traffic may damage visibility. Ignoring the shift may subsidize extraction and inflate costs. A practical framework starts with evidence.
First, audit traffic. Separate human sessions from known bots, verified bots, AI crawlers, user-triggered agents, suspicious automation and unknown traffic. Use CDN logs, server logs, bot-management tools and analytics data. Do not rely only on front-end analytics because many bots do not run scripts and many agents behave unlike normal users.
Second, map content value. Identify pages that drive revenue, contain original work, expose dynamic data, are costly to serve, or support customer acquisition. Not all pages deserve the same policy. A public press release, a paywalled investigation, a product listing and an account page should not be treated alike.
Third, define allowed uses. Search indexing may be allowed. AI training may require permission. Live AI answer retrieval may be allowed with citation. Commercial scraping may require payment. User-delegated agents may be allowed only if verified and rate-limited. Unknown bots may be challenged.
Fourth, update robots.txt and crawler controls. Use official documentation for OpenAI, Anthropic, Google and other major operators. Keep records of policy changes and dates. Robots.txt is not enforcement, but it is a public signal and a baseline for compliant crawlers.
Fifth, enforce at the edge. Use rate limits, bot scores, verified bot allowlists, challenges, API gates and blocking rules. Protect expensive endpoints. Watch for user-agent spoofing. Avoid security theater that hurts humans more than bots.
Sixth, build machine access where it creates value. A retailer may create a product feed for verified agents. A publisher may create licensed access for AI retrieval. A SaaS company may expose documentation APIs. A travel provider may route availability queries through paid endpoints.
Seventh, monitor AI visibility. Track citations, answer appearances, brand mentions, referral changes and agent conversions. Human traffic is no longer the only signal of influence.
Eighth, revisit policy quarterly. AI operators change bots, documentation and behavior. New agents appear. Business goals shift. A static policy will age quickly.
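Of the steps above, rate limiting at the edge is the easiest to show in code. The sketch below is a plain token bucket, a common rate-limiting technique rather than any particular vendor's mechanism. The rate and capacity values are arbitrary, and the injectable `clock` parameter exists only so the example can run deterministically.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, refilling at `rate_per_second`."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = clock()

    def allow(self):
        """Spend one token if available; otherwise reject the request."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill for the time that has passed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Deterministic demonstration with a fake clock instead of real time.
fake_now = [0.0]
bucket = TokenBucket(rate_per_second=1, capacity=2, clock=lambda: fake_now[0])

burst = [bucket.allow(), bucket.allow(), bucket.allow()]
fake_now[0] = 1.0  # one second later, one token has been refilled
after_wait = bucket.allow()
print(burst, after_wait)
```

A deployment would typically keep one bucket per client identity, for example per verified bot or per network, with tighter limits on the expensive endpoints identified in the content-value step.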
Site-owner decision matrix for AI and bot traffic
| Traffic type | Default posture | Main risk | Main opportunity |
|---|---|---|---|
| Traditional search crawlers | Allow with monitoring | Overcrawl or search dependency | Discovery and human referrals |
| AI training crawlers | License, limit or block | Unpaid content extraction | Paid data use or model presence |
| AI answer retrieval bots | Allow selectively | Answers replace visits | Citations, authority and influence |
| User-delegated agents | Permit through verified paths | Account abuse or load spikes | Higher-intent tasks and conversions |
| Unknown automation | Challenge or throttle | Scraping, fraud, infrastructure cost | Possible emerging tools worth reviewing |
| Malicious bots | Block and investigate | Credential attacks, carding, spam | Threat intelligence and defense learning |
This matrix is a starting point, not a universal policy. The right setting depends on business model, content type, legal position, technical capacity and risk tolerance.
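Because the matrix is a default rather than a rule, it can be kept as data with explicit per-site overrides. The sketch below is one assumed encoding; the keys and posture labels are shorthand invented here for the rows of the table.

```python
# Shorthand labels for the table rows above; names are illustrative.
DEFAULT_POSTURE = {
    "search_crawler": "allow_with_monitoring",
    "ai_training_crawler": "license_limit_or_block",
    "ai_answer_retrieval": "allow_selectively",
    "user_delegated_agent": "permit_via_verified_path",
    "unknown_automation": "challenge_or_throttle",
    "malicious_bot": "block_and_investigate",
}


def posture_for(traffic_type, overrides=None):
    """Look up a posture, applying site overrides on top of the defaults.

    Traffic types that are not in the table fall back to the
    unknown-automation posture.
    """
    table = {**DEFAULT_POSTURE, **(overrides or {})}
    return table.get(traffic_type, table["unknown_automation"])


# A hypothetical news publisher that has decided to block training outright.
publisher_overrides = {"ai_training_crawler": "block"}

print(posture_for("ai_training_crawler"))
print(posture_for("ai_training_crawler", publisher_overrides))
print(posture_for("never_seen_before"))
```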
The teams that handle this well will not simply reduce bot traffic. They will sort machine traffic into business categories. Some will be blocked. Some will be charged. Some will be welcomed. Some will be redesigned into better channels. The goal is not fewer bots. The goal is fewer ungoverned machines.
A practical framework for AI companies
AI companies also need a framework because aggressive crawling is becoming a reputational and operational liability. The companies that earn access to high-quality sources will have a durable advantage over those that trigger blocks.
The first principle is identity. Every crawler, retrieval bot and agent should identify itself clearly and verifiably. User-agent strings should be documented. IP ranges or signature methods should be current. Operators should avoid mixing purposes under ambiguous identities.
The second principle is purpose separation. Training, search indexing, grounding, user-triggered retrieval and agentic action are different. Site owners should be able to permit one without permitting all. OpenAI’s separation between GPTBot and OAI-SearchBot is an example of this direction. Google-Extended is another attempt to separate AI-related use from search inclusion.
The third principle is rate discipline. AI systems should not hit sites harder than necessary. They should use sitemaps, caching, conditional requests, backoff and deduplication, and honor crawl-delay directives where a site declares them. They should avoid expensive dynamic paths unless authorized.
The fourth principle is value return. Sources need something back: payment, referrals, citations, traffic, analytics, brand exposure, licensing opportunities or conversion. The old “we crawled because it was public” argument will not sustain trusted access.
The fifth principle is webmaster transparency. AI companies should give site owners dashboards showing crawl volume, use categories, answer appearances, error reports and policy controls. Search engines built webmaster tools because the web needed a relationship layer. AI systems now need the same.
The sixth principle is user safety. Agents should not bypass site controls, hide identity, store sensitive data unnecessarily or execute high-risk actions without confirmation. They should be designed against prompt injection and malicious web content.
The seventh principle is small-site fairness. Licensing should not only serve the largest publishers. AI companies need ways to respect and compensate smaller sources, or at least avoid imposing disproportionate cost on them.
These principles are not charity. They protect AI product quality. If AI companies exhaust the goodwill of the web, they will face more blocks, worse data, more lawsuits and more regulation. The cheapest crawl today can become the most expensive access loss tomorrow.
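The rate-discipline principle is the most mechanical of the seven, so it is worth a sketch. The code below combines HTTP conditional requests (`If-None-Match` and `If-Modified-Since`, answered with 304 when nothing changed) with exponential backoff and jitter on 429 and 503 responses. The `fetch` and `sleep` callables are hypothetical stand-ins so the logic can be shown without a network; this is not any crawler's actual code.

```python
import random


def conditional_headers(cached):
    """Build validators so an unchanged page can be answered with 304."""
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Exponential backoff with full jitter, capped at `cap` seconds."""
    return rng() * min(cap, base * 2 ** attempt)


def polite_fetch(url, cache, fetch, sleep, max_attempts=4):
    """Fetch a URL, reusing the cache on 304 and backing off on 429/503.

    `fetch(url, headers)` is a hypothetical callable returning
    (status, response_headers, body); `sleep(seconds)` pauses the crawler.
    """
    cached = cache.get(url, {})
    for attempt in range(max_attempts):
        status, headers, body = fetch(url, conditional_headers(cached))
        if status == 304:
            return cached.get("body")  # unchanged: no body was transferred
        if status == 200:
            cache[url] = {
                "etag": headers.get("ETag"),
                "last_modified": headers.get("Last-Modified"),
                "body": body,
            }
            return body
        if status in (429, 503):
            sleep(backoff_delay(attempt))  # the site asked us to slow down
            continue
        return None  # other statuses: give up rather than retry
    return None
```

Called with a real HTTP client wrapped as `fetch` and `time.sleep` as `sleep`, the second visit to an unchanged page costs the site a header exchange instead of a full render.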
The newsroom meaning of Prince’s statement
Prince’s statement landed because it compressed many complicated shifts into a single sentence. Bots passed humans. It sounded like science fiction and infrastructure accounting at the same time. The best news judgment is to treat it as a threshold, not a prophecy.
The threshold is real enough. Multiple reports show automated traffic overtaking or outgrowing human traffic in major web categories. Cloudflare’s latest HTML-request split is striking. HUMAN Security’s growth numbers show acceleration. Imperva’s 2024 data shows that automated traffic had already become the majority in its broader measurement of web traffic.
The prophecy would be the claim that humans are now irrelevant online. That is false. Human intent remains central. The traffic is machine-heavy because people and companies are using machines to pursue human and commercial goals. The web’s problem is not lack of humans. It is the displacement of human-visible actions by machine-visible actions.
For newsrooms covering this shift, precision matters. “Bots are now more than half of Cloudflare-measured HTML HTTP requests” is less viral than “bots overtook humans,” but it is more accurate. “AI agents are growing fast enough to change web economics” is more useful than “the internet is dead.” Readers need both the headline and the caveat.
Newsrooms also have skin in the game. They are among the businesses most exposed to AI extraction and referral decline. Their coverage should be evidence-led, but not falsely detached. The sustainability of reporting is part of the story. If AI systems use journalism without supporting journalism, the public information supply weakens.
That does not mean newsrooms should reject AI. They can use AI for research support, transcription, translation, data analysis and workflow assistance. They can also license archives, build direct products and appear in AI answer engines. The question is whether the exchange is fair.
Prince’s statement is news because it says the web’s default user has changed. It is analysis because the consequences depend on business models, policy and trust.
The business impact for Cloudflare and its rivals
Cloudflare is not a neutral observer in the commercial sense. It sells infrastructure, security, bot management, CDN, developer and AI-related services. A bot-heavy web increases demand for what Cloudflare provides: traffic visibility, bot classification, edge enforcement, DDoS protection, crawler controls, pay-per-crawl systems and developer infrastructure.
That does not make its data irrelevant. It means readers should understand incentives. Cloudflare benefits when site owners see machine traffic as a strategic problem. HUMAN Security benefits when companies need agentic visibility and fraud defense. Imperva and Thales benefit when bad bots become a board-level concern. AI companies benefit when web access remains easy. Publishers benefit when content rights are recognized.
The bot-majority web creates a growing market for “machine traffic governance.” This includes bot detection, verified identity, API access, traffic pricing, content licensing, data clean rooms, synthetic monitoring, fraud prevention, agent security, analytics and compliance. The market will be large because it touches almost every web business.
Cloudflare’s Pay Per Crawl is especially interesting because it turns a network provider into a pricing intermediary between content owners and AI crawlers. If such systems scale, infrastructure companies could influence the economics of AI data access. That raises power questions. Who sets prices? Who verifies crawlers? Who decides defaults? How are disputes handled? How do small sites participate?
Rivals will respond. Other CDN and security companies may offer crawler monetization, AI bot dashboards, agent verification or content-rights tools. Publishers may use multiple systems. AI companies may prefer direct deals. Standards may reduce vendor lock-in. Regulators may watch if infrastructure gatekeepers gain too much control over web access.
For Cloudflare, the strategic opportunity is to become a control plane for the agentic web. For site owners, the risk is becoming dependent on a few control planes. For AI companies, the risk is that access to the web becomes mediated by tollkeepers. For users, the risk is that disputes between platforms shape what agents can see.
The bot-majority moment is also an infrastructure-business moment. Whoever manages machine access will influence the web’s next economy.
The environmental and efficiency question
AI traffic has an efficiency problem. A human request can become hundreds or thousands of machine requests. If those requests duplicate work, hit dynamic pages, trigger heavy rendering or repeat across many AI companies, the web absorbs avoidable cost.
It is easy to overstate the environmental impact of web crawling relative to model training, inference and data-center expansion. A single page request is usually small. But billions or trillions of unnecessary requests are not nothing. They consume bandwidth, compute, storage, cooling and operational attention. They also cause sites to overprovision.
Efficiency should therefore become part of AI crawler ethics. Respectful bots should cache, deduplicate, honor conditional requests, use sitemaps, avoid repeated fetching, respect crawl-delay where appropriate and prefer structured feeds when available. Agents should avoid brute-force browsing when a site provides a clean API. AI companies should coordinate internal crawlers to prevent multiple product teams from hammering the same sources.
Sites should also improve efficiency. Bloated pages make machine access wasteful. Poor structure forces agents to parse unnecessary content. Dynamic rendering for static facts wastes compute. Bad status codes cause retries. Infinite URL spaces trap crawlers. Efficient web design helps humans, search engines and agents.
The environmental issue may become part of licensing. A publisher could offer a low-cost structured feed that reduces crawler load. A retailer could provide product deltas rather than forcing full recrawls. AI companies could pay less for efficient access and more for high-frequency dynamic queries. Infrastructure providers could expose carbon-aware or efficiency metrics for machine traffic.
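The product-delta idea is simple to illustrate. The sketch below compares two catalog snapshots and emits only what was added, changed or removed, which is what a delta feed would carry instead of a full recrawl. The snapshot structure and SKU names are assumptions for the example.

```python
def catalog_delta(previous, current):
    """Compare two snapshots keyed by product id and return only the changes."""
    added = {k: v for k, v in current.items() if k not in previous}
    changed = {
        k: v for k, v in current.items()
        if k in previous and previous[k] != v
    }
    removed = sorted(k for k in previous if k not in current)
    return {"added": added, "changed": changed, "removed": removed}


# Hypothetical snapshots taken one day apart.
yesterday = {
    "sku-1": {"price": 10, "stock": 5},
    "sku-2": {"price": 20, "stock": 0},
    "sku-3": {"price": 30, "stock": 9},
}
today = {
    "sku-1": {"price": 10, "stock": 5},   # unchanged, so not in the delta
    "sku-2": {"price": 18, "stock": 4},   # price and stock changed
    "sku-4": {"price": 40, "stock": 1},   # new product
}

delta = catalog_delta(yesterday, today)
print(delta)
```

Here a consumer would transfer three records instead of re-fetching every product page, and the saving grows with catalog size because most items do not change between snapshots.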
The principle is not to shame all crawling. It is to reduce waste. When machines become the majority visitor, machine efficiency becomes web performance.
The human web becomes more valuable, not less
The more machine traffic grows, the more authentic human work matters. AI systems need original sources. Agents need trustworthy data. Users need judgment. Search and answer engines need reliable entities. Businesses need real customers. Communities need real members. A web of only synthetic content and bots would be useless.
This is the paradox of the bot-majority moment. Human clicks become a smaller share of visible traffic, but human-created value becomes more important. Original reporting, real product testing, verified reviews, expert documentation, local knowledge, scientific research, public records, community moderation and lived experience become the scarce inputs.
The market may not reward them automatically. That is the danger. A scarce input can still be underpaid if extraction is easy. The task now is to connect machine consumption to human production through licensing, attribution, referrals, direct payments, subscriptions, trust signals and legal protections.
For creators, the strategy should not be to imitate machines. It should be to become more clearly human where humanity adds value: judgment, sourcing, testing, accountability, taste, context, field experience, relationships and original evidence. AI can summarize commodity text. It cannot attend a local council meeting unless someone reports it. It cannot test a product unless someone does the testing. It cannot build trust from nowhere.
For platforms and AI companies, the responsibility is to preserve incentives. If answer engines consume the best human work while rewarding only the answer interface, the source layer weakens. If the source layer weakens, AI quality weakens. A stable system pays for the inputs it depends on.
For users, the habit should be source awareness. AI answers are convenient, but source quality matters. Clicking through sometimes matters. Paying for useful sources matters. Trusting an agent blindly is risky. The human web survives through human choices as well as technical policy.
Bots may now dominate parts of the traffic chart. They do not dominate the origin of meaning.
The next year will decide the rules of machine access
The period from mid-2026 to mid-2027 will likely set early norms for the agentic web. Several developments are likely.
First, more websites will separate AI crawler policies by purpose. Search indexing, AI training, answer retrieval and user-delegated agents will receive different treatment. Documentation from AI companies will keep changing as they formalize crawler roles.
Second, more infrastructure providers will offer AI bot dashboards and controls. Site owners will expect visibility into which AI systems access their content, what pages they consume and what actions they attempt.
Third, licensing and pay-per-crawl experiments will expand. The first pricing models will be crude, but they will create market reference points. Large publishers will negotiate directly. Smaller sites will look for collective tools.
Fourth, agent identity standards will become more urgent. Without trusted identity, useful agents will be blocked with bad bots. With trusted identity, sites can create safe pathways.
Fifth, security incidents involving agents will increase. Some will involve fraud. Some will involve prompt injection. Some will involve over-permissioned enterprise agents. These incidents will push companies toward stricter controls.
Sixth, AI answer visibility will become a formal marketing and communications discipline. Brands will track how they appear in AI systems, not only in search rankings. Publishers will track citations and retrieval, not only clicks.
Seventh, regulators will start asking more specific questions. Bot disclosure, content rights, competition, consumer transparency and data protection will move from abstract AI policy into concrete web-access debates.
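The first of these developments, purpose-specific crawler policies, can be made concrete. What follows is a minimal Python sketch, not a recommended policy: it parses a hypothetical robots.txt with the standard library's urllib.robotparser and checks which crawler tokens may fetch a page. The crawler tokens are those named in the vendor documentation listed in the sources; the example.com URLs, the paths and the allow/deny choices are assumptions made for illustration. A file like this only states preferences to compliant crawlers and enforces nothing.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths and the allow/deny choices are
# illustrative assumptions, not a recommended policy.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow: /account/
"""


def build_policy(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a queryable policy object."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


policy = build_policy(ROBOTS_TXT)
ARTICLE_URL = "https://example.com/articles/bots"   # placeholder URL
ACCOUNT_URL = "https://example.com/account/settings"  # placeholder URL

# Search crawlers are allowed, training crawlers are asked to stay out,
# and every other crawler falls back to the "*" group.
for agent in ("Googlebot", "OAI-SearchBot", "GPTBot", "ClaudeBot", "SomeOtherBot"):
    print(f"{agent:14} article={policy.can_fetch(agent, ARTICLE_URL)} "
          f"account={policy.can_fetch(agent, ACCOUNT_URL)}")
```

The design point is that each crawler role gets its own group, so a site can stay visible in search while declining training use. Enforcement against crawlers that ignore the file has to happen elsewhere, for example in bot-management tooling.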
The outcome is not predetermined. The web could become more closed and adversarial. It could also become more explicit and sustainable. The difference will depend on whether machine access is governed with enough nuance to preserve openness while preventing extraction and abuse.
The June 2026 milestone should be read as a deadline. The web has already changed. The rules are now catching up.
The strategic bottom line for the bot-majority internet
Cloudflare’s latest bot-traffic milestone is not just an odd statistic. It marks a shift in the web’s operating model. In Cloudflare’s reported measurement of HTTP requests to HTML content, machines now account for the majority: roughly 57.5% against 42.5% for humans. Other major reports point in the same direction: automated traffic has overtaken or is outgrowing human traffic, AI-driven traffic is rising fast, and agentic browsing is expanding from reading into transactions.
The old web assumed that a page request usually meant a person or a search crawler that would help a person arrive. The new web cannot assume that. A request may be training, grounding, search retrieval, agentic comparison, fraud, monitoring, scraping, user delegation or abuse. The business value depends on which one it is.
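One way to picture that dependence is a lookup from a crawler's declared identity to its purpose, and from purpose to a business decision. The sketch below is illustrative Python under stated assumptions: the token-to-purpose mapping is a simplification of the crawler roles vendors publish, and the action labels (allow, charge, inspect) are hypothetical. A User-Agent header can be spoofed, so a real deployment would need verified identity before trusting any of it.

```python
from enum import Enum


class Purpose(Enum):
    SEARCH = "search"
    TRAINING = "training"
    USER_ACTION = "user_action"
    UNKNOWN = "unknown"


# Assumed, simplified mapping from crawler tokens to declared purpose.
TOKEN_PURPOSE = {
    "googlebot": Purpose.SEARCH,
    "oai-searchbot": Purpose.SEARCH,
    "gptbot": Purpose.TRAINING,
    "claudebot": Purpose.TRAINING,
    "chatgpt-user": Purpose.USER_ACTION,
}

# Hypothetical site policy: one action per purpose.
POLICY = {
    Purpose.SEARCH: "allow",
    Purpose.TRAINING: "charge",
    Purpose.USER_ACTION: "allow",
    Purpose.UNKNOWN: "inspect",  # hand off to normal bot-management checks
}


def classify(user_agent: str) -> Purpose:
    """Guess the declared purpose from a User-Agent header (spoofable)."""
    ua = user_agent.lower()
    for token, purpose in TOKEN_PURPOSE.items():
        if token in ua:
            return purpose
    return Purpose.UNKNOWN


def decide(user_agent: str) -> str:
    """Return the policy action for a request's User-Agent header."""
    return POLICY[classify(user_agent)]


if __name__ == "__main__":
    # Abbreviated example headers, made up for illustration.
    for ua in ("Mozilla/5.0 (compatible; GPTBot/1.0)",
               "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)",
               "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"):
        print(decide(ua), "<-", ua)
```

Anything that does not declare a known token, including ordinary browsers and undeclared scrapers, falls through to "inspect", which is where behavioral detection and fraud controls would take over.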
For publishers, the central issue is compensation and attribution. For retailers, it is agent conversion versus fraud. For travel companies, it is comparison load versus bookings. For platforms, it is trust and visibility. For AI companies, it is access to high-quality sources. For users, it is convenience, transparency and safety. For regulators, it is market integrity.
The answer is not nostalgia for a human-only web. That web no longer exists and probably never fully did. Bots have been part of the internet for decades. The answer is a governed machine web that keeps human value at the center.
That means verified agents, purpose-specific crawler controls, stronger logs, better analytics, fair pricing, clearer permissions, safer delegation, structured access, source attribution and security controls built for machines that act. It also means editorial and commercial strategies that recognize AI systems as readers, intermediaries and potential customers without surrendering the economics of original work.
The internet’s next fight is not between humans and bots. It is between unpriced extraction and accountable machine participation. If the web gets that right, AI agents can reduce friction and improve access. If it gets it wrong, the open web becomes a costly backend for platforms that do not pay for what they consume.
The bot majority has arrived ahead of schedule. The question now is whether the web can build rules before the traffic chart becomes the business model.
Questions readers are asking about AI bots overtaking human traffic
Did bots really overtake human traffic online?
Cloudflare CEO Matthew Prince said bots had passed human traffic online, and reporting on Cloudflare’s latest data put automated HTTP requests to HTML content at about 57.5% versus 42.5% for humans. The claim is strongest for measured web-page request traffic, not for every form of internet usage.
Does this mean humans no longer matter online?
No. People still drive demand, create content, buy products and use apps. The shift means bots and agents now generate more of certain web request traffic, especially HTML page requests, while humans still dominate many long app sessions, streaming experiences and social interactions.
What is agentic traffic?
Agentic traffic comes from AI systems that can browse, compare, retrieve information and sometimes act on behalf of users. Unlike traditional crawlers, agents pursue tasks, such as comparing products, checking prices, planning travel or moving through online forms.
How is an AI agent different from a search crawler?
A search crawler mainly indexes pages for search discovery. An AI agent may use web pages to complete a user goal, such as shopping, booking, summarizing, troubleshooting or filling a workflow. The agent is more interactive and may generate many more requests.
Why do AI agents generate so many requests?
A single human request can turn into hundreds or thousands of machine actions. An agent comparing flights, hotels or products may check far more pages than a person would manually open.
Are all bots bad?
No. Search crawlers, uptime monitors, accessibility tools, archival bots and some AI retrieval agents can be useful. Bad bots include fraud tools, scrapers that violate terms, credential-stuffing systems, spam bots and DDoS traffic.
Why are publishers worried about AI bots?
Publishers depend on human visits for ads, subscriptions and direct audience relationships. AI systems may consume their content, summarize it elsewhere and send fewer readers back, weakening the economic bargain that supported open publishing.
How does agentic traffic affect retailers?
Retailers may receive more traffic from shopping agents, but they also face higher scraping, price monitoring, inventory abuse, fake account creation and payment fraud. The same automation that helps customers can also help attackers.
Can robots.txt stop AI crawlers?
Robots.txt helps communicate preferences to compliant crawlers, but it is not an enforcement mechanism. Google’s own documentation says crawler compliance is up to the crawler, and research has found uneven compliance among scrapers.
Should websites block all AI bots?
Not automatically. Blocking all AI bots can reduce visibility in AI search and agentic discovery. A better strategy is to separate traffic by purpose: allow search, license or limit training, permit verified user agents, and block malicious automation.
What is Pay Per Crawl?
Pay Per Crawl is Cloudflare’s system that lets domain owners choose whether to allow, charge or block AI crawlers. It treats machine access as a commercial decision rather than a free default.
Do AI systems send traffic back to websites?
Sometimes, but often not at the same level as traditional search. Cloudflare has documented crawl-to-click imbalances where some AI crawlers consume far more pages than they refer visitors back to.
How should companies measure bot traffic?
They should combine CDN logs, server logs, bot-management tools and analytics data. Front-end analytics alone can miss bots that do not run scripts or can misclassify agents that behave differently from human sessions.
What is the biggest security risk from AI agents?
The biggest risk is that agents can act inside real workflows. They may be manipulated by malicious content, overuse privileges, test stolen credentials, automate fraud or perform actions a user did not intend.
How does the bot majority change SEO?
SEO now has to consider humans, search crawlers and AI systems. Clear structure, source credibility, entity consistency, fresh facts and machine-readable content matter, but original value and access control matter just as much.
What is GEO?
GEO, or generative engine optimization, means improving how a brand, source or product appears in AI-generated answers and agent recommendations. It focuses on facts, authority, citations, structured data and answer-engine visibility.
Will AI agents replace websites?
No, but they may reduce direct visits to many websites. Websites will still provide source material, transactions, trust, accounts, tools and original experiences. Agents may become intermediaries between users and those sites.
Could the open web become more closed?
Yes. If AI traffic remains unpriced and uncontrolled, more sites may move content behind paywalls, logins, APIs or bot blockers. Controlled machine access is a better path than total closure.
What should AI companies do?
AI companies should identify bots clearly, separate crawler purposes, respect site permissions, reduce duplicate crawling, provide webmaster tools, show useful citations and pay for high-value commercial content use.
What should businesses do now?
Businesses should stop treating traffic as one category. They need to know which visitors are human, which are useful machines, which are extractive, and which are hostile. The winning strategy is governed access, not blind openness or blanket blocking.
Author:
Jan Bielik
CEO & Founder of Webiano Digital & Marketing Agency

This article is an original analysis supported by the sources cited below.
Bot Traffic Worldwide | Cloudflare Radar
Cloudflare Radar’s public bot traffic page defines bot traffic as non-human internet traffic and provides the baseline framework for monitoring bot share across the web.
AI Insights | Cloudflare Radar
Cloudflare Radar’s AI Insights page tracks AI bot and crawler activity, including HTTP request trends and crawl-purpose categories.
‘Bots have now passed human traffic online,’ Cloudflare boss laments
Tom’s Hardware reported Cloudflare’s latest bot-versus-human HTTP request split and Matthew Prince’s June 2026 comments on the crossover arriving earlier than expected.
Internet has more AI bots than humans now, Cloudflare CEO makes unexpected announcement
India Today covered Cloudflare Radar’s bot-majority data and clarified the distinction between HTML web traffic and broader internet activity.
The 2026 State of AI Traffic & Cyberthreat Benchmark Report
HUMAN Security’s 2026 benchmark report supplied the 2025 growth figures for automated traffic, AI-driven traffic and agentic AI traffic.
Measuring the AI-Driven Internet with The 2026 State of AI Traffic & Cyberthreat Benchmark Report
HUMAN Security’s analysis explained how AI systems moved from reading the web to transacting on it, including agentic activity on checkout pages.
State of Agentic Traffic April 2026
HUMAN Security’s April 2026 agentic traffic update provided additional industry context for media, e-commerce and travel concentration.
2025 Bad Bot Report
Imperva’s 2025 Bad Bot Report supplied the finding that automated traffic reached 51% of web traffic in 2024 and that bad bots made up 37%.
Content Independence Day: no AI crawl without compensation
Cloudflare’s July 2025 announcement explained its shift toward blocking AI crawlers by default unless content owners permit or monetize access.
Introducing pay per crawl: Enabling content owners to charge AI crawlers for access
Cloudflare’s Pay Per Crawl announcement described the allow, charge and block options for AI crawler access.
From Googlebot to GPTBot: who’s crawling your site in 2025
Cloudflare’s crawler analysis supplied data on Googlebot, GPTBot, ChatGPT-User and PerplexityBot growth from May 2024 to May 2025.
The crawl-to-click gap: Cloudflare data on AI bots, training, and referrals
Cloudflare’s crawl-to-click analysis provided evidence on training-driven AI crawling, referral decline and crawl-to-refer imbalances.
A deeper look at AI crawlers: breaking down traffic by purpose and industry
Cloudflare’s AI crawler purpose and industry analysis supported the article’s distinction between training, search and user-action traffic.
Overview of OpenAI Crawlers
OpenAI’s official crawler documentation explains GPTBot, OAI-SearchBot and webmaster controls for different OpenAI web access purposes.
Does Anthropic crawl data from the web, and how can site owners block the crawler?
Anthropic’s crawler documentation explains ClaudeBot controls, robots.txt blocking and crawl-delay support.
Google’s common crawlers
Google’s crawler documentation explains Googlebot, GoogleOther, Google-Extended and other crawler tokens relevant to search and AI controls.
Robots.txt Introduction and Guide
Google Search Central’s robots.txt guide explains the purpose and limits of robots.txt as a crawler-management tool rather than an enforcement mechanism.
An update on web publisher controls
Google’s announcement introduced Google-Extended as a publisher control connected to AI model use.
Agentic AI threats and mitigations
OWASP’s agentic AI guidance provided the security context for autonomous systems that plan and act across tools and workflows.
OWASP Top 10 for Agentic Applications for 2026
OWASP’s 2026 agentic application risk framework supported the article’s discussion of agent-specific security and governance risks.
OWASP Top 10 for Large Language Model Applications
OWASP’s LLM risk framework supported the discussion of prompt injection, model denial of service and related AI application risks.
Scrapers selectively respect robots.txt directives
This 2025 academic study provided evidence that scraper compliance with robots.txt is uneven and that relying only on robots.txt can be risky.
FP-Agent: Fingerprinting AI Browsing Agents
This 2026 research paper supported the article’s discussion of behavioral fingerprinting and the difficulty of distinguishing AI browsing agents from humans.
Introducing Large Language Models as the Next Challenging Internet Traffic Source
This research paper supported the broader technical framing of LLMs and AI agents as a growing source of internet traffic.
Web Crawler Restrictions, AI Training Datasets & Political Biases
This study supported the discussion of crawler restrictions and the risk that uneven blocking patterns may shape AI training data.
Is Misinformation More Open? A Study of robots.txt Gatekeeping on the Web
This study supported the article’s warning that reputable sources may restrict AI crawlers more than misinformation sites, with implications for AI source quality.