FIND Layer April 12, 2026 By Liz Micik

AI Searchability is Binary: Can AI Crawlers FIND you?

The short version:

  • AI visibility at the FIND layer is binary: either AI platforms can access and read your content, or they can't. There is no middle ground.
  • 35% of professional services sites unknowingly block AI crawlers through firewall rules, robots.txt directives, JavaScript-only rendering, or missing structured data.

FIND is Not About Rankings

When we introduced the Find/Understand/Trust framework in our first pillar post, "From Search to AI Visibility: The Transition Needs Translation," we planted a flag: AI visibility requires a new technical infrastructure. That post diagnosed the problem (search engines and AI platforms are fundamentally different). This post goes deep on the first layer of that infrastructure: FIND.

In 2026, we passed an Internet milestone. 51% of all web traffic is now bots, not humans. Let that sink in a minute. More than half of your potential web visitors are not human.

FIND answers one question: Can AI crawlers access and read your website? It's simple and binary. You're either findable, or you've shut the door on a little more than half of your potential visitors.

There's no "partially findable" or "findable for some AI platforms." Your site either responds to AI crawlers, renders your content, and speaks the language these platforms understand (structured data), or it doesn't.

The only real question here is whether you consciously made the choice to exclude your site from AI searches or not.

We studied 120 B2B professional services firms with annual revenues between 10 million and 500 million dollars. Here's what we found: 35% of them block or hide their content from AI crawlers in some way.

That leaves just 65% of our sample findable by AI platforms. They are the firms that have a chance to win citations and mentions -- the equivalents of search engine rankings. The other 35% have already lost their chance to be seen and their chance to earn.

The Four Gates of FIND

Think of AI searchability at the FIND layer as a four-gate system. Your content has to pass all four.

[Diagram: the four gates of FIND]
  1. CDN / Edge Bot Management -- fail here and the request never reaches your server.
  2. WAF / Firewall Rules -- fail here and the crawler gets a 403 or a timeout.
  3. JavaScript Rendering -- fail here and the bot sees an empty page.
  4. Structured Data -- fail here and AI can't understand your content.
Pass all four: FINDABLE. Fail any one: INVISIBLE.

Fail any gate and you're invisible.

Gate 1: CDN / Edge Bot Management (The Gate You Don't Control)

The first decision about whether an AI crawler reaches your content isn't made by your leadership team. Nor is it made on your website. It's made by your CDN.

Content Delivery Networks like Cloudflare, Akamai, and AWS CloudFront sit at the network edge. Every request to your domain hits the CDN before it reaches your origin server. And every major CDN now includes bot management features that can accept or reject AI crawlers at the edge.

Cloudflare's Bot Fight Mode is the most common example. It's a toggle in the dashboard. When enabled, it challenges or blocks requests from bots it classifies as automated. GPTBot, ClaudeBot, PerplexityBot: these can all get caught. Cloudflare also offers a separate "AI Scrapers and Crawlers" setting that lets site owners explicitly allow or block known AI crawlers with a yes/no toggle.

Akamai's Bot Manager and AWS CloudFront's AWS WAF integration work similarly. They classify incoming traffic and apply rules before the request reaches your server. If the CDN decides a request is a bot and the rule says "block," your server never sees it. Your analytics never record it. You have no idea it happened.

This matters for a specific reason: CDN bot rules are often set by your infrastructure or DevOps team, not your marketing team. The person who toggled Cloudflare's Bot Fight Mode probably wasn't thinking about AI visibility. They were thinking about security, scraping prevention, or cost reduction (bots drive bandwidth). The marketing implications weren't part of the conversation.

From our 120-company research, the 26 companies with accidental bot blocks were split between CDN-level and WAF-level blocks. We couldn't always determine which layer was responsible, because from the outside the result is the same: a 403 Forbidden or a timeout. But the fix depends entirely on which layer is doing the blocking.

How to Check Where You Stand

Log into your CDN dashboard -- whether that's Cloudflare, Akamai, or AWS CloudFront, the bot management settings live there.

Look for whether known AI crawler user agents (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) are explicitly allowed, challenged, or blocked.

If you don't know which CDN your site uses, ask your infrastructure team. If you don't have an infrastructure team, run your domain through a DNS lookup: the nameservers or CNAME records will typically reveal the CDN provider.
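If you don't know which CDN fronts your site, its response headers often reveal it. Here's a minimal Python sketch; the header fingerprints and the guess_cdn helper are our own illustration (a thorough check would also inspect DNS records), not any official API:

```python
# Sketch: guess which CDN fronts a site from its HTTP response headers.
# The fingerprints below are common hints, not an exhaustive detector.

def guess_cdn(headers: dict) -> str:
    """Best-guess CDN name from response headers (case-insensitive)."""
    blob = " ".join(f"{k}: {v}" for k, v in headers.items()).lower()
    if "cf-ray" in blob or "cloudflare" in blob:
        return "Cloudflare"        # Cloudflare sets Server and CF-RAY
    if "x-amz-cf-id" in blob or "cloudfront" in blob:
        return "AWS CloudFront"    # CloudFront sets Via / X-Amz-Cf-Id
    if "akamai" in blob:
        return "Akamai"            # e.g. Server: AkamaiGHost
    return "unknown"

# Example headers as a Cloudflare-fronted site might return them:
print(guess_cdn({"Server": "cloudflare", "CF-RAY": "8a1b2c3d4e5f-EWR"}))  # Cloudflare
```

Feed it the headers from any HTTP client (curl -I, or urllib in Python) pointed at your homepage.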

Gate 2: WAF / Robots.txt (The Accidental Block?)

The CDN makes a broad bot classification decision at the edge. If an AI crawler passes your CDN's edge rules, there are still two barriers it must pass before it reaches the actual text on your page: the Web Application Firewall (WAF) and your robots.txt file.

The WAF Block

A WAF sits behind the CDN, protecting your origin server. The WAF applies more granular rules to stop attacks: SQL injection, cross-site scripting, credential stuffing, DDoS patterns. When a WAF is tuned aggressively, it can treat an AI crawler's request like a threat and return a 403 Forbidden or 429 Too Many Requests response.

Twenty-six companies in our sample block AI crawlers through WAF or CDN rules. They may or may not even know that they're doing it.

Someone on your infrastructure team may occasionally scan the logs and see that AI crawlers are being blocked. From any other perspective, the action is invisible. Your website works fine when you browse it. Analytics show normal traffic. There's no error log because the block intercepts the request before it reaches your application.

The question is: did you mean to do that? Are you knowingly and deliberately cutting yourself off from organic traffic that could lead to conversions?

The Robots.txt Block

Four companies in our research explicitly prevent AI crawlers from accessing their content. They do this through robots.txt directives that tell search engine and AI crawlers what areas of their site they can and cannot access.

In the past decade, the rise of "bad bots" intent on scraping sites pushed many businesses toward more aggressive defenses. These four companies may be like many others that adopted a "block all bots" posture to protect their data, their brand, and their server costs.

ChatGPT's arrival at the end of 2022 changed this dynamic forever. In addition to blocking bad content-scraping bots, many companies blocked AI crawlers to keep their proprietary data out of LLM training sets.

But now that almost all AI models can perform web searches at answer time, blocking crawlers no longer just keeps your data out of training sets -- it keeps you out of the answers themselves. And remember, the bot side of your traffic equation is at 51% and rising.

The question now is: Do you still want to block them all? How high is the opportunity cost?

How to Check Where You Stand

For WAF: use the official crawlers' published IP ranges and test them against your firewall rules. Google publishes the IP ranges for Googlebot, OpenAI publishes ranges for GPTBot, and Anthropic publishes ranges for ClaudeBot. Check whether requests from those ranges get a 200 (success), a 403 (blocked), or a timeout. Anything other than a 200 means you have a firewall problem. Also check your WAF logs directly for 403s or 429s on AI crawler IP ranges.
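One way to run a quick version of that check is a scripted probe. The sketch below (classify and probe are our own hypothetical helper names, not a real tool) requests a URL while identifying as an AI crawler and classifies the response. Caveat: spoofing a user agent only tests user-agent rules; a WAF that verifies IP ranges may serve you a 200 while still blocking the real crawler.

```python
# Sketch: probe a URL as an AI crawler would and classify the outcome.
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def classify(status: int) -> str:
    """Map an HTTP status code to a FIND verdict."""
    if status == 200:
        return "accessible"
    if status in (401, 403):
        return "blocked"
    if status == 429:
        return "rate-limited"
    return f"check manually ({status})"

def probe(url: str, user_agent: str = "GPTBot") -> str:
    """Fetch url with the given crawler user agent and classify the result."""
    req = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(req, timeout=10) as resp:
            return classify(resp.status)
    except HTTPError as e:
        return classify(e.code)
    except URLError:
        return "timeout or connection refused"
```

Run probe("https://yourdomain.com/") once per crawler name (GPTBot, ClaudeBot, PerplexityBot) and compare verdicts; a "blocked" here is proof of a user-agent rule, while an "accessible" is not yet proof the real crawler gets through.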

For Robots.txt: Check to see if you are explicitly excluding bots by name. Your robots.txt file may include lines like this:

Example: robots.txt AI crawler blocks
# Block OpenAI (ChatGPT and data crawler)
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

# Block Anthropic (Claude training and searching)
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-User
Disallow: /

User-agent: Claude-SearchBot
Disallow: /

# Block Common Crawl (used by many AIs)
User-agent: CCBot
Disallow: /

If you want to test allowing AI bots to visit your site, you may choose to allow only the platforms you recognize. Be careful with CCBot, though: the name is obscure, but it belongs to Common Crawl, the index most LLMs are trained on. If you aren't visible there, your company does not exist to AI.
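You can test a robots.txt policy against specific crawler names with Python's standard-library robotparser. A small sketch using a sample policy (to test your live site, call set_url with your robots.txt URL and read() instead of parse()):

```python
# Check which AI crawlers a robots.txt policy allows, using the stdlib.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# GPTBot is fully blocked; ClaudeBot is blocked only from /private/;
# PerplexityBot has no rule (and no * group), so it is allowed.
for bot in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    print(bot, "allowed on /:", rp.can_fetch(bot, "/"))
```

This makes it easy to verify a change before deploying it: edit the policy string, re-run, and confirm each crawler sees what you intend.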

Gate 3: JavaScript Rendering (The Invisible Majority Problem)

Here's the hardest part of FIND for most marketing leaders to accept: four out of six major AI crawlers cannot execute JavaScript.

JavaScript execution is expensive. It's slow. It's resource-intensive. When you're crawling billions of pages, you stick to what you can fetch without execution. So most AI crawlers take the shortcut: fetch the HTML file, parse it, move on.

This creates a crisis for any site that renders content with JavaScript.

Vercel, the platform behind Next.js, published data in 2024 confirming that across 500 million GPTBot fetches, they observed zero JavaScript execution. GPTBot fetches the static HTML and moves on. If the page content lives in JavaScript, the bot sees nothing.

What humans see vs. what AI crawlers see:

A user with a browser visits your homepage. The browser downloads the HTML, JavaScript, CSS, and images. JavaScript executes. Your content appears. The user reads your value prop, your case studies, your credentials.

An AI crawler visits your homepage. The crawler downloads the HTML. That HTML is mostly empty. The crawler sees no content. No value prop. No case studies. No context. The crawler indexes a blank page or moves to the next URL.

[Figure: side-by-side homepage comparison]
What humans see: a rich page on yourcompany.com -- logo, "Transform Your Business" hero, services (Strategy, Analytics, Growth), and a client testimonial ("Transformed our entire approach to digital visibility" -- Sarah Chen, VP Marketing).
What AI crawlers see: a near-empty HTML shell -- <div id="app"><noscript>Enable JavaScript...</noscript></div>. Four out of six AI crawlers see nothing here; JavaScript-rendered content is invisible to AI platforms that don't execute JS.

A JavaScript-heavy website as seen by a human browser (left) versus the near-empty HTML skeleton seen by most AI crawlers (right). Based on lizmicik.com analysis of 120 B2B professional services firms.

From the AI platform's perspective, your homepage doesn't contain useful information. It's not rankable. It's not citable. It's invisible.

How to Check Where You Stand

Crawl your website with JavaScript disabled -- a browser with JS turned off, or a plain HTTP fetch of your raw HTML. Are you happy with what you see? Can your business be understood?
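You can approximate the crawler's view in a few lines of Python: parse the raw HTML, drop script and style content, and see what text is left. The VisibleText helper below is our own illustrative sketch, not any real crawler's parser:

```python
# Sketch: what a non-JS crawler "sees" -- the text present in raw HTML
# before any JavaScript runs.
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect text content, skipping <script> and <style> bodies."""
    def __init__(self):
        super().__init__()
        self.skip = 0          # depth inside <script>/<style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip and data.strip():
            self.chunks.append(data.strip())

def visible_text(html: str) -> str:
    p = VisibleText()
    p.feed(html)
    return " ".join(p.chunks)

# A JavaScript-rendered single-page app: nothing survives without execution.
spa = '<html><body><div id="app"></div><script src="/bundle.js"></script></body></html>'
print(repr(visible_text(spa)))  # ''
```

If visible_text of your homepage comes back nearly empty, most AI crawlers see exactly that: nothing.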

Gate 4: Structured Data (The Language AI Platforms Actually Speak)

If a crawler can reach your page and see your content, the next question is: does the crawler understand what it's reading?

Humans read: "Our AI Visibility Audit costs $2,500, takes 3 days, and includes a 30-minute strategy call."

An AI crawler sees plain text. Without structured data, it has to guess: is that a product? A service? A price? A duration? A feature?

Structured data (schema markup in JSON-LD format, RDFa, or microdata) is the answer. It's metadata that explains, in a machine-readable format, what your content is and what attributes it has.

Here's the same sentence with schema markup:

Example: JSON-LD schema for a service
{
  "@context": "https://schema.org",
  "@type": "Service",
  "name": "AI Visibility Audit",
  "description": "Takes 3 days and includes a 30-minute strategy call.",
  "offers": {
    "@type": "Offer",
    "price": "2500",
    "priceCurrency": "USD"
  }
}

Now the AI crawler understands: there's a service, it costs 2,500 dollars, it takes 3 days, and it includes a 30-minute call. The crawler can index these attributes, use them for retrieval, and cite the source accurately.

Fabrice Canel of Microsoft's Bing team explained the implications at SMX Munich in 2025 when he said, "Schema markup directly feeds Bing's LLMs."

Schema markup does so much more than earn you rich snippets. It connects your content to the knowledge graph. It tells the AI system what category your business falls into, what you do, how much it costs, how long it takes, who your customers are, and what outcomes you deliver.

This structured data becomes part of the source material that LLMs see when they're generating answers. Here's what our research across 120 B2B professional services firms found:

Schema Markup Adoption by Type (120 B2B professional services companies):
  • Organization: 70%
  • Service / Product: 55%
  • Article: 45%
  • LocalBusiness: 20%
  • Author: 15%
  • Rating: 12%
  • HowTo: 10%
  • Review: 8%
  • FAQPage: <1% (only 1 company, despite FAQ content on many sites)
Based on lizmicik.com research of 120 B2B professional services companies.

Schema markup adoption by type across 120 B2B professional services firms. Relationship markup (Author, Rating, Review) and content-specific markup (FAQPage, HowTo) are severely underutilized.

The data speaks to a gap: most professional services firms haven't translated their content into the structured format that AI platforms need to understand it.

I have a suspicion that this one gate is going to cause more talk and raise more debate than any other. Schema markup has been around since 2011. Yet adoption remains mixed.

I'm going to make a longer-term prediction here and say that schema may NOT be important -- or as important -- in three to five years as it is now. I think as AI agents get smarter they won't need the "structure" quite as much, but the "data" that's contained in those structures will always be key to their understanding of what you do.

Today though, they have a problem even parsing your content to find the bits and pieces that are important. If you don't help make their job easier, they may just fill in their missing gaps with guesses. And we know what an expensive mistake that is.

How to Check Where You Stand

Audit your site with Google's Schema Markup Validator or a tool like Screaming Frog's schema analyzer. Identify which pages have schema markup, which types are present, and which are missing.
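To see the mechanics for yourself, here's a short Python sketch (the JSONLDExtractor helper is our own illustration, not part of those tools) that pulls JSON-LD blocks out of raw HTML and lists the schema types they declare:

```python
# Sketch: extract JSON-LD blocks from a page and list the schema types.
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collect parsed JSON-LD payloads from <script type="application/ld+json">."""
    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True
            self.buf = []

    def handle_data(self, data):
        if self.in_jsonld:
            self.buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self.buf)))
            except ValueError:
                pass  # malformed JSON-LD: worth flagging in a real audit

def schema_types(html: str) -> list:
    p = JSONLDExtractor()
    p.feed(html)
    return [b.get("@type", "?") for b in p.blocks]

page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "Acme Advisors"}
</script>
</head><body>...</body></html>"""

print(schema_types(page))  # ['Organization']
```

Run it against your key pages: an empty list on your services page means AI platforms are reading those pages with no structured context at all.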

Why FIND Feeds Everything Else

Before we wrap, it's important to understand why FIND is foundational to the whole AI visibility framework.

The Find/Understand/Trust stool has three legs. FIND is the first one: the technical layer that answers, "Can they access and read you?"

If you fail FIND, UNDERSTAND and TRUST become irrelevant. A crawler that can't reach your page, can't see your content, or can't parse your structured data will never get to the point of ranking your content (UNDERSTAND) or assessing your credibility (TRUST).

Here's the upside: FIND is the most controllable layer. You own your robots.txt. You own your infrastructure. You own your schema markup. You can configure your CDN's bot rules.

A poor UNDERSTAND ranking (where the AI system ranks you lower than a competitor) can be more complex to fix. It affects how you present yourself to humans as well as AI agents and involves multiple teams working together.

A poor TRUST score (where the AI system weighs your credibility lower) takes time. But a FIND failure? That's fixable in days. Some gates (like CDN settings) take minutes once you know where to look.

The Findable Advantage: What Happens When You Pass FIND

Companies that nail FIND see a different trajectory in AI visibility. They appear in more citations. They get referred higher-quality traffic.

GoodFirms studied 7,000+ citations across AI platforms and found that content updated within 30 days gets 3.2x more AI citations than content older than 90 days. That stat matters only if you're findable. An old, unfindable page gets zero citations. A fresh, findable page gets cited more often.

On conversion, the numbers are compelling. AI-referred traffic converts at 14.2 percent, compared to 2.8 percent from Google Search, according to the same GoodFirms research. That's a 5x difference. But you don't get that advantage unless your content is findable first.

Sixty percent of Google searches now end without a click to a website. Those users get their answer from an AI Overview. If you're findable, your content can feed that Overview. If you're not, your competitor's does.

The FIND layer is the prerequisite. Master it, and the rest of the framework becomes actionable.

Next Steps: From FIND to UNDERSTAND

Once an AI agent has made it through all four FIND gates on your site, the next question is harder: can it actually parse your content well enough to understand what you are and what you do?

Our Signal Check tests the UNDERSTAND layer.

It's free and takes only two minutes to find out exactly what all four major AI platforms say about you right now.

Run your free Signal Check now


Liz Micik

SEO & Content Strategist | AI Visibility Consultant

Liz has spent 28 years helping companies navigate structural shifts in how the web works. From the rise of Google to mobile-first indexing to today's AI-mediated search, she works with B2B professional services firms to ensure they're visible where their buyers are looking.

You're findable. But does AI understand what you do?

The Signal Check shows you exactly what AI platforms extract from your site. The full Audit covers all three layers.
