[Editorial illustration: a two-pan balance scale with the MEASURE side loaded down and the MANAGE side tilted up, signifying that the AI-visibility community has poured its energy into measurement while management of what gets measured has been left unattended.]
Measurement · Part 1 of 3

You Can't Manage What You Can't Measure About AI, or Can You?

April 15, 2026 · By Liz Micik · 10 min read

The Short Version

  • 89% of brands appear in AI search answers. Only 14% of marketers track what those answers say. The gap is real, but the tools alone aren't the reason.
  • The most valuable AI visibility research of the past year came from SEO community leaders running public studies, not from proprietary tool dashboards.
  • "You can't manage what you can't measure" (wrongly attributed to Drucker) is only half the equation. You can't measure what you haven't built to be measurable.

The number driving this year's AI visibility coverage is a 75-point gap. A GoodFirms study released April 7, 2026, reports that 89 percent of brands appear in AI-generated search answers but only 14 percent of marketers track what those answers say. The easy read is that marketing teams are asleep at the wheel. The honest read is more complicated than that (surprise!).

There are at least two sides to the explanation. One has to do with the maturity of the AI measurement tools themselves. The other is that "we" (marketers, business leaders, analysts, et al.) don't really know where to start. The measurement discipline is being built in real time, most teams are waiting for it to settle, and the tools alone aren't what will close the gap.

There are some really neat analogies that explain where we are in the transition from the human web to the agentic web. One of my favorites is that "we're building the airplane as we're flying there." And we may not know where there is yet, but we definitely know we're flying.

This is part one of a three-part look at how we are measuring what's happening to us in this transition. Whenever you say "measurement," a lot of the same marketers, business leaders, and analysts who aren't tracking their companies' citations and mentions today will immediately want to know what tool they should use or what metrics are emerging as KPIs in this new landscape.

This article is the strategic frame. Part two does the inventory work on the free and freemium AI visibility checkers: 26 tools label-read against a four-question transparency test, and the short list of the ones that actually measure what they claim to. Part three walks through the paid stack and shows what a working toolset looks like for the two buyer profiles that cover most of the B2B landscape. Read this piece when you need to know why the gap exists. Read the next two when you need to know what to do about it.

The Tools Are Not the Whole Story

Every day my briefing brings news of another feature added to some model, along with a new tip, trick, or tool designed to help us figure out where and how we're showing up in AI answers. The measurement tools, unfortunately, are lagging well behind the features.

Every AI visibility tool shipping in 2026 falls into one of two families. Both are useful. Neither is a census.

Synthetic-prompt tools

What they do
Run fabricated prompt sets against AI platforms and report how often a brand appears in the responses.
Examples
Gumshoe, Profound, Otterly, Peec, LLM Pulse
What they can't see
Real buyer behavior. The prompt set is whatever the analyst chose. A different set produces a different picture.
Best for
Tracking your share of voice across a stable, repeatable prompt set over time.

Real-world signal tools

What they do
Measure observable behavior: actual citations, crawler activity, referral traffic.
Examples
Ahrefs Brand Radar, Semrush AI Visibility Toolkit, BrightEdge, Conductor, seoClarity
What they can't see
Brand exposure inside an AI conversation that didn't produce a click — which is most of it.
Best for
Tracking citations and traffic that AI is actually sending you in the wild.

The synthetic tools are measuring something the analyst constructed. No real buyer typed those prompts; the prompt set is a sample chosen by whoever set up the account, and a different sample produces a different picture.

The real-world tools have the opposite limitation. They can see citations that appeared in live AI Overviews their scrapers observed, crawlers that reached your site, referral traffic that arrived with an identifiable header. What they cannot see is brand exposure inside an AI conversation that did not produce a click, which is most of it.

If a user asks ChatGPT a question, gets your brand in the answer, and walks away satisfied without clicking through, no real-world signal tool in the category will register that interaction. It is not a bug. It is the underlying privacy structure of how AI platforms handle user journey data.
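If you want to see that boundary in your own data, the raw material is just your server log. Here's a minimal sketch of the accounting, assuming a combined-format access log; the crawler and referrer tokens below are illustrative examples I've filled in, they go stale quickly, and they are not quoted from any vendor named in this piece:

```python
# A minimal sketch, assuming a combined-format access log. The crawler
# and referrer tokens are illustrative examples that go stale quickly;
# they are not quoted from any vendor mentioned in this article.
import re
from collections import Counter

AI_CRAWLERS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot")
AI_REFERRERS = ("chatgpt.com", "chat.openai.com", "perplexity.ai", "gemini.google.com")

# Matches the tail of a combined-format log line: request, status,
# bytes, referrer, user agent.
LOG_TAIL = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"')

def ai_signal(log_path: str) -> Counter:
    """Count AI crawler hits and AI-referred human visits in a log file."""
    counts = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            m = LOG_TAIL.search(line)
            if not m:
                continue
            if any(bot in m["agent"] for bot in AI_CRAWLERS):
                counts["ai_crawler_hit"] += 1
            elif any(host in m["referrer"] for host in AI_REFERRERS):
                counts["ai_referred_visit"] += 1
    return counts

# What this can never count: the answer a user read and walked away from.
```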

Both families are useful directionally. Neither delivers the complete signal it's being sold as.

The Community Is Doing the Real Methodology Work

The most valuable AI visibility insights of the past year did not come from proprietary platforms. They came from practitioners running public studies, publishing their methodology, and inviting other operators to replicate or extend the work. This is worth naming directly because it changes how to read every number in the category.

Rand Fishkin and Patrick O'Donnell published the largest public test of AI recommendation consistency in early 2026. The study ran twelve prompts (brand recommendations across categories: chef's knives, headphones, cancer hospitals, digital marketing consultants, science fiction novels, and others) for a total of 2,961 test runs across ChatGPT, Claude, and Google's generative search systems.

The findings landed hard. The probability of two responses mentioning the same brands in the same order was less than one in one thousand. The probability of the same list of brands in any order was less than one in one hundred. Narrow queries (Los Angeles Volvo dealers) showed high consistency while broad queries (science fiction novels, design agencies) scattered.

"Any tool that gives a 'ranking position in AI' is full of baloney."

— Rand Fishkin, SparkToro

That study changed how serious practitioners read synthetic-prompt tool reports. Position-based metrics should be discounted. Frequency-based share of voice across a stable prompt set, run many times, is the honest read. Every tool in the synthetic-prompt family now has to answer for its methodology against that finding.
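The honest read is also easy to compute yourself. Here's a sketch of that frequency-based arithmetic, assuming you've already extracted an ordered brand list from each of many runs of the same prompt; the extraction step is yours and is the genuinely hard part:

```python
# A minimal sketch of frequency-based share of voice. Assumes you have
# already extracted an ordered brand list from each of N runs of the
# same prompt; the extraction step is yours and is out of scope here.
from collections import Counter

def share_of_voice(responses: list[list[str]]) -> dict[str, float]:
    """Fraction of runs in which each brand was mentioned at all."""
    runs = len(responses)
    mentions = Counter(brand for run in responses for brand in set(run))
    return {brand: count / runs for brand, count in mentions.most_common()}

def consistency(responses: list[list[str]]) -> tuple[float, float]:
    """Share of runs matching the modal exact order and the modal brand set."""
    runs = len(responses)
    same_order = Counter(tuple(run) for run in responses)
    same_set = Counter(frozenset(run) for run in responses)
    return (same_order.most_common(1)[0][1] / runs,
            same_set.most_common(1)[0][1] / runs)

# Fishkin's point restated in code: report share_of_voice() over many
# runs, and treat any single run's "position" as noise.
```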

Chris Long, one of the most prolific tactical publishers on LinkedIn, ran a parallel study in collaboration with AirOps. The joint analysis covered 815,000 query-page pairs across 16,851 queries and ten industries. The headline finding is that retrieval rank (where a page appears in the search results that AI platforms use to build their answers) is the number-one signal for whether that page gets cited.

Position 1 in the underlying search gets cited in AI responses 58 percent of the time. Pages covering roughly a quarter to half of the fan-out sub-queries for a topic get cited more often than pages covering 100 percent of them, a depth-over-breadth finding that contradicts most of what content strategy playbooks currently recommend.

"Domain authority does not predict citation frequency. Content quality does."

— Chris Long, Go Fish Digital
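The shape of that analysis is worth internalizing, because you can run it on your own pages. A sketch, assuming you can pair each page's retrieval rank with whether it was cited; the 58 percent figure comes from the Long/AirOps dataset, not from this code:

```python
# A minimal sketch of the analysis shape behind the retrieval-rank
# finding. Assumes `pairs` holds your own (retrieval_rank, was_cited)
# observations; the 58 percent figure comes from the Long/AirOps
# dataset, not from this code.
from collections import defaultdict

def citation_rate_by_rank(pairs: list[tuple[int, bool]]) -> dict[int, float]:
    """For each retrieval rank, the fraction of pages that got cited."""
    cited = defaultdict(int)
    total = defaultdict(int)
    for rank, was_cited in pairs:
        total[rank] += 1
        cited[rank] += was_cited  # bool counts as 0 or 1
    return {rank: cited[rank] / total[rank] for rank in sorted(total)}

# If your own rank-1 pages sit far below the benchmark, the gap is a
# retrieval problem before it is a content problem.
```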

Long's LinkedIn posting practice is also the model for how this research is reaching the people who can put it into practice. Short tactical posts show exactly how to pull AI Overview data, how to surface ChatGPT's hidden search queries, how to run gap analyses between AI Overview content and page content. They are bite-sized morsels: a marketing analyst can read one on Tuesday and ship new data by Wednesday.

AirOps' 2026 State of AI Search Report verified and expanded on these community results.

Wil Reynolds' recent work adds the strongest test of brand consistency at the edge of consumer decision-making. I first heard about this particular study when Reynolds announced it in a LinkedIn post. He had 200 marketers enter the same prompt, "best banks for SBA loans," and found that 60 percent of them got the same three brands in the ChatGPT response.

What Fuels AI Consistency?

That is a striking consistency finding on its own, but when you consider it next to Fishkin's broader result it raises a very interesting question: when does AI consistency hold and when does it break down?

The working answer involves many of the things I consider when I'm doing an AI agent audit or readiness roadmap for a client. When a complete set of facts is presented through a clear structure, and brand messaging is consistent across multiple web platforms, AI's consistency holds up much better. The three brands in the Seer study (Reynolds founded Seer Interactive) are clearly doing the work that made their content clear, complete, and consistent enough to show up over and over.

The much larger samples of sites that Fishkin and AirOps examined remind us just how few companies have done the work to make their information clear, complete, and consistent enough to register reliably in anyone's measurements.

There's one more thing about all these studies that adds up to more than any individual number: a real willingness in the SEO community to share and to help each other learn as we go through this transition. I, for one, am really proud of that.

This is us, the SEO community, flying the plane while we're soldering on flaps and rudders and (I hope) landing gear. It is the most productive thing happening in AI visibility measurement right now.

This research is also important for another, even more practical reason. Each finding from these community giants carries its own blueprint for how a marketing team should plan content and expect to measure it.

The Bidirectional Frame

"You can't manage what you can't measure" is one of the most frequently repeated business aphorisms of the last thirty years. It is also almost certainly misattributed. The Drucker Institute has stated that Peter Drucker never wrote those exact words.

W. Edwards Deming, who is also sometimes credited with the quote, actually argued against the sentiment in his 1993 book, warning that "it is wrong to suppose that if you can't measure it, you can't manage it. A costly myth."

Deming's correction is the point. The aphorism is useful but only half-true. The other half is that you can't measure what you can't manage. Measurement and management are locked together.

You can manage every aspect of your website. To date we've "managed" and "optimized" our websites to present our information in the entertaining, interactive ways humans like to ingest it; JavaScript interactions and videos are examples.

Now we need to shift our focus and begin "managing" the structured data of our site and "optimizing" the ways it can be ingested by AI agents. If your site has no schema, no entity clarity, no structural consistency across pages, there is nothing for an agentic measurement tool to stabilize against.
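A five-minute self-check makes that concrete. This stdlib-only sketch fetches a page and lists the schema.org types it declares; treating "any valid JSON-LD at all" as the pass bar is my simplification for illustration, not any tool's methodology:

```python
# A minimal stdlib-only sketch. Treating "any valid JSON-LD at all" as
# the pass bar is my simplification for illustration, not any tool's
# methodology; real audits check entity completeness, not just presence.
import json
import re
import urllib.request

def schema_types(url: str) -> list[str]:
    """Fetch a page and list the schema.org @type values it declares."""
    html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
    blocks = re.findall(
        r'<script[^>]+type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
        html, re.DOTALL | re.IGNORECASE,
    )
    types = []
    for block in blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is itself a finding
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict) and item.get("@type"):
                t = item["@type"]
                types.extend(t if isinstance(t, list) else [t])
    return types

# An empty list means there is nothing structured for an AI agent, or a
# measurement tool, to stabilize against.
```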

This is why the measurement gap matters operationally. It is not just that marketers need better tools. It is that marketers need better tools and a website foundation that is measurable. The two have to happen together.

Quick Wins vs Foundational Wins

All of the studies we've talked about today expose exactly the kind of foundational work we all need to do to "optimize" our sites for AI. All too often, that work is skipped over or deferred. Today's marketers and business leaders are being squeezed to come up with quick wins this quarter; they're looking for tools that can show them the low-hanging fruit they can pluck right now.

That's what the next two articles cover. Part two is about the free-audit landscape: 26 tools label-read against four transparency questions, the six that scored 4/4, and the large cluster that presents a confident score with no visible methodology. Part three is about the paid stack, when it's worth graduating from free tools, and what a working toolset looks like for two very different buyer profiles.

Common Questions

What should I do when the same prompt produces different brands every time I run it?

Large language models are based on likelihoods and probabilities, not guaranteed certainties. Fishkin and O'Donnell's 2,961-run study found less than a one-in-one-hundred chance that the same prompt would return the same list of brands across runs, and less than a one-in-one-thousand chance of the same list in the same order. This is not a flaw in the tools measuring AI visibility; it is the underlying mechanism of the systems being measured. What you should do is track frequency across many runs on the same prompt set over time. Do not trust a single response as a stable reading.

How long until AI visibility measurement matures?

My estimate, based on watching SEO measurement mature between 2003 and 2010, is twelve to twenty-four months before there is anything resembling a stable cross-platform standard for AI visibility. It will probably be longer before it is as widely understood as Google Search Console is today. The intervening period is the window in which foundation work produces the largest competitive advantage, because the brands doing it will not have to convince anyone the work mattered. The measurement, when it arrives, will show.

If the measurement is incomplete, why invest in AI visibility at all?

Because the early data on AI visitor conversion is not ambiguous. The Opollo 2026 AI Search Benchmark Report, which analyzed 312 B2B technology firms, put AI visitor conversion at 14.2 percent against 2.8 percent for Google organic, a 5x multiple. The likely reason is selection: a buyer arriving after a long AI conversation has worked through more of their evaluation than a buyer who clicked a Google result. The multiple will compress as AI traffic volume grows, but it has not compressed yet. The brands that capture this traffic while the pre-qualified-buyer effect holds are converting it at rates the old benchmarks cannot match.

Free · 5 minutes

Signal Check

A free five-minute test that runs your URL and a target keyword across four AI models and returns Fact and Vibe scores plus a benchmark against your industry. A directional reading of where your business stands today.

Run a Signal Check

Full diagnostic

Agent Readiness Audit

A 31-point assessment across all three layers (Find, Understand, Trust) with an interactive report showing exactly where your structural foundation is leaking and what to fix first.

See the Audit

Liz Micik

AI Visibility Strategist & SEO Expert

28 years in SEO, content strategy, and B2B marketing. Liz helps professional services firms become visible, understood, and trusted by AI platforms.

Want to know what AI is actually saying about you?

The Signal Check shows you what AI platforms actually say about your business across all three layers. The full Audit diagnoses every gap.

Agent Readiness Audit Free Signal Check