The AI Visibility Framework
Customer research now flows through AI tools before websites.
Why this framework exists.
In 2025, 94% of B2B buyers used AI in their purchase journey, and over half began with an AI chatbot instead of a Google search. For service firms, that means the crucial early stages, vendor shortlisting and cross-source validation, now happen without any direct contact with your team.
Most AI visibility advice in the market returns a single number. A percentage. A “GEO score”. A traffic light. That is too thin to act on.
This framework answers a sharper question: across five buyer-facing dimensions, can AI tools find your business, understand it, trust it, work with it, and cite it? Each dimension is scored against clear criteria, grounded in the evidence that matters. The result isn’t a marketing number; it’s a diagnosis your board can act on with confidence.
The five dimensions.
The AI Visibility Scorecard scores a business across Findability, Understandability, Trustworthiness, Agent-Readiness, and Citability. Each answers a practical question about how your business shows up for AI-driven buyers. Together, they map the full economic surface of AI visibility.
Each dimension is scored on a defined scale, applying the same criteria to your firm and three named competitors. That way, you see your position in context, not just an abstract number. The scoring criteria are held internally to protect the integrity of the process.
Findability
Dimension_01
Does AI find your business when it should?
When a CFO asks ChatGPT for the best mid-market accountancy firms in Manchester, the real question isn’t whether you have a website; it’s whether AI tools actually name your firm in the answer. Findability is the upstream condition for everything else. If AI can’t find you, nothing else matters.
The firm appears by name, alongside named competitors, in AI answers to relevant service queries. Brand mentions appear consistently from one tool to the next.
A Bristol accountancy firm with weak Findability appears when asked by name, but is invisible when asked for the best mid-market firms in its sector. A firm with strong Findability appears in competitive shortlists across the tools tested, alongside its named peers.
Understandability
Dimension_02
When AI describes you, does it get the description right?
Findability without accurate understanding is worse than being invisible. If AI surfaces your firm but gets your offer wrong, every onward referral starts from a false premise. Understandability measures how closely AI’s description matches what you actually do.
AI tools describe the firm’s services accurately, concisely, and consistently across tools. No contradictions between what ChatGPT says and what Gemini says.
A consultancy with weak Understandability has AI describe it as a generic technology consultancy when it is in fact a strategy specialist. A firm with strong Understandability has its core proposition described correctly, and the only gap is a missed specialism.
Trustworthiness
Dimension_03
Does AI treat your firm as a credible source?
AI tools weight signals. A firm that appears in independent comparison articles ranks higher than one that only talks about itself. Trustworthiness is about the strength of those third-party signals AI uses to decide whether to recommend you.
Multiple independent third-party signals reinforce expertise. Named case studies. Press citations. Recognisable client logos. Published research. Third-party reviews. AI tools surface these signals when describing the firm.
A law firm with weak Trustworthiness has client logos on the homepage and a Trustpilot widget. A firm with strong Trustworthiness has named case studies with measurable outcomes, press citations from recognisable publications, and published research in its specialism. The second is the firm AI is more likely to recommend in a competitive shortlist.
Agent-Readiness
Dimension_04
Can autonomous AI agents work with your site?
The next wave of AI visibility isn’t about search; it’s about action. I’ve watched agents start to browse sites, fill out forms, request quotes, and book consultations for buyers. Agent-Readiness measures whether your site is technically ready for that kind of traversal.
Clean structured data (Schema.org JSON-LD covering the organisation, services, and key actions). No critical content hidden behind JavaScript. Fast time-to-first-byte (how quickly the server starts responding). Stable, accessible forms. Error states an agent can recover from. Agent crawlers are not blocked.
An agency with weak Agent-Readiness has its quote-request form behind a JavaScript-rendered overlay (a pop-up that only loads once scripts run) that an agent cannot interact with, plus no Service or Organisation schema. A firm with strong Agent-Readiness has the form accessible in raw HTML, schema covering the firm and its key services, and the only friction is a slow third-party widget on one page.
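Two of the checks above are easy to illustrate. The sketch below is a minimal example, not the audit itself: it looks for Schema.org JSON-LD in the raw HTML (before any scripts run) and reads a robots.txt file to see which agent crawlers are blocked. The crawler names, the sample page, and the sample robots.txt are assumptions chosen for illustration; check each vendor's current documentation for the user-agent tokens that apply.

```python
import json
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Illustrative crawler tokens (an assumption): confirm against each vendor's docs.
AGENT_USER_AGENTS = ["GPTBot", "PerplexityBot", "ClaudeBot"]

# Hypothetical inputs, invented for this example.
SAMPLE_HTML = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Organization", "name": "Example LLP"}'
    "</script></head><body><form action='/quote'></form></body></html>"
)
SAMPLE_ROBOTS = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"


class JsonLdTypes(HTMLParser):
    """Collects the top-level @type of each JSON-LD block in raw HTML.

    Simplification: nested @graph structures are not unpacked.
    """

    def __init__(self):
        super().__init__()
        self._capturing = False
        self._chunks = []
        self.types = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._capturing = True
            self._chunks = []

    def handle_data(self, data):
        if self._capturing:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag != "script" or not self._capturing:
            return
        self._capturing = False
        try:
            doc = json.loads("".join(self._chunks))
        except json.JSONDecodeError:
            return  # a malformed block is treated as absent
        for item in (doc if isinstance(doc, list) else [doc]):
            if isinstance(item, dict):
                declared = item.get("@type", [])
                self.types.extend([declared] if isinstance(declared, str) else declared)


def schema_types_in_raw_html(html: str) -> list[str]:
    """Return the schema types visible without executing any JavaScript."""
    parser = JsonLdTypes()
    parser.feed(html)
    parser.close()
    return parser.types


def blocked_agents(robots_txt: str, url: str) -> list[str]:
    """Return the listed agent crawlers that robots.txt forbids from fetching url."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [ua for ua in AGENT_USER_AGENTS if not rules.can_fetch(ua, url)]
```

Calling `schema_types_in_raw_html(SAMPLE_HTML)` and `blocked_agents(SAMPLE_ROBOTS, "https://example.com/quote")` shows the idea: the first reports which schema types an agent sees without running scripts, and the second names any crawler the site turns away.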
Citability
Dimension_05
When AI generates an answer, does it quote your content?
Citability is what turns visibility into real traffic. Even if AI tools find and trust your firm, they still need a reason to quote you. That comes down to structure: pages built so a language model can extract claims, attribute them, and link back.
Content structured for citation. Clear claim-evidence pairs. Named authors with credentials. Publication dates. Scannable headings. A published llms.txt file (a small text file that points AI tools to the site’s most quotable content).
An accountancy firm with weak Citability publishes weekly blog posts in dense paragraphs with no headings, no author attribution, and no dates. A firm with strong Citability publishes the same content with named author bylines, clear sub-claims, and concrete data points AI can extract. The second gets cited. The first does not.
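For readers who have not seen an llms.txt file, the sketch below generates one. It assumes the layout described in the llms.txt proposal (a title, a one-line summary in a blockquote, then sections of annotated links); llms.txt is a proposed convention rather than a ratified standard, and AI tools vary in how they use it. The `Page` type, the function name, and all sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One quotable page: all fields are supplied by the site owner."""
    title: str
    url: str
    note: str


def build_llms_txt(site_name: str, summary: str, sections: dict[str, list[Page]]) -> str:
    """Render a Markdown-style llms.txt body under the assumed layout."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for page in pages:
            lines.append(f"- [{page.title}]({page.url}): {page.note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical firm and page, invented for this example.
example = build_llms_txt(
    "Example LLP",
    "Mid-market accountancy firm based in Bristol.",
    {
        "Guides": [
            Page(
                "R&D tax relief explained",
                "https://example.com/guides/rd-tax-relief",
                "Named author, dated, with worked figures.",
            )
        ]
    },
)
```

The resulting text would be saved as `llms.txt` at the site root. The file only points to content; the pages it lists still need the bylines, dates, and claim-evidence structure described above.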
How the live AI tool tests work.
Three AI tools: ChatGPT, Perplexity, Gemini. Together they cover the dominant share of B2B AI buyer behaviour. Adding Claude and Copilot widens coverage without changing what you would fix first. The Scorecard is optimised for depth of interpretation across the tools that matter most, not breadth across all of them.
A standardised prompt set runs against your firm and three named competitors we agree on at kick-off. The prompts cover brand, category, and intent queries. I record responses verbatim with screenshots, then benchmark them against the competitor set.
The competitor benchmark turns each dimension into a real-world position, not just an abstract score. The same result means something different against a weak competitor than it does against a strong one. That’s why the Scorecard shows both your score and how you stack up against the three named firms.
The exact prompt set, run counts, controls for variation between the tools, and the scoring criteria are not published. The diagnosis is more defensible when the criteria cannot be reverse-engineered into a self-graded checklist. Clients see the full criteria applied to their business in the Scorecard delivery.
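The Scorecard's own prompts and scoring stay internal, so the sketch below is not that method. It only illustrates the general idea of benchmarking: take the recorded responses from each tool, apply one identical measure to every firm (here, the share of responses that name the firm), and read the results side by side. The firm names, the responses, and the measure itself are invented for this example.

```python
import re


def mention_rate(responses: dict[str, list[str]], firm: str) -> dict[str, float]:
    """Share of recorded responses per tool that name the firm (case-insensitive)."""
    # Lookarounds instead of \b so names ending in punctuation still match cleanly.
    pattern = re.compile(rf"(?<!\w){re.escape(firm)}(?!\w)", re.IGNORECASE)
    return {
        tool: sum(bool(pattern.search(text)) for text in texts) / len(texts)
        for tool, texts in responses.items()
        if texts  # skip tools with no recorded responses
    }


def benchmark(responses: dict[str, list[str]], firms: list[str]) -> dict[str, dict[str, float]]:
    """Apply the same measure to the client firm and each named competitor."""
    return {firm: mention_rate(responses, firm) for firm in firms}


# Hypothetical recorded answers, invented for this example.
recorded = {
    "ChatGPT": [
        "Consider Alpha LLP and Beta Partners.",
        "Beta Partners is a strong option.",
    ],
    "Gemini": ["Alpha LLP, Beta Partners and Gamma & Co. all serve this market."],
}
table = benchmark(recorded, ["Alpha LLP", "Beta Partners", "Gamma & Co"])
```

Because every firm is measured the same way on the same responses, a rate only means something relative to the others in the table, which is the point of benchmarking against named competitors.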
The AX framework lineage.
Under the hood, these five buyer-facing dimensions sit on top of an internal scoring framework I call AX (Agent Experience). AX covers five technical areas: structural accessibility, functional success against agent tasks, performance benchmarks, content quality for AI, and error recovery. Each area is weighted by operational consequence. For example, a form an agent can’t complete or a schema gap that blocks retrieval matters far more than a cosmetic issue.
The buyer dimensions don’t map one-to-one onto the technical framework. Findability draws on structural accessibility and content quality. Trustworthiness pulls from content quality and schema. Agent-Readiness spans functional success, performance, and error recovery. The dimensions are what your board sees; AX is how I produce the audit. I keep the weights internal for the same reason as the scoring criteria: published weights could be reverse-engineered into a self-graded checklist, and they’re what competitors would copy.
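To make "weighted by operational consequence" concrete, here is a minimal sketch of a weighted average across the five AX areas. The weights, the 0 to 100 scale, and the sample scores are placeholders invented for illustration; the real AX weights are not published and are not reproduced here.

```python
AX_AREAS = (
    "structural_accessibility",
    "functional_success",
    "performance",
    "content_quality",
    "error_recovery",
)


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-area scores; heavier weights pull the result harder."""
    missing = [area for area in AX_AREAS if area not in scores or area not in weights]
    if missing:
        raise ValueError(f"missing areas: {missing}")
    total_weight = sum(weights[area] for area in AX_AREAS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[area] * weights[area] for area in AX_AREAS) / total_weight


# Placeholder numbers only: NOT the real AX weights or a real audit result.
example_scores = {
    "structural_accessibility": 40,
    "functional_success": 20,
    "performance": 80,
    "content_quality": 70,
    "error_recovery": 60,
}
example_weights = {
    "structural_accessibility": 3,
    "functional_success": 3,
    "performance": 1,
    "content_quality": 2,
    "error_recovery": 1,
}
result = weighted_score(example_scores, example_weights)
```

With these invented numbers the weighted result is 46.0, against a simple average of 54.0, because the two most heavily weighted areas are also the weakest. That is the effect weighting is meant to have: a form an agent can’t complete drags the score down more than a cosmetic issue does.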
Common questions.
What if AI answers change?
They will, and the Scorecard is built for that. It does not rest on a single prompt or a single answer. It tests across multiple AI tools, sets your firm against named competitors, reviews your site and content, and reports the pattern rather than the snapshot. The fixes that follow hold as the answers move.
Book the Scorecard.
What you see here is the diagnostic framework. The Scorecard is how I apply it to your business: live tests across three AI tools, benchmarking against three named competitors, and a hands-on audit of your 10 to 20 most important pages, all scored against this framework. It is delivered as a 10 to 15 page PDF within 10 working days. £2,500, fixed scope, fixed price. No upsell, just the insight you need.