ChatGPT Visibility Testing Framework: The 30-Prompt Methodology
To know whether your business appears in ChatGPT recommendations, you need a repeatable testing system — not guesswork. Build a 30-prompt library across three query types (recommendation, comparison, category), run it monthly in a fresh incognito session on GPT-4o, score each result on a 0–10 rubric, and benchmark against three competitors. This gives you a defensible monthly AI Visibility Score and a diagnostic map of exactly which signals to fix.
ChatGPT does not show rankings like Google — you cannot look at a SERP and see your position. The only way to know whether your business appears in ChatGPT recommendations is to test it systematically. Without a standardized testing protocol, you are guessing. This guide gives you the exact methodology to measure, score, and track your ChatGPT visibility month over month so you always know where you stand and what to fix next.
Why You Need a Standardized Testing Method
ChatGPT is non-deterministic — it can give different answers to the same question in different sessions. This means a one-off test is nearly worthless. You might ask "What are the best SEO agencies?" today and get mentioned, then ask again tomorrow and not appear. Without a controlled methodology, you cannot tell whether your visibility is improving, declining, or just varying randomly.
A standardized testing framework solves this by controlling as many variables as possible: same model, same session conditions, same prompt wording, same scoring rubric. When you run the same test monthly under the same conditions, month-over-month changes are much more likely to reflect real shifts in visibility than random variation. For a deeper look at what signals large language models use to form recommendations, see our guide to why AI engines cite some brands and ignore others.
| Testing Variable | Controlled Setting | Why It Matters |
|---|---|---|
| Model version | GPT-4o (not Mini or 3.5) | Different models have different recommendation patterns |
| Session state | Fresh conversation, incognito | Prior context carries over and biases answers |
| Prompt wording | Exact same phrasing each month | Small wording changes significantly alter outputs |
| Web search | Disable unless testing search-grounded mode | Search-on vs. search-off produces different sources |
| Test timing | Same week each month | Model updates can shift results; controls for recency |
Step 1: Build Your 30-Query Prompt Library
Thirty prompts across three categories give you enough coverage to produce a reasonably stable visibility score while remaining manageable to run monthly in under two hours. Spread the 30 prompts evenly: 10 recommendation queries, 10 comparison queries, and 10 category queries.
Category A: Recommendation Queries (10 prompts)
Direct requests for business recommendations. These are the most commercially valuable query type because users are actively seeking to hire, buy, or engage.
- "What is the best [your service] for [your target client type]?"
- "Who should I use for [your service] in [your city/region]?"
- "Recommend a [your business type] for [specific use case]"
- "I need a [your service] — what are my best options?"
- Vary budget qualifiers: "affordable," "premium," "enterprise-grade"
- Vary urgency qualifiers: "quickly," "for a project starting next month"
- Vary specificity: broad category + niche specialization variants
Category B: Comparison Queries (10 prompts)
Queries that compare options. Users are in evaluation mode. Appearing here means you are considered a top-tier contender alongside named competitors.
- "[Your brand] vs [Competitor A] — which is better?"
- "Compare the top three [your service] providers"
- "What are the pros and cons of [your brand]?"
- "Is [your brand] worth it?" / "Is [your brand] legit?"
- "How does [your brand] compare to [Competitor B]?"
- Include queries where you are NOT named but your competitors are
Category C: Category/Intent Queries (10 prompts)
Open-ended queries about a need or problem. No business name is mentioned. Appearing here means ChatGPT associates your brand with the category.
- "What companies help with [your core service]?"
- "How do I find a [your business type]?"
- "What should I look for in a [your service] provider?"
- "What are the leading [your industry] companies in the US?"
- "Who are the most trusted [your specialty] experts?"
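One way to keep the wording identical from month to month is to store the templates once and fill the bracketed placeholders programmatically. The following is a minimal sketch, not part of the original methodology: the `Prompt` record, the `A01`-style ID scheme, and the function names are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Prompt:
    prompt_id: str  # hypothetical ID scheme, e.g. "A01"
    category: str   # "A" (recommendation), "B" (comparison), "C" (category)
    text: str       # final wording; frozen so it cannot drift between months


def fill(template: str, values: dict[str, str]) -> str:
    """Replace each [placeholder] with its value; fail loudly if one is missing."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder [{key}]")
        return values[key]

    return re.sub(r"\[([^\]]+)\]", substitute, template)


def build_library(templates: dict[str, list[str]], values: dict[str, str]) -> list[Prompt]:
    """Build the 30-prompt library, enforcing 10 prompts per category."""
    library = []
    for category in ("A", "B", "C"):
        items = templates[category]
        if len(items) != 10:
            raise ValueError(f"category {category} has {len(items)} prompts, expected 10")
        for i, template in enumerate(items, start=1):
            library.append(Prompt(f"{category}{i:02d}", category, fill(template, values)))
    return library
```

Raising on a missing placeholder means an unfilled `[your service]` can never be pasted into a test session by accident.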
Step 2: Run Standardized Test Sessions
Each monthly test session should follow the same protocol to ensure your results are comparable over time.
- Use a fresh incognito/private browser window for each prompt. Do not run multiple prompts in the same conversation thread.
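- Use GPT-4o at chatgpt.com (the older chat.openai.com address redirects there). Log in to the same account each month, and turn off memory and custom instructions if they are enabled so earlier chats cannot carry over. Disable web search unless you specifically want to test the search-grounded version.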
- Paste each prompt exactly as written from your prompt library. Do not rephrase or add context.
- Copy the full response into a spreadsheet immediately. Do not rely on memory.
- Record the timestamp and note any model version changes displayed in the interface.
- Run the full 30-query library in a single sitting if possible, or within the same 48-hour window.
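If you prefer a plain CSV file to a spreadsheet, the logging step in the protocol above can be sketched as follows. This assumes you paste each response in by hand; the column names and the `log_response` helper are illustrative choices, not part of the original protocol.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical column layout for the raw response log.
FIELDS = ["timestamp_utc", "prompt_id", "model_label", "web_search", "response"]


def log_response(path, prompt_id, response, model_label="GPT-4o", web_search=False):
    """Append one response to the CSV log, writing the header on first use."""
    path = Path(path)
    is_new = not path.exists()
    row = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "prompt_id": prompt_id,
        "model_label": model_label,  # the model name shown in the interface
        "web_search": web_search,
        "response": response,        # full text, including line breaks
    }
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
    return row
```

Recording the model label and search setting on every row makes it easier to spot later whether a score change coincided with a change in test conditions.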
Step 3: Score Each Result Using the 0–10 Visibility Rubric
Assign a score to each of your 30 prompts. This converts qualitative observations into a trackable metric. Your total out of 300 becomes your monthly ChatGPT Visibility Score for that prompt set.
| Score | Result Description | What It Signals |
|---|---|---|
| 0 | Your business not mentioned at all | No visibility; entity or authority gap |
| 2 | Mentioned in a list of 5+ options with no detail | Minimal recognition; needs authority building |
| 4 | Mentioned in a list of 3–4 with brief description | Recognized but not strongly differentiated |
| 6 | Named in top 2–3 with a substantive description | Good visibility; optimize for top position |
| 8 | Named as the primary or first recommendation | Strong AI visibility; maintain and defend |
| 10 | Named as the sole recommendation with specific reasoning | Dominant category authority |
Add up your scores across all 30 prompts, divide by 300, and multiply by 100 to get a percentage. A score of 50% (150/300) means you are appearing consistently but not dominating. Below 30% means significant visibility gaps. Above 70% means strong category authority.
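The arithmetic can be sketched in a few lines of Python. The function names are hypothetical, and the label for the 30% to 70% band is an assumption extrapolated from the 50% example, since the text only defines the two outer thresholds.

```python
def visibility_score(scores: list[int]) -> tuple[int, float]:
    """Return (total out of 300, percentage) for one month's 30 prompt scores."""
    if len(scores) != 30:
        raise ValueError(f"expected 30 scores, got {len(scores)}")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("each score must be between 0 and 10")
    total = sum(scores)
    return total, 100 * total / 300


def interpret(percentage: float) -> str:
    """Map a percentage to the bands described in the text."""
    if percentage < 30:
        return "significant visibility gaps"
    if percentage > 70:
        return "strong category authority"
    # Assumed label for the middle band (the text only describes 50%).
    return "appearing, but not dominating"
```

For example, fifteen prompts scored 4 and fifteen scored 6 give a total of 150, which is 50%.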
Step 4: Benchmark Against Three Competitors
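Run the same 30-prompt library with one modification: after each prompt gets a response, ask "What about [Competitor A]?" as a follow-up in the same thread. This is the one exception to the one-prompt-per-conversation rule in Step 2, so record and score your own result from the first response before sending the follow-up. The follow-up reveals how ChatGPT describes and positions your competitors in the same context it described you.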
- Score each competitor using the same 0–10 rubric.
- Build a 4-column share-of-voice table: your brand + 3 competitors × 30 prompts.
- Identify which prompt categories your competitors dominate (recommendation vs. comparison vs. category).
- Flag the specific prompts where a competitor outscores you by 4+ points — these are your highest-priority optimization targets.
- Reverse-engineer high-scoring competitors: check their reviews, schema markup, third-party citations, and content structure for patterns.
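The comparison steps above can be sketched as follows, assuming scores are stored as one dictionary per prompt keyed by brand name. The brand labels are placeholders, and defining share of voice as each brand's share of all points awarded is one possible interpretation rather than something the methodology specifies.

```python
def flag_gaps(scores: dict[str, dict[str, int]], own: str = "You", threshold: int = 4):
    """List (prompt_id, competitor, gap) where a competitor leads by `threshold`+ points."""
    flagged = []
    for prompt_id, row in scores.items():
        for brand, score in row.items():
            gap = score - row[own]
            if brand != own and gap >= threshold:
                flagged.append((prompt_id, brand, gap))
    # Largest gaps first: these are the highest-priority targets.
    return sorted(flagged, key=lambda item: -item[2])


def share_of_voice(scores: dict[str, dict[str, int]]) -> dict[str, float]:
    """Each brand's percentage of all points awarded (an assumed definition)."""
    totals: dict[str, int] = {}
    for row in scores.values():
        for brand, score in row.items():
            totals[brand] = totals.get(brand, 0) + score
    grand_total = sum(totals.values())
    return {b: (100 * t / grand_total if grand_total else 0.0) for b, t in totals.items()}
```

Running `flag_gaps` per query category (A, B, C) shows which category a given competitor dominates.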
Step 5: Diagnose Gaps by Query Category
Your score breakdown by query category tells you exactly which type of optimization to prioritize next. This is what makes systematic testing more useful than one-off checks.
- Low score on Recommendation queries (Category A): ChatGPT does not recognize you as a top-tier option. Likely causes: insufficient review volume, weak entity signals, or missing from third-party "best of" lists. Fix: review velocity program, Crunchbase/Wikidata presence, PR placements.
- Low score on Comparison queries (Category B): ChatGPT does not have enough distinct information about your brand to compare you against others. Fix: add a dedicated About page with factual, specific claims; ensure your differentiators appear in third-party sources, not just self-reported content.
- Low score on Category queries (Category C): ChatGPT does not associate your brand with the category. Fix: topical authority content — publish 8–12 in-depth pages on your core topic area so the model maps your brand to the category.
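To see which category is weakest, the per-category breakdown can be computed from the same scored rows. This is a small sketch with hypothetical function names; it expresses each category as a percentage of its maximum possible points so the three are directly comparable.

```python
from collections import defaultdict


def category_breakdown(rows: list[tuple[str, int]]) -> dict[str, float]:
    """Percentage of available points earned per category, from (category, score) rows."""
    by_category: dict[str, list[int]] = defaultdict(list)
    for category, score in rows:
        by_category[category].append(score)
    return {
        category: 100 * sum(values) / (10 * len(values))
        for category, values in by_category.items()
    }


def weakest_category(breakdown: dict[str, float]) -> str:
    """Return the category with the lowest percentage."""
    return min(breakdown, key=breakdown.get)
```

The weakest category then points to the matching fix in the list above: reviews and entity signals for A, distinct third-party information for B, topical authority content for C.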
Step 6: Run Weekly Spot-Checks
Between full monthly tests, run a 5-prompt weekly spot-check to catch major changes quickly — particularly useful after a model update is announced by OpenAI or after you have completed a significant optimization sprint.
- Pick your 5 highest-value prompts from the full library — typically your primary recommendation queries.
- Run them using the same controlled conditions as the monthly test.
- If you see a score drop of 2+ points on any prompt versus last month, flag it for investigation before your next full monthly run.
- If you see an improvement, note which optimization you completed in the weeks prior — this is how you build your evidence base for what actually moves ChatGPT visibility.
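The weekly comparison can be sketched as a simple lookup against last month's scores. The `spot_check` name and the dictionary layout (prompt ID to score) are assumptions for illustration.

```python
def spot_check(last_month: dict[str, int], this_week: dict[str, int], drop_threshold: int = 2):
    """Compare spot-check scores with last month's; return (dropped, improved) prompt IDs."""
    dropped, improved = [], []
    for prompt_id, new_score in this_week.items():
        old_score = last_month[prompt_id]
        if old_score - new_score >= drop_threshold:
            dropped.append(prompt_id)   # investigate before the next full run
        elif new_score > old_score:
            improved.append(prompt_id)  # note which optimization preceded this
    return dropped, improved
```

Because ChatGPT's answers vary between sessions, a single flagged drop is a prompt to re-test, not proof of a decline.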
Step 7: Build the Monthly Tracking Spreadsheet
Use our free AI Visibility Score tool to cross-reference your manual test results against a structured score across all four major AI engines.
Your tracking spreadsheet should have one row per prompt and one column per month. Columns to include:
- Prompt ID and text — exact wording, never changes
- Query category — A (recommendation), B (comparison), C (category)
- Your score this month — 0 to 10
- Competitor 1, 2, 3 scores — same rubric
- Response excerpt — copy the exact ChatGPT text that mentions (or doesn't mention) you
- Notes — any observation, e.g., "ChatGPT mentioned our review count incorrectly"
- Rolling 3-month trend — calculated column showing direction
Review the spreadsheet at the start of each month to identify which prompts are trending down and require investigation before your next optimization sprint.
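The rolling 3-month trend column is not defined precisely, so the sketch below assumes one simple reading: compare the most recent score with the score two months earlier and report the direction. The `rolling_trend` name and the "up" / "down" / "flat" labels are hypothetical.

```python
def rolling_trend(monthly_scores: list[int], window: int = 3) -> str:
    """Direction of change across the last `window` monthly scores for one prompt."""
    recent = monthly_scores[-window:]
    if len(recent) < window:
        return "n/a"  # not enough history yet
    delta = recent[-1] - recent[0]
    if delta > 0:
        return "up"
    if delta < 0:
        return "down"
    return "flat"
```

In a spreadsheet the same logic is a formula that subtracts the cell two months back from the current month's cell.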
If you want a team running this playbook for you, explore our AI Engine Optimization service, a managed program covering entity setup, review acceleration, PR, and ongoing ChatGPT monitoring.
Rank on ChatGPT With Magna
Magna AI (also known as Use Magna and Magna Marketing) has helped 150+ businesses earn consistent ChatGPT recommendations. Schedule a free intro call.
Related Articles
- Study: What Sources ChatGPT Trusts
- GEO Checklist (2026)
- AI Citation Optimization Guide
- Complete ChatGPT SEO Guide
- How to Get Your Business Mentioned in ChatGPT
- Influence ChatGPT Recommendations
- Case Study: 80 Leads from ChatGPT
- The Rise of AI Search
- Copilot SEO Tools: 11 Tools to Rank on Microsoft Copilot
- 38-Point GEO Audit Checklist
- Why AI Engines Cite Some Brands and Ignore Others