What content works well in LLMs?

A new study of brand mentions and citations in ChatGPT & Co.

Mar 17, 2025

Over the last 12 months, we filled significant gaps in our understanding of AI Chatbots like ChatGPT & Co.

We know:

Adoption is growing rapidly
AI Chatbots send more referrals to websites over time
Referral traffic from AI Chatbots is of higher quality than referral traffic from Google

You can read all about it in The State of AI Chatbots and SEO.

But there isn’t much content about examples and success factors for content that drives citations and mentions in AI Chatbots.

To find out, I analyzed over 7,000 citations across 1,600 URLs to content-heavy sites (think: Integrators) in AI Chatbots (ChatGPT, Perplexity, AI Overviews) in February 2024 with the help of Profound.

My goal is to figure out:

Why some pages are cited more than others, so we can optimize content for AI Chatbots
Whether classic SEO factors matter for AI Chatbot visibility, so we can prioritize
Which traps to avoid, so we don’t have to learn the same lessons over and over
Whether different factors influence mentions and citations, so we can target our efforts better

Here are my findings:

The key to brand citation in AI Chatbots: Deep Content

🔍Context: We know that AI Chatbots use RAG (Retrieval-Augmented Generation) to ground their answers in results from Google and Bing. But does that mean classic SEO ranking factors also translate into AI Chatbot citations? No.

My correlation analysis shows that none of the classic SEO metrics has a strong relationship with citations. The LLMs show only slight preferences: Perplexity and AIOs weight word and sentence count more heavily, while ChatGPT weights domain rating and Flesch Score more.

💡Takeaway: Classic SEO metrics don’t matter nearly as much for AI Chatbot mentions and citations. The best thing you can do for content optimization is to aim for depth, comprehensiveness and readability (how easy the text is to understand).

The following examples all demonstrate those attributes:

https://www.byrdie.com/digital-prescription-services-dermatologist-5179537
https://www.healthline.com/nutrition/best-weight-loss-programs
https://www.verywellmind.com/we-tried-online-therapy-com-these-were-our-experiences-8780086

Broad correlations didn’t put enough meat on the bone and left me with too many open questions. So I looked at what the most-cited content does differently from the rest. That approach revealed much stronger patterns.

🔍Context: Because I didn’t get much out of statistical correlations, I wanted to see how the top 10% of most cited content stacks up against the bottom 90%. The bigger the difference, the more critical the factor for the top 10%. In other words, the multiplier (x-axis on the chart) indicates what factors LLMs reward with citations.

The results:

The two factors that stand out are sentence and word count, followed by Flesch Score. Metrics related to backlinks and traffic seem to have a negative effect, which doesn’t mean that AI Chatbots weight them negatively but simply that they don’t matter for mentions or citations.
The top 10% of most cited pages across all 3 LLMs have much less traffic, rank for fewer keywords and get fewer total backlinks. How does that make sense? It almost looks like being strong in traditional SEO metrics is bad for AI Chatbot visibility.
Copilot (not included in the chart) has the starkest inequality, by the way: its top 10% get 17.6x more citations than the bottom 90%. Unlike with the other chatbots, though, Copilot’s top 10% also rank for 1.7x more keywords in organic search. So Copilot seems to have stronger preferences than other AI Chatbots.
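
The top-10%-versus-bottom-90% multiplier can be sketched as a ratio of means. This assumes the multiplier is a metric’s mean among the most-cited 10% of pages divided by its mean among the rest; the study may use medians or another aggregate, and the data below is invented.

```python
from statistics import mean


def top_share_multipliers(pages, metrics, key="citations", top_share=0.10):
    """Mean of each metric in the most-cited pages divided by its mean in the rest."""
    ranked = sorted(pages, key=lambda page: page[key], reverse=True)
    cut = max(1, round(len(ranked) * top_share))  # size of the "top 10%" group
    top, rest = ranked[:cut], ranked[cut:]
    return {m: mean(p[m] for p in top) / mean(p[m] for p in rest) for m in metrics}


# Hypothetical sample of 10 pages, so the top 10% is a single page.
citations = [187, 20, 15, 12, 10, 8, 6, 5, 3, 2]
word_counts = [10_000, 1_500, 2_500, 2_000, 2_000, 1_800, 2_200, 2_000, 2_000, 2_000]
backlinks = [500, 800, 1_200, 1_000, 1_000, 900, 1_100, 1_000, 1_000, 1_000]
pages = [
    {"citations": c, "word_count": w, "backlinks": b}
    for c, w, b in zip(citations, word_counts, backlinks)
]

print(top_share_multipliers(pages, ["word_count", "backlinks"]))
# → {'word_count': 5.0, 'backlinks': 0.5}
```

A multiplier below 1, as for `backlinks` here, means the most-cited pages score lower on that metric, which is how backlinks and traffic can look "negative" in this kind of comparison.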

Splitting the data up by AI Chatbot shows you their unique preferences:

💡Takeaway: Content depth (word and sentence count) and readability (Flesch Score) have the biggest impact on citations in AI Chatbots. This is important to understand: longer content isn’t better because it’s longer, but because it has a higher chance of answering the specific question a user asks an AI Chatbot.

Examples:

www.verywellmind.com/best-online-psychiatrists-5119854 has 187 citations (72 of them from ChatGPT), over 10,000 words and over 1,500 sentences, with a Flesch Score of 55.
On the other hand, www.onlinetherapy.com/best-online-psychiatrists/ has only 3 citations and a low Flesch Score of 48 as well, but comes up “short” with only 3,900 words and 580 sentences.
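
Flesch Reading Ease is the standard formula 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words); higher scores mean easier text. The sketch below implements it with a crude vowel-group syllable counter, which is an assumption for illustration only. We don’t know which tool produced the scores above, and dedicated libraries (for example textstat) count syllables more carefully.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count vowel groups, dropping a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(1, count)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher means easier (roughly 0 to 100 for typical prose)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))


print(round(flesch_reading_ease("The cat sat on the mat."), 3))
# → 116.145 (above 100 because the sentence is trivially simple)
```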

The key to brand mentions in AI Chatbots: popularity

🔍Context: We don’t yet know the value to a brand of being mentioned by an AI Chatbot. Early research indicates it’s high, especially when prompts indicate purchase intent. However, I wanted to get a step closer by understanding what leads to brand mentions in AI Chatbots in the first place.

After matching many metrics against AI Chatbot visibility, I found one factor that stands out more than any other: brand search volume. AI Chatbot mentions and brand search volume have a correlation of 0.334, which is pretty good in this field. In other words, a brand’s popularity broadly determines how visible it is in AI Chatbots.

Popularity matters most for ChatGPT, which also sends the most traffic and has the highest usage of all AI Chatbots. Broken down by AI Chatbot, ChatGPT has the highest correlation at 0.542 (strong), while Perplexity (0.196) and Google AIOs (0.254) have weaker ones.
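
If these coefficients are Pearson correlations (the study doesn’t specify), squaring them gives the share of variance in brand mentions that brand search volume alone would explain in a simple linear fit:

```latex
\begin{aligned}
r^2_{\text{all}}        &= 0.334^2 \approx 0.112 \\
r^2_{\text{ChatGPT}}    &= 0.542^2 \approx 0.294 \\
r^2_{\text{AIOs}}       &= 0.254^2 \approx 0.065 \\
r^2_{\text{Perplexity}} &= 0.196^2 \approx 0.038
\end{aligned}
```

Under that assumption, search volume accounts for roughly 11% of the variation in mentions overall and about 29% for ChatGPT, leaving the rest to other factors such as prompt- and category-level nuance.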

To be clear, there is a lot of nuance at the prompt and category level. But broadly, a brand’s visibility seems to be strongly influenced by how popular it is.

Example of popular brands and their visibility in the health category

When brands are mentioned, all AI Chatbots prefer popular brands and consistently rank them in the same order.

There is a clear link between brands and the categories of users’ questions (mental health, skincare, weight loss, hair loss, erectile dysfunction).
Early data shows that the most visible brands are digital first and invest heavily in their online presence with content, SEO, reviews, social media and digital advertising.
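
One way to check whether two chatbots rank popular brands "in the same order" is Kendall’s tau, which compares every pair of brands across two rankings: 1.0 means identical order, -1.0 means fully reversed. This is a technique suggested here for illustration, not necessarily what the study used, and the brand names are placeholders.

```python
from itertools import combinations


def kendall_tau(rank_a: list[str], rank_b: list[str]) -> float:
    """Kendall's tau for two tie-free orderings of the same brands (1.0 = same order)."""
    if len(set(rank_a)) != len(rank_a) or sorted(rank_a) != sorted(rank_b):
        raise ValueError("both rankings must list the same brands exactly once")
    if len(rank_a) < 2:
        raise ValueError("need at least two brands to compare")
    position_b = {brand: i for i, brand in enumerate(rank_b)}
    concordant = discordant = 0
    # combinations() yields each pair (x, y) with x ranked above y in rank_a.
    for x, y in combinations(rank_a, 2):
        if position_b[x] < position_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical brand orderings from two chatbots for the same prompt.
chatgpt = ["Brand A", "Brand B", "Brand C", "Brand D"]
perplexity = ["Brand A", "Brand C", "Brand B", "Brand D"]
print(kendall_tau(chatgpt, perplexity))  # (5 agreeing - 1 disagreeing) / 6 pairs ≈ 0.667
```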

💡Takeaway: Popularity is the biggest criterion that decides whether a brand is mentioned in AI Chatbots or not. The way consumers connect brands to product categories also matters. Comparing brand search volume and product category presence with your competitors gives you the best idea of how competitive you are on ChatGPT & Co.

Examples: All models in my analysis cite Healthline most often. Not a single other domain made the top 10 citations for all 4 models. That shows their distinctly different tastes and why it’s important to track multiple models rather than only ChatGPT, at least if those models also send you traffic.
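
The "top 10 for all models" check is a set intersection over each model’s most-cited domains. A minimal sketch, assuming you have a flat list of cited domains per model; the counts below are invented and use k=2 to keep the example short.

```python
from collections import Counter


def shared_top_domains(citations_by_model: dict[str, list[str]], k: int = 10) -> set[str]:
    """Domains that appear in every model's top-k most-cited domains."""
    top_sets = [
        {domain for domain, _ in Counter(domains).most_common(k)}
        for domains in citations_by_model.values()
    ]
    return set.intersection(*top_sets) if top_sets else set()


# Hypothetical citation logs: one entry per citation, domain only.
citations_by_model = {
    "chatgpt": ["healthline.com"] * 5 + ["verywellmind.com"] * 3 + ["cnet.com"],
    "perplexity": ["healthline.com"] * 4 + ["medicalnewstoday.com"] * 3 + ["verywellmind.com"],
    "aio": ["healthline.com"] * 6 + ["ncoa.org"] * 2 + ["verywellmind.com"],
}

print(shared_top_domains(citations_by_model, k=2))  # {'healthline.com'}
```

`Counter.most_common` breaks ties by insertion order, so a domain tied at the k-th position can drop out; with real data it’s worth checking the cutoff by hand.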

Other well-cited domains across most models:

verywellmind.com
onlinedoctor.com
medicalnewstoday.com
byrdie.com
cnet.com
ncoa.org

🔍Context: Not all AI Chatbots mention brands with the same frequency. Even though ChatGPT has the highest adoption and sends the most referral traffic to sources, Perplexity mentions the most brands per answer on average.
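
Brands per answer can be measured by counting tracked brands in each answer and averaging per chatbot. A minimal sketch under simple assumptions: the brand list and answers are hypothetical, and matching is a naive case-insensitive substring check.

```python
from statistics import mean


def avg_brands_per_answer(answers: list[tuple[str, str]], brands: list[str]) -> dict[str, float]:
    """Average number of distinct tracked brands mentioned per answer, per chatbot."""
    counts_by_bot: dict[str, list[int]] = {}
    for bot, text in answers:
        lowered = text.lower()
        # Naive substring match; real brand detection needs aliases and word boundaries.
        mentioned = sum(1 for brand in brands if brand.lower() in lowered)
        counts_by_bot.setdefault(bot, []).append(mentioned)
    return {bot: mean(counts) for bot, counts in counts_by_bot.items()}


# Hypothetical brands and answers, invented for illustration.
brands = ["Acme Health", "Brightmind", "SkinCo"]
answers = [
    ("perplexity", "Popular options include Acme Health, Brightmind and SkinCo."),
    ("perplexity", "Brightmind and Acme Health both offer video sessions."),
    ("chatgpt", "Acme Health is a common choice."),
    ("chatgpt", "Consider speaking with a licensed professional first."),
]

print(avg_brands_per_answer(answers, brands))  # {'perplexity': 2.5, 'chatgpt': 0.5}
```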

Prompt structure matters for brand visibility:

The word "best" was a strong trigger for brand mentions in 69.71% of prompts
Words like "trusted" (5.77%), "source" (2.88%), "recommend" (0.96%), and "reliable" (0.96%) were also associated with increased likelihood of brand mentions
Answers to prompts that include "recommend" often mention public organizations like the FDA, especially when the prompt also includes words like "trusted" or "leading"
Google AIOs show the highest brand diversity, followed by Perplexity, then ChatGPT
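
Percentages like these can be computed by checking each tracked prompt for whole-word trigger terms. This sketch assumes the percentages are each word’s share of the prompts whose answers mentioned a brand; the article doesn’t spell out the denominator, and the prompts below are hypothetical.

```python
import re


def trigger_word_shares(prompts: list[tuple[str, bool]], words: list[str]) -> dict[str, float]:
    """Percentage of brand-mentioning prompts that contain each trigger word."""
    with_brands = [text.lower() for text, mentioned in prompts if mentioned]
    shares: dict[str, float] = {}
    for word in words:
        pattern = re.compile(rf"\b{re.escape(word.lower())}\b")  # whole-word match
        hits = sum(1 for text in with_brands if pattern.search(text))
        shares[word] = 100 * hits / len(with_brands) if with_brands else 0.0
    return shares


# Hypothetical tracked prompts: (prompt text, did the answer mention a brand?)
prompts = [
    ("What is the best online therapy service?", True),
    ("Best skincare app for acne", True),
    ("Which trusted sites review weight loss programs?", True),
    ("How does hair loss happen?", False),
]

for word, share in trigger_word_shares(prompts, ["best", "trusted", "recommend"]).items():
    print(f"{word}: {share:.1f}%")
```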

💡Takeaway: Prompt structure has a meaningful impact on which brands come up in the answer. However, we can’t yet know which prompts users actually write. Keep this in mind: all the prompts we look at and track are just proxies for what users might be doing.

Technical traps can severely sabotage AI visibility

🔍Context: In my research, I encountered several ways brands unintentionally sabotage their AI Chatbot visibility. I surface them here because the prerequisite for being visible in LLMs is, of course, their ability to crawl your site, whether directly or through training data.

For example, Copilot doesn’t cite onlinedoctor.com because it’s not indexed in Bing. I couldn’t find indicators that this was done on purpose, so I assume it’s an accident that could quickly be fixed and rewarded with referral traffic.

On the other hand, ChatGPT (GPT-4o) doesn’t cite cnet.com and Perplexity doesn’t cite everydayhealth.com because both sites intentionally block the respective LLM crawler in their robots.txt.

But there are also cases in which AI Chatbots reference sites even though they technically shouldn’t be able to. The most cited domain for Perplexity in my dataset is blocked.goodrx.com. GoodRx blocks users from non-US countries, and it seems to block Perplexity as well, whether accidentally or intentionally.

It’s important to single out Google’s AI Overviews here: there is no opt-out for AIOs. If you want organic traffic from Google, you need to let it crawl your site, potentially use your content to train its models, and surface it in AI Overviews. Chegg recently filed a lawsuit against Google over this.

💡Takeaway: Use Google Search Console and Bing Webmaster Tools to monitor your site, especially whether all the URLs you want indexed actually are. Double-check that you don’t accidentally block an LLM crawler in your robots.txt or through your CDN. If you intentionally block LLM crawlers, double-check whether you still appear in their answers, simply by asking them what they know about your domain.
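
The robots.txt part of that check can be automated with Python’s standard `urllib.robotparser`. A minimal sketch: the user-agent tokens are assumptions taken from vendor documentation at the time of writing and should be verified, and the robots.txt content is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens; verify current names in each vendor's documentation.
# bingbot is included because Copilot depends on Bing's index.
LLM_CRAWLERS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "bingbot")


def blocked_crawlers(robots_txt: str, url: str, agents: tuple[str, ...] = LLM_CRAWLERS) -> list[str]:
    """Return the crawlers that this robots.txt disallows from fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that blocks two AI crawlers and allows everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
"""

print(blocked_crawlers(ROBOTS_TXT, "https://www.example.com/best-online-psychiatrists"))
# → ['GPTBot', 'PerplexityBot']
```

This only covers robots.txt. Blocks at the CDN or firewall level won’t show up here, so server logs or your CDN’s bot settings are the place to confirm those.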

Summary: 6 key learnings

Classic SEO metrics don't strongly influence AI Chatbot citations
Content depth (higher word and sentence counts) and readability (good Flesch Score) matter more
Different AI Chatbots have distinct preferences - monitoring multiple platforms is important
Brand popularity (measured by search volume) is the strongest predictor of brand mentions in AI Chatbots, especially in ChatGPT
Prompt structure influences brand visibility, and we don’t yet know how users phrase prompts
Technical issues can sabotage AI visibility - ensure your site isn't accidentally blocking LLM crawlers through robots.txt or CDN settings

Michela Pretto

May 6

It seems like with AI we're going back in a popularity world, ignoring a little bit authoritativeness... Mmm


Ian Ferguson

Mar 18

Thanks Kevin, love the insights! Are you actively tracking LLM inclusions in some sort of dashboard for clients, or is it too volatile/inaccurate at the moment?

