Does schema richness affect AI citation?
Most SEO advice says: "add schema and you'll get better visibility."
But here's the real question:
Does more schema actually increase your chances of being cited by AI (ChatGPT, Gemini, Google AI Overviews)?
We analyzed tens of thousands of web pages and their schema markup to find out.
The answer is not as obvious as you might think.
What is Schema Markup?
Schema Markup, or schema for short, is a way to describe the content of a Web page in a machine-readable form. It can be written in several formats (JSON-LD, Microdata, and RDFa); the one we focus on is JSON-LD, which is the format recommended by Google.
Examples of schema types include Recipe, NewsArticle, and LocalBusiness.
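To make this concrete, the sketch below builds a small JSON-LD object for a recipe in Python and wraps it in the script tag that pages typically use to embed it. The recipe values and the `to_script_tag` helper are made up for illustration; only `@context`, `@type`, and the attribute names come from the schema.org vocabulary.

```python
import json

# Hypothetical example data; the attribute names follow schema.org's Recipe type.
recipe_schema = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Simple Tomato Soup",
    "description": "A quick tomato soup made with pantry staples.",
    "author": {"@type": "Person", "name": "Jane Doe"},
    "recipeIngredient": ["tomatoes", "onion", "vegetable stock"],
}


def to_script_tag(schema: dict) -> str:
    """Serialize a schema dict as a JSON-LD script tag (illustrative helper)."""
    body = json.dumps(schema, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_script_tag(recipe_schema))
```

This example already contains two types (Recipe and Person) and a handful of attributes, which is the kind of structure the richness metrics in this study count.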
Why Schema Matters for AI Search
One of the main contributions of schema (sometimes referred to as structured data) is to improve the user experience by presenting the data differently based on the content (e.g., Recipe vs. News Article vs. Local Business). Companies have reported that this increases click-through rates and that users spend more time on Web pages with schema.
It is also widely believed that having schema increases the likelihood of a website being cited by AI. We have shown that the majority of cited Web pages have schema present. But does the richness of the schema have an effect? We study this further below.
Schema Richness Metrics
Before asking whether schema richness affects a website's appearance in AI search, we need to define what we mean by richness. We define the following richness metrics:
- Title Richness: The number of words in the title or name.
- Description Richness: The number of words in the description or text attributes.
- Semantic Richness: The number of schema types (concepts) present in the schema.
- Attribute Richness: The number of attributes.
- Overall Richness: The total number of words in the schema, including the formatting types.
Schema Richness: Experiment and Results
We gather all the websites cited by ChatGPT or Gemini in our tests. We report the results for each richness measure.
Title Richness
The attributes we considered are name, title, and headline. If more than one of them exists, the longest one is picked.
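A minimal sketch of this rule in Python is shown below. It is not the code used in the study: it assumes that "longest" means the most words, that words are separated by whitespace, and that only top-level attributes of the schema object are inspected. The function name `title_richness` is our own.

```python
def title_richness(schema: dict) -> int:
    """Word count of the longest of name, title, and headline (0 if none exist)."""
    counts = []
    for key in ("name", "title", "headline"):
        value = schema.get(key)
        if isinstance(value, str):
            # Assumption: a word is any whitespace-separated token.
            counts.append(len(value.split()))
    return max(counts, default=0)


# Hypothetical schema object used only to exercise the function.
page = {
    "@type": "NewsArticle",
    "name": "Schema study",
    "headline": "Does schema richness affect AI citation?",
}
print(title_richness(page))  # 6, because the headline is the longest candidate
```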
The largest share of Web pages (45%) have long titles of 6 to 16 words, while 14% have short titles of 1 to 5 words. Less than 1% have schema but no schema title (this can be computed by subtracting the number of websites with no schema from the number with no schema title).
Description Richness
We take the larger of the two word counts for the description and text attributes. About 10% of the time, the description does not exist in the schema at all (around 36% of the Web pages have no schema).
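The measurement described above could be sketched as follows. This is an illustration under our own assumptions (whitespace-separated words, top-level attributes only), and the choice to return `None` for a missing description is ours, made so that "no description" can be told apart from an empty one.

```python
from typing import Optional


def description_richness(schema: dict) -> Optional[int]:
    """Larger word count of description and text, or None if both are absent."""
    counts = [
        len(schema[key].split())
        for key in ("description", "text")
        if isinstance(schema.get(key), str)
    ]
    return max(counts) if counts else None


# Hypothetical schema object used only to exercise the function.
page = {
    "description": "A quick tomato soup.",
    "text": "Chop the tomatoes, then simmer them with stock for twenty minutes.",
}
print(description_richness(page))  # 11, the word count of the longer text attribute
print(description_richness({"name": "No description here"}))  # None
```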
The majority of the descriptions have between 21 and 40 words. This is probably not pure chance: Google results show about 160 characters of a description and cut off the rest, which is around 30 words. Neither Google nor schema imposes a limit, so descriptions can be longer. Note also that snippets in Google can come from schema or from other meta tags. This helps explain why most Web pages with schema have long descriptions exceeding 20 words.
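To relate the 160-character cut-off to a word count, the sketch below counts how many whole words of a description fit inside the first 160 characters. The function name and the way a partially cut word is handled are our own choices, and the 160 figure is the approximate display length quoted above, not an official limit.

```python
SNIPPET_CHARS = 160  # approximate display cut-off quoted in the text


def words_within_snippet(description: str, limit: int = SNIPPET_CHARS) -> int:
    """Count the whole words that fit inside the first `limit` characters."""
    if limit <= 0:
        return 0
    if len(description) <= limit:
        return len(description.split())
    visible = description[:limit]
    words = visible.split()
    # If the cut falls inside a word, the last visible token is incomplete.
    cut_mid_word = not visible[-1].isspace() and not description[limit].isspace()
    return len(words) - 1 if cut_mid_word else len(words)


# Hypothetical description made of 40 five-character units ("word" plus a space).
long_description = "word " * 40
print(words_within_snippet(long_description))  # 32 whole words fit in 160 characters
```

Under the rough assumption of five letters plus one space per English word, 160 characters hold about 160 / 6 ≈ 27 words, which is in the same range as the 30 words mentioned above.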
Semantic Richness
We consider schemas with more types to be semantically richer than those with fewer. Of the cited Web pages, 39% have at least 7 types and 25% have at least 10 types. Only 9% of cited pages have no more than 3 types.
Among cited Web pages with schema, around 39% have at least 10 types. This is consistent with the 25% figure for all cited pages once the roughly 36% of pages without schema are excluded (25 / 64 ≈ 39%).
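One way to count types is to walk the JSON-LD structure and collect every `@type` value. The sketch below does this recursively; it assumes that distinct type names are counted (rather than every occurrence) and that nested objects and `@graph` arrays are included, neither of which is spelled out above. The function names are our own.

```python
def collect_types(node, found=None) -> set:
    """Recursively gather every distinct @type value in a JSON-LD structure."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        declared = node.get("@type")
        if isinstance(declared, str):
            found.add(declared)
        elif isinstance(declared, list):
            # JSON-LD allows a node to declare several types at once.
            found.update(t for t in declared if isinstance(t, str))
        for value in node.values():
            collect_types(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_types(item, found)
    return found


def semantic_richness(schema) -> int:
    """Number of distinct schema types present (assumption: distinct, not total)."""
    return len(collect_types(schema))


# Hypothetical schema with nested objects and a repeated Organization type.
page = {
    "@context": "https://schema.org",
    "@graph": [
        {"@type": "WebPage", "publisher": {"@type": "Organization"}},
        {"@type": ["Article", "NewsArticle"], "author": {"@type": "Person"}},
        {"@type": "Organization"},
    ],
}
print(semantic_richness(page))  # 5 distinct types
```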
Attribute Richness
Richer schemas are generally assumed to be better, although some attributes are more crucial than others. Here, we count the number of attributes for each cited Web page if the schema is present.
The majority of cited websites have more than 30 attributes in the schema, and 33% have more than 60. AI-cited Web pages therefore seem to be rich in schema attributes, with only 15% having 30 or fewer.
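Counting attributes can be sketched in the same recursive style. The version below assumes that attributes are counted at every nesting level and that JSON-LD keywords starting with `@` (such as `@type` and `@context`) are not attributes; the study may have counted differently, and the function name is our own.

```python
def attribute_richness(node) -> int:
    """Count attribute keys at every nesting level, skipping JSON-LD keywords."""
    if isinstance(node, dict):
        # Keys such as @context, @type, and @graph are keywords, not attributes.
        own = sum(1 for key in node if not key.startswith("@"))
        return own + sum(attribute_richness(value) for value in node.values())
    if isinstance(node, list):
        return sum(attribute_richness(item) for item in node)
    return 0


# Hypothetical schema object used only to exercise the function.
page = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Simple Tomato Soup",
    "author": {"@type": "Person", "name": "Jane Doe"},
    "recipeIngredient": ["tomatoes", "onion"],
}
print(attribute_richness(page))  # 4: name, author, recipeIngredient, author.name
```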
Overall Richness
The overall richness measures the total number of words in the schema. It is closer to a syntactic richness, similar to counting the number of words in an article. It can be a signal, but we do not expect it to be a strong one (remember the keyword-stuffing era).
The schema length varies, and no trend is observed. We purposefully include it here so that others can compare their own findings against it. We draw no conclusion from this graph. The findings of this study are summarized in the next section.
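For readers who want to compute a comparable number, one possible definition is sketched below: serialize the schema to JSON and count word-like tokens, so that attribute names and type names are included along with the values. The tokenization rule (runs of letters, digits, and underscores) is our assumption and may not match the one used for the figures here.

```python
import json
import re


def overall_richness(schema) -> int:
    """Count word-like tokens in the serialized schema, keys and values alike."""
    serialized = json.dumps(schema, ensure_ascii=False)
    # Assumption: a word is a maximal run of letters, digits, or underscores.
    return len(re.findall(r"\w+", serialized))


# Hypothetical schema object used only to exercise the function.
page = {"@type": "Recipe", "name": "Tomato Soup"}
print(overall_richness(page))  # 5: type, Recipe, name, Tomato, Soup
```

Note that under this rule a URL such as `https://schema.org` contributes three tokens, which is one reason the metric behaves more like a length measure than a quality measure.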
Key Findings
- Most AI-cited Web pages have schema.
- Almost all AI-cited Web pages with schema have a title.
- Around 90% of Web pages have a schema description present.
- Schema description lengths in AI-cited pages seem to be related to Google's snippet length (about 160 characters).
- Web pages cited by AI seem to be semantically rich: 39% of those with schema have at least 10 types.
- Most AI-cited websites are rich in attributes: only 15% have 30 or fewer attributes present.
Limitations & Takeaways
This study does not establish causality. We had no baseline or negative pool of uncited Web pages to compare against, so we cannot say that schema richness directly causes AI citation. Enriching the schema alone might not be enough to break through the competition, but strategically adding the important schema types and attributes can potentially increase the likelihood of a Web page being picked up by AI assistants such as ChatGPT and Gemini.
A few practical insights emerge:
- Schema is present in most AI-cited pages.
- Richer schemas (more types and attributes) are common among cited pages.
- Descriptions tend to align with how search snippets are displayed (160 characters).
The key takeaway:
Schema alone won’t get you cited, but weak schema might exclude you.
If you're optimizing for AI visibility:
- Ensure your schema is complete and consistent
- Include relevant types and attributes (not just the basics)
- Focus on clarity and semantic coverage, not just length
You optimized your schema. But are you visible?
Even if schema plays a role, it doesn't guarantee your content is cited.
The real question is whether AI systems are actually mentioning your brand.
With BuzzSense.ai, you can track your visibility across AI search platforms and see how you compare to competitors.