Do AI Models Prefer Websites with Schema?
Do AI models prefer websites with schema? We analyzed 17,000 URLs to find out.
Many people are switching to ChatGPT to ask for product recommendations. This shift is pushing organizations to invest in AEO (Answer Engine Optimization) and GEO (Generative Engine Optimization) to get their products and services featured in AI assistants like ChatGPT and Gemini.
To explore whether schema plays a role, we analyzed around 17K websites already cited by AI assistants and examined how often schema appears across different sectors and models.
What is Schema
Schema tells search engines like Google what a page is about. It can also help your web page appear in Google Search results.
In more technical terms, it is referred to as structured data markup.
There are three main ways to include structured data in webpages: JSON-LD, Microdata, and RDFa. In this article, we focus on the JSON-LD format, which is recommended by Google.
Here is an example of what the schema looks like for buzzsense.ai:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "SoftwareApplication",
"name": "BuzzSense",
"applicationCategory": "BusinessApplication",
"operatingSystem": "Web",
"url": "https://www.buzzsense.ai/",
"description": "BuzzSense is a platform that tracks brand visibility..."
}
</script>
Adding schema is one of the techniques used in Search Engine Optimization (SEO). It can be added to blog posts and other pages on a website, and the appropriate schema type varies with the page's content. Next, we discuss the schema types suggested by Google's documentation on structured data.
Schema Types
Google presents different schema types differently. For example, local businesses are shown as Google Maps entries, while recipes are shown as cards with an image, rating, and description. Here we list the types that were most relevant in our experiment.
- Article: a generic article (covers blog posts, news articles, and more).
- BlogPosting: a blog post.
- NewsArticle: a news article.
- Organization: an organization and its details.
- WebPage: a single page.
- WebSite: the whole website, which contains one or more web pages.
- Person: a person, along with relevant attributes such as name.
- BreadcrumbList: a hierarchy of pages on a website, like travel > Europe > Southern Europe > Italy. It can also be viewed as an ordered list of elements.
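To make the "ordered list of elements" idea concrete, here is a minimal Python sketch that builds BreadcrumbList markup for the travel example above. The helper name `breadcrumb_jsonld` and the example.com URLs are assumptions made for illustration; only the schema.org property names (`itemListElement`, `ListItem`, `position`, `name`, `item`) come from the vocabulary itself.

```python
import json


def breadcrumb_jsonld(trail):
    """Build a schema.org BreadcrumbList from (name, url) pairs, root first."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            # position is 1-based and encodes the order of the hierarchy
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(trail, start=1)
        ],
    }


# Hypothetical URLs, for illustration only.
trail = [
    ("Travel", "https://example.com/travel"),
    ("Europe", "https://example.com/travel/europe"),
    ("Southern Europe", "https://example.com/travel/europe/southern-europe"),
    ("Italy", "https://example.com/travel/europe/southern-europe/italy"),
]
markup = breadcrumb_jsonld(trail)
print(json.dumps(markup, indent=2))
```

The printed JSON would then be placed inside a script tag of type application/ld+json, in the same way as the SoftwareApplication example earlier.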
Why use Schema
Google uses schema to understand the content of your page and present it accordingly. In other words, it improves how your content appears to people in search results.
To learn how to add schema to your website properly and validate it, see Google's rich result report overview.
Does Schema Matter to Show in AI Search Results?
The short answer is yes. But to find out how much it actually matters, we ran an experiment with buzzsense.ai and analyzed all the websites cited in AI Search results.
Experiment: Analyzing the schema of cited websites in AI Search
We ran the experiment on ChatGPT and Gemini across several sectors, including beauty, dentistry, and software. We first analyze the aggregated results. Next, we break down the results by language, model, and sector.
This experiment attempts to answer the following questions:
- How many cited websites in AI Search have schema?
- What is the most frequent schema type?
- Is there a difference between the sectors in the percentage of cited webpages that have schema?
- Do models differ in the percentage of schema in the web pages they cite?
Question 1: How many cited websites in AI Search have schema?
We examined around 17K URLs cited by ChatGPT and Gemini, and for the valid URLs, we examined which web pages have schema present. Since our experiment used Arabic and English queries, we report results by language.
As shown in the graph, the majority of the webpages cited by ChatGPT and Gemini have schema present: roughly 60% for Arabic queries and 70% for English queries. Not surprisingly, websites cited for English queries are more likely to have schema than those cited for Arabic queries.
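The article does not describe the exact detection method, so the following is only a sketch of one way to check a page for JSON-LD schema, using Python's standard library. The names `JsonLdExtractor` and `schema_types` are hypothetical, and the choice to count nested types (for example, a Person listed as an Article's author) is an assumption. Microdata and RDFa are not covered.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def _collect_types(node, found):
    """Walk parsed JSON-LD recursively and record every @type value."""
    if isinstance(node, dict):
        declared = node.get("@type")
        if isinstance(declared, str):
            found.add(declared)
        elif isinstance(declared, list):
            found.update(t for t in declared if isinstance(t, str))
        for value in node.values():
            _collect_types(value, found)
    elif isinstance(node, list):
        for item in node:
            _collect_types(item, found)


def schema_types(html):
    """Return the set of schema types declared in a page's JSON-LD blocks."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    found = set()
    for raw in parser.blocks:
        try:
            _collect_types(json.loads(raw), found)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
    return found


page = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "SoftwareApplication", "name": "BuzzSense"}'
    "</script></head><body></body></html>"
)
print(schema_types(page))  # {'SoftwareApplication'}
```

Under this sketch, a page "has schema" when `schema_types` returns a non-empty set.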
Question 2: What is the most frequent schema type?
We group all web pages that have at least one schema type and count the occurrences for each type. In case several types are present in a single web page, we count all of them. To compute the percentages, we divide the number of occurrences by the total number of web pages with a schema.
As we found several schema types mentioned only a few times, we ignored any type that appeared fewer than 100 times.
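The counting procedure described above can be sketched in a few lines of Python. The function name `type_percentages` is hypothetical, and the sketch assumes each type is counted at most once per page (pages are represented as sets), which the article does not state explicitly.

```python
from collections import Counter


def type_percentages(pages, min_count=100):
    """Share of schema-bearing pages that declare each schema type.

    pages: iterable of sets of schema types, one set per cited page
           (an empty set means the page has no schema).
    min_count: types seen on fewer pages than this are dropped.
    """
    with_schema = [types for types in pages if types]
    # A page with several types contributes one count to each of them.
    counts = Counter(t for types in with_schema for t in types)
    total = len(with_schema)
    return {
        t: round(100 * n / total, 1)
        for t, n in counts.most_common()
        if n >= min_count
    }


# Toy input, not the study's data; min_count is lowered so the filter is visible.
toy_pages = [
    {"Organization", "WebPage"},
    {"Organization"},
    set(),
    {"Article", "Organization"},
]
print(type_percentages(toy_pages, min_count=2))  # {'Organization': 100.0}
```

Because a page can carry several types, the percentages across types do not sum to 100.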
The schema type frequencies in websites cited for Arabic prompts are very similar to those of their English counterparts.
Organization and Person are the most common schema types, followed by WebPage, BreadcrumbList, and WebSite. These are generic types.
The article-related types are more interesting. Article appears on 32% of pages and BlogPosting on 18%, while NewsArticle appears on only 8%.
Another common claim is that adding the FAQPage type is what gets a web page cited by AI assistants, yet the FAQPage schema type appears on only 16% of pages.
Question 3: Is there a difference between the sectors in the percentage of cited webpages that have schema?
The sectors with the highest percentage of schema in cited web pages are Loyalty Program (81%) and Banks (79%).
Surprisingly, the Software and Public Relations sectors do not have the highest schema percentage among cited websites in AI search results. They do, however, fall in the upper-middle range.
Schema appears least often on cited websites in the beauty sector (Nail Spa and Plastic Surgery), Telecom, and Payment Service Providers, where it ranges from 45% to 52%.
So, to answer the question: yes, there is a difference between sectors, with schema presence ranging from 45% to 81%.
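A per-sector breakdown like this amounts to a grouped percentage. The sketch below assumes each cited URL is stored as a record with a boolean `has_schema` field and a `sector` field; these field names and the function `schema_rate_by` are hypothetical, and the records are toy values rather than the study's data.

```python
from collections import defaultdict


def schema_rate_by(records, key):
    """Percent of cited pages with schema, grouped by the given field."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        group = record[key]
        totals[group] += 1
        hits[group] += bool(record["has_schema"])
    return {group: round(100 * hits[group] / totals[group], 1) for group in totals}


# Toy records with placeholder sector names.
toy_records = [
    {"sector": "A", "has_schema": True},
    {"sector": "A", "has_schema": True},
    {"sector": "A", "has_schema": True},
    {"sector": "A", "has_schema": False},
    {"sector": "B", "has_schema": True},
    {"sector": "B", "has_schema": False},
]
print(schema_rate_by(toy_records, "sector"))  # {'A': 75.0, 'B': 50.0}
```

The same function could group by model or query language by passing a different key, provided the records carry those fields.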
Question 4: Do models differ in the percentage of schema in the web pages they cite?
We performed our experiments with ChatGPT and Gemini, specifically, the following models:
- gemini-2.5-flash
- gemini-2.5-flash-lite
- gpt-5.2
- gpt-5-nano
- gpt-4o-mini
Gemini models have a higher percentage of schema presence among cited websites than GPT models. More specifically, both gemini-2.5-flash and gemini-2.5-flash-lite cite more websites with schema than gpt-5.2, gpt-5-nano, and gpt-4o-mini.
The GPT model with the highest schema presence in our test is gpt-5.2, which is 4 percentage points lower than gemini-2.5-flash and 5 percentage points lower than gemini-2.5-flash-lite. Among the tested models, then, Gemini models cite websites with schema more often than GPT models.
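The gaps quoted here are differences between two percentages, which is not the same as a relative change. The short sketch below uses the per-model figures reported in the summary (70%, 71%, 66%, 64%, and 32%) to show both; the helper `gap` is a hypothetical name.

```python
# Share of cited pages with schema per model, as reported in the summary.
rates = {
    "gemini-2.5-flash": 70,
    "gemini-2.5-flash-lite": 71,
    "gpt-5.2": 66,
    "gpt-5-nano": 64,
    "gpt-4o-mini": 32,
}


def gap(higher, lower):
    """Return (difference in percentage points, difference relative to `higher` in %)."""
    points = rates[higher] - rates[lower]
    relative = round(100 * points / rates[higher], 1)
    return points, relative


print(gap("gemini-2.5-flash", "gpt-5.2"))       # (4, 5.7)
print(gap("gemini-2.5-flash-lite", "gpt-5.2"))  # (5, 7.0)
```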
Summary of Schema Presence
When AI assistants like ChatGPT and Gemini answer user queries, they cite relevant web pages. Many argue that the presence of schema on a web page increases its likelihood of being cited in AI search answers. We ran an experiment to see whether schema is present in the majority of cited web pages. More concretely, we set out to answer the following questions:
- How many cited websites in AI Search have schema?
  Between 60% (Arabic queries) and 70% (English queries).
- What is the most frequent schema type?
  The generic types were the most common: Organization, Person, WebPage, and WebSite. Among the more specific ones, the Article family was among the most used, with the following frequencies:
  - Article: 32%
  - BlogPosting: 18%
  - NewsArticle: 8%
  Other cited types include Question (16%), Answer (16%), and Offer (10%).
- Is there a difference between the sectors in the percentage of cited webpages that have schema?
  Yes. The sector with the highest schema presence in cited web pages was Loyalty Program (81%), while the lowest were Nail Spa and Payment Service Provider, with schema present in only 45% of the cited web pages in AI search answers.
- Do models differ in the percentage of schema in the web pages they cite?
  Yes. The models gemini-2.5-flash and gemini-2.5-flash-lite had the highest schema presence, at 70% and 71%, respectively. GPT models followed with 66% for gpt-5.2, 64% for gpt-5-nano, and 32% for gpt-4o-mini.
Key Findings and Takeaways
Key findings:
- Most AI-cited websites use schema.
- Generic schema types dominate websites cited by AI assistants.
- Some industries use schema more than others.
- Gemini models cite websites with schemas more than GPT models.
Takeaways: Schema seems to be an important factor in getting a website cited by AI, but it is more likely one part of a broader GEO strategy than a deciding factor on its own.
Limitations and Next Steps
Our analysis is limited to webpages that are already cited by AI assistants. As a result, we do not have a control group of non-cited pages, which makes it difficult to determine whether schema markup directly increases the likelihood of being cited.
In other words, while we observe that many cited pages include schema, we cannot quantify the causal impact of adding schema to a webpage.
This work serves as a starting point for future research in this area. For example, organizations with consistent schema implementation across their websites could compare their visibility against benchmarks like ours to better understand potential effects. More controlled experiments, such as comparing similar pages with and without schema, would help isolate its true impact.