AI-Powered Search Engines Favor Less Popular Sources, Study Finds

Key Points

Researchers compared traditional Google links with AI Overviews, Gemini‑2.5‑Flash, and GPT‑4o.
Test queries came from WildChat, AllSides, and top Amazon product searches.
AI search tools frequently cite websites outside Tranco’s top‑1,000 domains.
53% of AI Overview sources were not in Google’s top‑10 links for the same query.
40% of AI Overview sources were absent even from Google’s top‑100 links.
Findings highlight a shift toward less popular sources in AI‑driven search results.
Implications include broader content exposure but also concerns about source authority.

Background and Motivation

Since the rollout of Google’s AI Overviews, public awareness has grown around the ways generative AI search results can diverge from the conventional list of links produced by traditional search engines. To quantify this divergence, researchers from Ruhr University in Bochum and the Max Planck Institute for Software Systems conducted a systematic study.

Methodology

The team compared traditional Google link results with AI‑generated outputs from three systems: Google’s AI Overviews, Gemini‑2.5‑Flash, and OpenAI’s GPT‑4o (both its web‑search mode and the variant that invokes a separate search tool). Test queries were drawn from several sources, including specific questions submitted to ChatGPT in the WildChat dataset, political topics listed on AllSides, and the most‑searched Amazon products.

Popularity Metrics

To assess source popularity, the researchers used the Tranco domain‑ranking system, which ranks websites based on traffic and other factors. They examined whether each cited domain fell within the top 1,000, within the top 1,000,000, or outside both thresholds.
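
The tiering described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the researchers' code: the `tranco_rank` mapping, the example domains, and their ranks are hypothetical placeholders, and a real analysis would load ranks from a downloaded copy of the Tranco list.

```python
# Hypothetical domain -> rank mapping; the ranks are invented for illustration.
tranco_rank = {
    "example.com": 120,
    "example.org": 45_000,
    "example.net": 2_300_000,
}

def popularity_bucket(domain, ranks=tranco_rank):
    """Assign a domain to one of the three popularity tiers."""
    rank = ranks.get(domain)
    # Unlisted domains are treated as falling beyond the thresholds (an assumption).
    if rank is None or rank > 1_000_000:
        return "beyond top 1,000,000"
    if rank <= 1_000:
        return "top 1,000"
    return "top 1,000,000"

cited = ["example.com", "example.org", "example.net", "unlisted.example"]
buckets = {d: popularity_bucket(d) for d in cited}
```

Treating unlisted domains as "beyond" the thresholds is a design choice made here for simplicity; the study may have handled them differently.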

Key Findings

The analysis revealed a consistent pattern: AI‑powered search tools tended to cite less popular websites than traditional Google results did. For example, the median source referenced by Gemini fell outside Tranco’s top 1,000 across all queries.

The AI tools also drew on pages that traditional search ranked low or not at all. Of the sources cited by Google’s AI Overviews, 53 percent were not present in the top‑10 Google links for the same query, and 40 percent did not appear even in the top‑100 Google links. Similar trends were observed for Gemini and GPT‑4o, indicating that generative search engines frequently draw from domains that would not surface in a standard organic search.
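
An overlap figure of this kind (the share of cited sources missing from the top‑k organic links) could be computed along the following lines. This is a minimal sketch, not the study's method: the function names and example URLs are hypothetical, and matching at the domain level rather than by exact URL is an assumption.

```python
from urllib.parse import urlparse

def domain(url):
    """Return the host of a URL, lowercased, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def share_missing(cited_urls, ranked_urls, k):
    """Fraction of cited sources whose domain is absent from the top-k ranked links."""
    top_k = {domain(u) for u in ranked_urls[:k]}
    cited = [domain(u) for u in cited_urls]
    if not cited:
        return 0.0
    return sum(d not in top_k for d in cited) / len(cited)

# Hypothetical data: four sources cited by an AI answer, three organic links in rank order.
cited = ["https://www.a.example/x", "https://b.example/y",
         "https://c.example/z", "https://d.example/"]
ranked = ["https://a.example/", "https://e.example/", "https://b.example/q"]

missing_top2 = share_missing(cited, ranked, 2)  # 0.75: only a.example is in the top 2
missing_top3 = share_missing(cited, ranked, 3)  # 0.5: a.example and b.example are in the top 3
```

Averaging such per‑query shares, or pooling all cited sources before dividing, would give different aggregate numbers; which one the researchers used is not stated here.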

Implications

These findings suggest that AI‑driven search reshapes the information landscape by exposing users to content from less‑visited sites. While this could broaden perspectives, it also raises questions about the reliability and authority of the sources presented. The research underscores the need for further scrutiny of how generative models select and rank information, especially as AI‑based search becomes more prevalent.

Conclusion

The study provides empirical evidence that AI‑powered search engines diverge from traditional search in the popularity of cited sources. As generative AI continues to integrate into search experiences, understanding these differences will be crucial for users, developers, and policymakers alike.

Source: arstechnica.com