News Source Citing Patterns in AI Search Systems
- URL: http://arxiv.org/abs/2507.05301v1
- Date: Mon, 07 Jul 2025 02:17:57 GMT
- Title: News Source Citing Patterns in AI Search Systems
- Authors: Kai-Cheng Yang
- Abstract summary: We analyze data from the AI Search Arena, a head-to-head evaluation platform for AI search systems. The dataset comprises over 24,000 conversations and 65,000 responses from models across three major providers: OpenAI, Perplexity, and Google. We find that while models from different providers cite distinct news sources, they exhibit shared patterns in citation behavior.
- Score: 6.976269683687743
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI-powered search systems are emerging as new information gatekeepers, fundamentally transforming how users access news and information. Despite their growing influence, the citation patterns of these systems remain poorly understood. We address this gap by analyzing data from the AI Search Arena, a head-to-head evaluation platform for AI search systems. The dataset comprises over 24,000 conversations and 65,000 responses from models across three major providers: OpenAI, Perplexity, and Google. Among the over 366,000 citations embedded in these responses, 9% reference news sources. We find that while models from different providers cite distinct news sources, they exhibit shared patterns in citation behavior. News citations concentrate heavily among a small number of outlets and display a pronounced liberal bias, though low-credibility sources are rarely cited. User preference analysis reveals that neither the political leaning nor the quality of cited news sources significantly influences user satisfaction. These findings reveal significant challenges in current AI search systems and have important implications for their design and governance.
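As an illustration of the kind of analysis the abstract describes, the sketch below computes the news-citation share, the concentration of news citations among the top outlets, and a citation-weighted political-leaning average per provider. It is a minimal sketch, assuming a hypothetical flat citation table (`citations.csv` with columns `provider`, `cited_domain`, `is_news`, and `leaning_score`); the actual AI Search Arena data schema and the paper's exact measures may differ.

```python
# Hypothetical sketch: column names and file layout are assumptions,
# not the actual AI Search Arena release format.
import pandas as pd

citations = pd.read_csv("citations.csv")  # one row per embedded citation

# Share of all citations that point to news sources (the paper reports ~9%).
# `is_news` is assumed to be a boolean flag.
news = citations[citations["is_news"]]
news_share = len(news) / len(citations)
print(f"News citation share: {news_share:.1%}")

# Concentration: what fraction of news citations go to the 10 most-cited outlets?
outlet_counts = news["cited_domain"].value_counts()
top10_share = outlet_counts.head(10).sum() / outlet_counts.sum()
print(f"Top-10 outlet share of news citations: {top10_share:.1%}")

# Citation-weighted average political leaning per provider
# (negative = liberal, positive = conservative on the assumed scale).
leaning_by_provider = (
    news.groupby("provider")["leaning_score"].mean().sort_values()
)
print(leaning_by_provider)
```

A fuller analysis along the paper's lines would also compare these per-provider averages against a baseline of outlet audiences and relate source leaning and credibility scores to user preference votes.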
Related papers
- Search Arena: Analyzing Search-Augmented LLMs [61.28673331156436]
We introduce Search Arena, a crowd-sourced, large-scale, human-preference dataset of over 24,000 paired multi-turn user interactions. The dataset spans diverse intents and languages, and contains full system traces with around 12,000 human preference votes. Our analysis reveals that user preferences are influenced by the number of citations, even when the cited content does not directly support the attributed claims.
arXiv Detail & Related papers (2025-06-05T17:59:26Z)
- Impact of Fake News on Social Media Towards Public Users of Different Age Groups [0.0]
This study examines how fake news affects social media users across a range of age groups.
The paper evaluates various machine learning models for their efficacy in identifying and categorizing fake news.
arXiv Detail & Related papers (2024-11-08T15:32:20Z)
- Fairness and Bias in Multimodal AI: A Survey [0.20971479389679337]
The importance of addressing fairness and bias in artificial intelligence (AI) systems cannot be over-emphasized.
We fill a gap regarding the relatively minimal study of fairness and bias in Large Multimodal Models (LMMs) compared to Large Language Models (LLMs).
We provide 50 examples of datasets and models related to both types of AI along with the challenges of bias affecting them.
arXiv Detail & Related papers (2024-06-27T11:26:17Z)
- Generative AI Search Engines as Arbiters of Public Knowledge: An Audit of Bias and Authority [2.860575804107195]
This paper reports on an audit study of generative AI systems (ChatGPT, Bing Chat, and Perplexity) which investigates how these new search engines construct responses.
We collected system responses using a set of 48 authentic queries for 4 topics over a 7-day period and analyzed the data using sentiment analysis, inductive coding and source classification.
Results provide an overview of the nature of responses across these systems and reveal evidence of sentiment bias tied to queries and topics, as well as commercial and geographic bias in cited sources.
arXiv Detail & Related papers (2024-05-22T22:09:32Z)
- Learning Unbiased News Article Representations: A Knowledge-Infused Approach [0.0]
We propose a knowledge-infused deep learning model that learns unbiased representations of news articles using global and local contexts.
We show that the proposed model mitigates algorithmic political bias and outperforms baseline methods to predict the political leaning of news articles with up to 73% accuracy.
arXiv Detail & Related papers (2023-09-12T06:20:34Z)
- Evaluating Verifiability in Generative Search Engines [70.59477647085387]
Generative search engines directly generate responses to user queries, along with in-line citations.
We conduct human evaluation to audit four popular generative search engines.
We find that responses from existing generative search engines are fluent and appear informative, but frequently contain unsupported statements and inaccurate citations.
arXiv Detail & Related papers (2023-04-19T17:56:12Z)
- Towards Corpus-Scale Discovery of Selection Biases in News Coverage: Comparing What Sources Say About Entities as a Start [65.28355014154549]
This paper investigates the challenges of building scalable NLP systems for discovering patterns of media selection biases directly from news content in massive-scale news corpora.
We show the capabilities of the framework through a case study on NELA-2020, a corpus of 1.8M news articles in English from 519 news sources worldwide.
arXiv Detail & Related papers (2023-04-06T23:36:45Z)
- Unveiling the Hidden Agenda: Biases in News Reporting and Consumption [59.55900146668931]
We build a six-year dataset on the Italian vaccine debate and adopt a Bayesian latent space model to identify narrative and selection biases.
We found a nonlinear relationship between biases and engagement, with higher engagement for extreme positions.
Analysis of news consumption on Twitter reveals common audiences among news outlets with similar ideological positions.
arXiv Detail & Related papers (2023-01-14T18:58:42Z)
- Machine Learning Explanations to Prevent Overtrust in Fake News Detection [64.46876057393703]
This research investigates the effects of an Explainable AI assistant embedded in news review platforms for combating the propagation of fake news.
We design a news reviewing and sharing interface, create a dataset of news stories, and train four interpretable fake news detection algorithms.
For a deeper understanding of Explainable AI systems, we discuss how user engagement, mental models, trust, and performance measures interact during the explanation process.
arXiv Detail & Related papers (2020-07-24T05:42:29Z)
- Bias in Multimodal AI: Testbed for Fair Automatic Recruitment [73.85525896663371]
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
We train automatic recruitment algorithms using a set of multimodal synthetic profiles consciously scored with gender and racial biases.
Our methodology and results show how to generate fairer AI-based tools in general, and in particular fairer automated recruitment systems.
arXiv Detail & Related papers (2020-04-15T15:58:05Z)
- SirenLess: reveal the intention behind news [31.757138364005087]
We present SirenLess, a visual analytical system for misleading news detection based on linguistic features.
The system features article explorer, a novel interactive tool that integrates news metadata and linguistic features to reveal semantic structures of news articles.
We use SirenLess to analyze 18 news articles from different sources and summarize some helpful patterns for misleading news detection.
arXiv Detail & Related papers (2020-01-08T20:36:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.