Whose Facts Win? LLM Source Preferences under Knowledge Conflicts
- URL: http://arxiv.org/abs/2601.03746v2
- Date: Tue, 13 Jan 2026 09:48:40 GMT
- Title: Whose Facts Win? LLM Source Preferences under Knowledge Conflicts
- Authors: Jakob Schuster, Vagrant Gautam, Katja Markert
- Abstract summary: We investigate how source preferences affect large language models' (LLMs') resolution of inter-context knowledge conflicts in English. We find that LLMs prefer institutionally-corroborated information over information from people and social media. These source preferences can be reversed by simply repeating information from less credible sources.
- Score: 4.587118047944915
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: As large language models (LLMs) are more frequently used in retrieval-augmented generation pipelines, it is increasingly relevant to study their behavior under knowledge conflicts. Thus far, the role of the source of the retrieved information has gone unexamined. We address this gap with a novel framework to investigate how source preferences affect LLM resolution of inter-context knowledge conflicts in English, motivated by interdisciplinary research on credibility. With a comprehensive, tightly-controlled evaluation of 13 open-weight LLMs, we find that LLMs prefer institutionally-corroborated information (e.g., government or newspaper sources) over information from people and social media. However, these source preferences can be reversed by simply repeating information from less credible sources. To mitigate repetition effects and maintain consistent preferences, we propose a novel method that reduces repetition bias by up to 99.8%, while also maintaining at least 88.8% of original preferences. We release all data and code to encourage future work on credibility and source preferences in knowledge-intensive NLP.
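The framework's core probe can be illustrated with a short sketch. The following is a minimal, hypothetical harness, assuming a generic `query_llm` completion function and invented source labels and prompt wording; the paper's actual templates and scoring are in its released data and code. It builds a prompt that attributes conflicting answers to different sources, tallies which source's answer the model repeats, and optionally duplicates one source's evidence to probe the repetition effect described above.

```python
# Minimal sketch of an inter-context knowledge-conflict probe.
# `query_llm`, the source labels, and the prompt wording are assumptions,
# not the paper's released templates.
from collections import Counter

def query_llm(prompt: str) -> str:
    raise NotImplementedError  # replace with a real model call

SOURCES = {"government report": "Paris", "social media post": "Lyon"}

def build_prompt(question: str, evidence: dict[str, str],
                 repeat: dict[str, int] | None = None) -> str:
    """Attribute each conflicting answer to a source; optionally repeat
    one source's statement to probe repetition bias."""
    lines = []
    for source, answer in evidence.items():
        n = (repeat or {}).get(source, 1)
        lines += [f"A {source} states that the answer is {answer}."] * n
    return "\n".join(lines) + f"\n\nQuestion: {question}\nAnswer in one word."

def tally_preferences(question: str, trials: int = 50,
                      repeat: dict[str, int] | None = None) -> Counter:
    """Count how often each source's answer appears in the model output."""
    wins = Counter()
    for _ in range(trials):
        reply = query_llm(build_prompt(question, SOURCES, repeat)).lower()
        for source, answer in SOURCES.items():
            if answer.lower() in reply:
                wins[source] += 1
    return wins

# e.g. tally_preferences("Where was the treaty signed?",
#                        repeat={"social media post": 3})
```

Comparing the tallies with and without the `repeat` argument gives a crude measure of how much repetition shifts the model away from the institutionally-corroborated source.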
Related papers
- In Agents We Trust, but Who Do Agents Trust? Latent Source Preferences Steer LLM Generations [19.98336514529218]
Large Language Models (LLMs) are increasingly being deployed as interfaces to information on online platforms. LLMs govern the information users receive by drawing users' attention to particular instances of retrieved information at the expense of others. We find that several models consistently exhibit strong and predictable source preferences.
arXiv Detail & Related papers (2026-02-17T09:45:22Z) - How Do LLM-Generated Texts Impact Term-Based Retrieval Models? [76.92519309816008]
This paper investigates the influence of large language models (LLMs) on term-based retrieval models. Our linguistic analysis reveals that LLM-generated texts exhibit smoother high-frequency and steeper low-frequency Zipf slopes. Our study further explores whether term-based retrieval models demonstrate source bias, concluding that these models prioritize documents whose term distributions closely correspond to those of the queries.
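For readers unfamiliar with the Zipf-slope analysis mentioned above, a rough sketch is given below: rank words by frequency and fit separate log-log slopes to the high-frequency and low-frequency halves of the distribution. Whitespace tokenization and the midpoint split are simplifying assumptions, not the paper's exact procedure.

```python
# Rough sketch: estimate Zipf slopes for the high- and low-frequency
# halves of a word rank-frequency distribution. Tokenization and the
# split point are simplifying assumptions.
from collections import Counter
import math

def zipf_slopes(text: str) -> tuple[float, float]:
    """Fit log-log slopes to the top and bottom halves of a text's
    rank-frequency (Zipf) plot."""
    freqs = sorted(Counter(text.lower().split()).values(), reverse=True)
    pts = [(math.log(rank + 1), math.log(freq))
           for rank, freq in enumerate(freqs)]
    mid = len(pts) // 2
    return _ols_slope(pts[:mid]), _ols_slope(pts[mid:])

def _ols_slope(pts: list[tuple[float, float]]) -> float:
    """Ordinary least-squares slope of log-frequency on log-rank."""
    n = len(pts)
    if n < 2:
        return float("nan")
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    den = sum((x - mx) ** 2 for x, _ in pts)
    num = sum((x - mx) * (y - my) for x, y in pts)
    return num / den if den else float("nan")
```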
arXiv Detail & Related papers (2025-08-25T06:43:27Z) - Positional Biases Shift as Inputs Approach Context Window Limits [57.00239097102958]
The lost-in-the-middle (LiM) effect is strongest when inputs occupy up to 50% of a model's context window. We observe a distance-based bias, where model performance is better when relevant information is closer to the end of the input.
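One way to picture such distance-based bias is a position-sweep probe: bury a single relevant fact at varying depths inside filler text and track answer accuracy by position. The sketch below is hypothetical; `query_llm`, the filler sentence, and the needle fact are stand-ins, not the benchmark's materials.

```python
# Sketch of a position-sweep probe for positional bias. The filler text,
# needle fact, and `query_llm` stub are illustrative assumptions.
def query_llm(prompt: str) -> str:
    raise NotImplementedError  # replace with a real model call

FILLER = "The committee met again and adjourned without a decision. "
FACT = "The access code for the archive is 7412."
QUESTION = "What is the access code for the archive?"

def accuracy_by_depth(n_filler: int = 200, steps: int = 5,
                      trials: int = 20) -> dict[float, float]:
    """Accuracy as a function of where the fact sits in the context."""
    results = {}
    for i in range(steps + 1):
        depth = i / steps                       # 0.0 = start, 1.0 = end
        cut = int(n_filler * depth)
        context = FILLER * cut + FACT + " " + FILLER * (n_filler - cut)
        prompt = f"{context}\n\n{QUESTION}"
        hits = sum("7412" in query_llm(prompt) for _ in range(trials))
        results[depth] = hits / trials
    return results
```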
arXiv Detail & Related papers (2025-08-10T20:40:24Z) - How does Misinformation Affect Large Language Model Behaviors and Preferences? [37.06385727015972]
Large Language Models (LLMs) have shown remarkable capabilities in knowledge-intensive tasks. We present MisBench, the current largest and most comprehensive benchmark for evaluating LLMs' behavior and knowledge preference toward misinformation. Empirical results reveal that while LLMs demonstrate comparable abilities in discerning misinformation, they remain susceptible to knowledge conflicts and stylistic variations.
arXiv Detail & Related papers (2025-05-27T17:57:44Z) - Accommodate Knowledge Conflicts in Retrieval-augmented LLMs: Towards Reliable Response Generation in the Wild [11.058848731627233]
Large language models (LLMs) have advanced information retrieval systems. LLMs often face knowledge conflicts between internal memory and retrieved external information. We propose Swin-VIB, a novel framework that integrates a pipeline of variational information bottleneck models into adaptive augmentation of retrieved information.
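Swin-VIB's architecture is not reproduced here, but as background, a generic variational information bottleneck layer, the building block its pipeline is named after, looks roughly like the sketch below: a stochastic code z is sampled via the reparameterization trick, with a KL penalty compressing it toward a standard normal prior.

```python
# Generic variational information bottleneck (VIB) layer, for background
# only; this is not Swin-VIB's actual architecture.
import torch
import torch.nn as nn

class VIBLayer(nn.Module):
    def __init__(self, d_in: int, d_z: int):
        super().__init__()
        self.mu = nn.Linear(d_in, d_z)
        self.logvar = nn.Linear(d_in, d_z)

    def forward(self, h: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        # KL( N(mu, sigma^2) || N(0, I) ), averaged over the batch
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1 - logvar).sum(-1).mean()
        return z, kl
```

A training loss would then combine a task objective with `beta * kl`, trading off how predictive the code z remains against how strongly it is compressed.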
arXiv Detail & Related papers (2025-04-17T14:40:31Z) - Fact-checking AI-generated news reports: Can LLMs catch their own lies? [4.232709762282742]
We evaluate whether Large Language Models (LLMs) can effectively fact-check their own content. LLMs are more effective at assessing claims in national or international news stories than in local news stories. We find that incorporating retrieved results from a search engine in a Retrieval-Augmented Generation setting significantly reduces the number of claims an LLM cannot assess.
arXiv Detail & Related papers (2025-03-24T02:32:02Z) - How LLMs Fail to Support Fact-Checking [4.918358353535447]
Large Language Models (LLMs) can amplify online misinformation, but they also show promise in countering it. We empirically study the capabilities of three LLMs -- ChatGPT, Gemini, and Claude -- in countering political misinformation. Our findings suggest that the models struggle to ground their responses in real news sources and tend to prefer citing left-leaning sources.
arXiv Detail & Related papers (2025-02-28T07:12:03Z) - Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs [50.40165119718928]
LongPiBench is a benchmark designed to assess positional bias involving multiple pieces of relevant information. Experiments on it reveal that while most current models are robust against the "lost in the middle" issue, there exist significant biases related to the spacing of relevant information pieces.
arXiv Detail & Related papers (2024-10-18T17:41:19Z) - Cognitive Biases in Large Language Models for News Recommendation [68.90354828533535]
This paper explores the potential impact of cognitive biases on large language models (LLMs) based news recommender systems.
We discuss strategies to mitigate these biases through data augmentation, prompt engineering, and learning-algorithm design.
arXiv Detail & Related papers (2024-10-03T18:42:07Z) - Source-Aware Training Enables Knowledge Attribution in Language Models [81.13048060332775]
Intrinsic source citation can enhance transparency, interpretability, and verifiability.
Our training recipe can enable faithful attribution to the pretraining data without a substantial impact on the model's perplexity.
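Conceptually, source-aware training attaches source identifiers to pretraining documents so the model can later emit them as citations. The sketch below is a hypothetical illustration of that data formatting; the tag syntax and field names are assumptions, not the paper's recipe.

```python
# Hypothetical data formatting for source-aware training. The tag
# format and fields are illustrative assumptions.
def format_with_source(doc_id: str, text: str) -> str:
    """Prepend a source identifier so the model can learn to cite it."""
    return f"<source:{doc_id}>\n{text}"

def attribution_example(doc_id: str, fact: str) -> dict[str, str]:
    """A fine-tuning pair teaching the model to cite its source."""
    return {"prompt": f"{fact}\nWhich source supports this?",
            "completion": f"<source:{doc_id}>"}
```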
arXiv Detail & Related papers (2024-04-01T09:39:38Z) - When Do LLMs Need Retrieval Augmentation? Mitigating LLMs' Overconfidence Helps Retrieval Augmentation [66.01754585188739]
Large Language Models (LLMs) have been found to have difficulty recognizing when they lack certain knowledge.
Retrieval Augmentation (RA) has been extensively studied to mitigate LLMs' hallucinations.
We propose several methods to enhance LLMs' perception of knowledge boundaries and show that they are effective in reducing overconfidence.
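A simple instance of this idea is confidence-gated retrieval: elicit a self-reported confidence and fall back to retrieval augmentation only when it is low. The sketch below is hypothetical; the prompt wording, threshold, and `query_llm`/`retrieve` stubs are all assumptions, not the paper's methods.

```python
# Sketch of confidence-gated retrieval augmentation. Prompt wording,
# threshold, and both stubs are illustrative assumptions.
def query_llm(prompt: str) -> str:
    raise NotImplementedError  # replace with a real model call

def retrieve(question: str) -> str:
    raise NotImplementedError  # replace with a real retriever

def answer(question: str, threshold: float = 0.7) -> str:
    """Answer directly when self-reported confidence is high; otherwise
    retrieve supporting documents first."""
    score = query_llm(
        f"{question}\nHow confident are you that you can answer this "
        "without external documents? Reply with a number in [0, 1].")
    try:
        confident = float(score.strip()) >= threshold
    except ValueError:
        confident = False  # unparsable self-report: retrieve to be safe
    if confident:
        return query_llm(question)
    docs = retrieve(question)
    return query_llm(f"Context:\n{docs}\n\nQuestion: {question}")
```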
arXiv Detail & Related papers (2024-02-18T04:57:19Z) - Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mediating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z) - Know Where to Go: Make LLM a Relevant, Responsible, and Trustworthy
Searcher [10.053004550486214]
Large Language Models (LLMs) have shown the potential to improve relevance and provide direct answers in web searches.
However, challenges arise in the reliability of generated results and the credibility of contributing sources.
We propose a novel generative retrieval framework leveraging the knowledge of LLMs to foster a direct link between queries and online sources.
arXiv Detail & Related papers (2023-10-19T03:49:36Z)