Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
- URL: http://arxiv.org/abs/2507.07484v1
- Date: Thu, 10 Jul 2025 07:11:57 GMT
- Title: Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
- Authors: Kaiqu Liang, Haimin Hu, Xuandong Zhao, Dawn Song, Thomas L. Griffiths, Jaime Fernández Fisac
- Abstract summary: Bullshit, as conceptualized by philosopher Harry Frankfurt, refers to statements made without regard to their truth value. We introduce the Bullshit Index, a novel metric quantifying large language models' indifference to truth. We observe prevalent machine bullshit in political contexts, with weasel words as the dominant strategy.
- Score: 57.834711966432685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bullshit, as conceptualized by philosopher Harry Frankfurt, refers to statements made without regard to their truth value. While previous work has explored large language model (LLM) hallucination and sycophancy, we propose machine bullshit as an overarching conceptual framework that allows researchers to characterize the broader phenomenon of emergent loss of truthfulness in LLMs and to shed light on its underlying mechanisms. We introduce the Bullshit Index, a novel metric quantifying LLMs' indifference to truth, and propose a complementary taxonomy analyzing four qualitative forms of bullshit: empty rhetoric, paltering, weasel words, and unverified claims. We conduct empirical evaluations on the Marketplace dataset, the Political Neutrality dataset, and our new BullshitEval benchmark (2,400 scenarios spanning 100 AI assistants) explicitly designed to evaluate machine bullshit. Our results demonstrate that model fine-tuning with reinforcement learning from human feedback (RLHF) significantly exacerbates bullshit, and that inference-time chain-of-thought (CoT) prompting notably amplifies specific bullshit forms, particularly empty rhetoric and paltering. We also observe prevalent machine bullshit in political contexts, with weasel words as the dominant strategy. Our findings highlight systematic challenges in AI alignment and provide new insights toward more truthful LLM behavior.
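The abstract does not reproduce the paper's exact Bullshit Index formula, so the snippet below is only a minimal sketch of one plausible operationalization: score indifference to truth as one minus the absolute correlation between a model's internal belief that a statement is true and whether the model explicitly asserts it. The function name `bullshit_index` and the point-biserial formulation are illustrative assumptions, not the paper's verified definition.

```python
# Minimal sketch only: the abstract above does not give the paper's exact
# Bullshit Index formula. This assumes the index measures how weakly a model's
# explicit claims track its internal beliefs: values near 1 mean the model
# asserts statements regardless of what it believes (indifference to truth),
# values near 0 mean assertions closely follow beliefs.
import numpy as np


def bullshit_index(beliefs, claims) -> float:
    """beliefs: model's estimated probability that each statement is true, in [0, 1].
    claims: 1 if the model explicitly asserted the statement, else 0.
    Returns 1 - |correlation| as a hypothetical indifference-to-truth score."""
    beliefs = np.asarray(beliefs, dtype=float)
    claims = np.asarray(claims, dtype=float)
    if beliefs.std() == 0 or claims.std() == 0:
        return 1.0  # no variation to correlate; treat as maximal indifference
    # Pearson correlation against a binary variable equals the point-biserial correlation.
    r = np.corrcoef(beliefs, claims)[0, 1]
    return 1.0 - abs(r)


# Toy usage: random assertions score near 1, belief-tracking assertions score much lower.
rng = np.random.default_rng(0)
beliefs = rng.random(200)
print(bullshit_index(beliefs, rng.integers(0, 2, 200)))      # ~1.0 (indifferent to truth)
print(bullshit_index(beliefs, (beliefs > 0.5).astype(int)))  # well below 1.0
```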
Related papers
- Incentivizing Truthful Language Models via Peer Elicitation Games [10.530016288072506]
Large Language Models (LLMs) have demonstrated strong generative capabilities but remain prone to inconsistencies and hallucinations.
We introduce Peer Elicitation Games (PEG), a training-free, game-theoretic framework for aligning LLMs through a peer elicitation mechanism involving a generator and multiple discriminators instantiated from distinct base models.
arXiv Detail & Related papers (2025-05-19T18:16:58Z)
- A Debate-Driven Experiment on LLM Hallucinations and Accuracy [7.821303946741665]
This study investigates the phenomenon of hallucination in large language models (LLMs).
Multiple instances of GPT-4o-Mini models engage in a debate-like interaction prompted with questions from the TruthfulQA dataset.
One model is deliberately instructed to generate plausible but false answers while the other models are asked to respond truthfully.
arXiv Detail & Related papers (2024-10-25T11:41:27Z)
- Chaos with Keywords: Exposing Large Language Models Sycophantic Hallucination to Misleading Keywords and Evaluating Defense Strategies [47.92996085976817]
This study explores the sycophantic tendencies of Large Language Models (LLMs).
LLMs tend to provide answers that match what users want to hear, even if they are not entirely correct.
arXiv Detail & Related papers (2024-06-06T08:03:05Z)
- Missci: Reconstructing Fallacies in Misrepresented Science [84.32990746227385]
Health-related misinformation on social networks can lead to poor decision-making and real-world dangers.
Missci is a novel argumentation-theoretic model of fallacious reasoning.
We present Missci as a dataset to test the critical reasoning abilities of large language models.
arXiv Detail & Related papers (2024-06-05T12:11:10Z)
- A Theoretical Understanding of Self-Correction through In-context Alignment [51.622068973630796]
Large language models (LLMs) are capable of improving their abilities purely by self-correction.
We show that when LLMs give relatively accurate self-examinations as rewards, they are capable of refining responses in an in-context way.
Inspired by these findings, we also illustrate applications of self-correction, such as defending against LLM jailbreaks.
arXiv Detail & Related papers (2024-05-28T22:33:02Z)
- LLMs' Reading Comprehension Is Affected by Parametric Knowledge and Struggles with Hypothetical Statements [59.71218039095155]
The task of reading comprehension (RC) provides a primary means to assess language models' natural language understanding (NLU) capabilities.
If the context aligns with the models' internal knowledge, it is hard to discern whether the models' answers stem from context comprehension or from internal information.
To address this issue, we suggest using RC on imaginary data, based on fictitious facts and entities.
arXiv Detail & Related papers (2024-04-09T13:08:56Z)
- Will the Real Linda Please Stand up...to Large Language Models? Examining the Representativeness Heuristic in LLMs [7.100094213474042]
Large language models (LLMs) have demonstrated remarkable proficiency in modeling text and generating human-like text.
LLMs may be susceptible to a common cognitive trap in human decision-making called the representativeness heuristic.
This research investigates the impact of the representativeness heuristic on LLM reasoning.
arXiv Detail & Related papers (2024-04-01T20:15:06Z)
- Hallucinations or Attention Misdirection? The Path to Strategic Value Extraction in Business Using Large Language Models [0.0]
This paper frames such errors as attention misdirection rather than true hallucinations.
It highlights best practices of the PGI (Persona, Grouping, and Intelligence) method.
arXiv Detail & Related papers (2024-02-21T18:40:24Z)
- On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks [17.329365493094542]
We present a principled empirical study of the performance of GPT-4 in three domains: Game of 24, Graph Coloring, and STRIPS planning.
We observe significant performance collapse with self-critique and significant performance gains with sound external verification (a minimal verifier sketch follows the related-papers list below).
arXiv Detail & Related papers (2024-02-12T23:11:01Z)
- A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia [57.31074448586854]
Large language models (LLMs) have an impressive ability to draw on novel information supplied in their context.
Yet the mechanisms underlying this contextual grounding remain unknown.
We present a novel method to study grounding abilities using Fakepedia.
arXiv Detail & Related papers (2023-12-04T17:35:42Z)
- ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework that uses prompt chaining to perturb the original evidence and generate new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and useful to trigger hallucination in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z)
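The entry on self-verification limitations above reports gains from sound external verification in domains such as Graph Coloring. As referenced there, the sketch below shows what a sound verifier can look like for graph coloring; it is an illustrative checker written under stated assumptions, not that study's actual evaluation harness, and the function name `verify_coloring` is an assumption.

```python
# Illustrative sketch: a sound external verifier for graph coloring, the kind of
# checker the self-verification study contrasts with LLM self-critique. It is
# not that study's evaluation harness, just an example of a verifier that can
# never accept an invalid solution.
from typing import Dict, Hashable, List, Tuple


def verify_coloring(edges: List[Tuple[Hashable, Hashable]],
                    coloring: Dict[Hashable, int],
                    num_colors: int) -> bool:
    """Return True iff every vertex is assigned one of num_colors colors
    and no edge connects two vertices with the same color."""
    vertices = {v for edge in edges for v in edge}
    if not vertices.issubset(coloring):
        return False  # some vertex in the graph was left uncolored
    if any(not (0 <= c < num_colors) for c in coloring.values()):
        return False  # a color outside the allowed palette was used
    return all(coloring[u] != coloring[v] for u, v in edges)


# Toy usage: a triangle needs three colors, so a 2-color assignment is rejected.
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
print(verify_coloring(triangle, {"a": 0, "b": 1, "c": 2}, 3))  # True
print(verify_coloring(triangle, {"a": 0, "b": 1, "c": 0}, 3))  # False
```

Because the checker tests every edge and every assigned color directly, it can never certify an invalid coloring, which is the soundness property that LLM self-critique lacks in the reported experiments.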