Thinking in a Crowd: How Auxiliary Information Shapes LLM Reasoning
- URL: http://arxiv.org/abs/2509.18163v1
- Date: Wed, 17 Sep 2025 06:45:21 GMT
- Title: Thinking in a Crowd: How Auxiliary Information Shapes LLM Reasoning
- Authors: Haodong Zhao, Chenyan Zhao, Yansi Li, Zhuosheng Zhang, Gongshen Liu
- Abstract summary: This paper investigates the impact of auxiliary information on the reasoning process of Large Language Models (LLMs). We introduce SciAux, a new dataset derived from ScienceQA, to systematically test the robustness of the model against these types of information. Our findings reveal a critical vulnerability: the model's deliberative "thinking mode" is a double-edged sword.
- Score: 22.49618553262681
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The capacity of Large Language Models (LLMs) to reason is fundamental to their application in complex, knowledge-intensive domains. In real-world scenarios, LLMs are often augmented with external information that can be helpful, irrelevant, or even misleading. This paper investigates the causal impact of such auxiliary information on the reasoning process of LLMs with explicit step-by-step thinking capabilities. We introduce SciAux, a new dataset derived from ScienceQA, to systematically test the robustness of the model against these types of information. Our findings reveal a critical vulnerability: the model's deliberative "thinking mode" is a double-edged sword. While helpful context improves accuracy, misleading information causes a catastrophic drop in performance, which is amplified by the thinking process. Instead of conferring robustness, thinking reinforces the degree of error when provided with misinformation. This highlights that the challenge is not merely to make models "think", but to endow them with the critical faculty to evaluate the information upon which their reasoning is based. The SciAux dataset is available at https://huggingface.co/datasets/billhdzhao/SciAux.
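The following is a minimal sketch (not code from the paper) of how one might load the released SciAux data from the Hugging Face Hub and build a prompt that prepends an auxiliary passage to a ScienceQA-style question, mirroring the setup described in the abstract. The dataset identifier comes from the abstract; the field names ("question", "choices", "auxiliary_info", "aux_type"), the split name, and the prompt wording are assumptions for illustration only.

```python
# Sketch only: probing an LLM with SciAux-style auxiliary context.
# Field names and split are assumed; check the dataset card for the real schema.
from datasets import load_dataset


def build_prompt(example: dict) -> str:
    """Prepend the auxiliary passage (helpful, irrelevant, or misleading)
    to a multiple-choice science question, as described in the abstract."""
    options = "\n".join(
        f"({chr(ord('A') + i)}) {choice}"
        for i, choice in enumerate(example["choices"])
    )
    return (
        f"Context:\n{example['auxiliary_info']}\n\n"
        f"Question: {example['question']}\n{options}\n"
        "Think step by step, then answer with a single option letter."
    )


if __name__ == "__main__":
    # Assumed split name; the actual dataset may use a different one.
    ds = load_dataset("billhdzhao/SciAux", split="test")
    example = ds[0]
    print(f"aux_type: {example.get('aux_type', 'unknown')}")
    print(build_prompt(example))
```

Comparing accuracy across the helpful, irrelevant, and misleading auxiliary-information conditions, with and without explicit step-by-step thinking, would reproduce the kind of analysis the abstract describes.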
Related papers
- A Guide to Large Language Models in Modeling and Simulation: From Core Techniques to Critical Challenges [0.0]
We aim to provide comprehensive and practical guidance on how to use large language models (LLMs). We discuss common sources of confusion, including non-determinism, knowledge augmentation, and decomposition of M&S data. We emphasize principled design choices, diagnostic strategies, and empirical evaluation.
arXiv Detail & Related papers (2026-02-05T17:00:07Z) - Hey, That's My Data! Label-Only Dataset Inference in Large Language Models [63.35066172530291]
CatShift is a label-only dataset-inference framework. It capitalizes on catastrophic forgetting: the tendency of an LLM to overwrite previously learned knowledge when exposed to new data.
arXiv Detail & Related papers (2025-06-06T13:02:59Z) - Generating Grounded Responses to Counter Misinformation via Learning Efficient Fine-Grained Critiques [9.514892000592912]
MisMitiFact is an efficient framework for generating fact-grounded counter-responses at scale. We develop lightweight, fine-grained critique models trained on data sourced from readily available fact-checking sites. It achieves a 5x increase in feedback generation throughput, making it highly suitable for cost-effective, large-scale misinformation mitigation.
arXiv Detail & Related papers (2025-06-06T09:46:09Z) - Unraveling Misinformation Propagation in LLM Reasoning [22.21135267544835]
We show how misinformation propagates within Large Language Models' reasoning process. Applying factual corrections early in the reasoning process most effectively reduces misinformation propagation. Our work offers a practical approach to mitigating misinformation propagation.
arXiv Detail & Related papers (2025-05-24T06:45:45Z) - Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision [120.40788744292739]
We propose a two-player paradigm that separates the roles of reasoning and critique models.
We first propose AutoMathCritique, an automated and scalable framework for collecting critique data.
We demonstrate that the critique models consistently improve the actor's performance on difficult queries at test-time.
arXiv Detail & Related papers (2024-11-25T17:11:54Z) - Understanding Knowledge Drift in LLMs through Misinformation [11.605377799885238]
Large Language Models (LLMs) have revolutionized numerous applications, making them an integral part of our digital ecosystem.
We analyze the susceptibility of state-of-the-art LLMs to factual inaccuracies when they encounter false information in a QnA scenario.
Our experiments reveal that an LLM's uncertainty can increase up to 56.6% when the question is answered incorrectly.
arXiv Detail & Related papers (2024-09-11T08:11:16Z) - Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models [55.332004960574004]
Large language models (LLMs) are widely used in decision-making, but their reliability, especially in critical tasks like healthcare, is not well-established. This paper investigates how the uncertainty of responses generated by LLMs relates to the information provided in the input prompt. We propose a prompt-response concept model that explains how LLMs generate responses and helps understand the relationship between prompts and response uncertainty.
arXiv Detail & Related papers (2024-07-20T11:19:58Z) - LLMs' Reading Comprehension Is Affected by Parametric Knowledge and Struggles with Hypothetical Statements [59.71218039095155]
The task of reading comprehension (RC) provides a primary means to assess language models' natural language understanding (NLU) capabilities. If the context aligns with the models' internal knowledge, it is hard to discern whether the models' answers stem from context comprehension or from internal information. To address this issue, we suggest using RC on imaginary data based on fictitious facts and entities.
arXiv Detail & Related papers (2024-04-09T13:08:56Z) - The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z) - R-Tuning: Instructing Large Language Models to Say `I Don't Know' [66.11375475253007]
Large language models (LLMs) have revolutionized numerous domains with their impressive performance but still face challenges.
Previous instruction tuning methods force the model to complete a sentence regardless of whether it possesses the relevant knowledge.
We present a new approach called Refusal-Aware Instruction Tuning (R-Tuning).
Experimental results demonstrate R-Tuning effectively improves a model's ability to answer known questions and refrain from answering unknown questions.
arXiv Detail & Related papers (2023-11-16T08:45:44Z)