Exploring the Impact of Instruction-Tuning on LLM's Susceptibility to Misinformation
- URL: http://arxiv.org/abs/2507.18203v1
- Date: Thu, 24 Jul 2025 08:58:47 GMT
- Title: Exploring the Impact of Instruction-Tuning on LLM's Susceptibility to Misinformation
- Authors: Kyubeen Han, Junseo Jang, Hongjin Kim, Geunyeong Jeong, Harksoo Kim
- Abstract summary: We investigate the impact of instruction-tuning on large language models' susceptibility to misinformation. Our analysis reveals that instruction-tuned LLMs are significantly more likely to accept misinformation when it is presented by the user.
- Score: 3.032542495872679
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instruction-tuning enhances the ability of large language models (LLMs) to follow user instructions more accurately, improving usability while reducing harmful outputs. However, this process may increase the model's dependence on user input, potentially leading to the unfiltered acceptance of misinformation and the generation of hallucinations. Existing studies primarily highlight that LLMs are receptive to external information that contradicts their parametric knowledge, but little research has been conducted on the direct impact of instruction-tuning on this phenomenon. In our study, we investigate the impact of instruction-tuning on LLMs' susceptibility to misinformation. Our analysis reveals that instruction-tuned LLMs are significantly more likely to accept misinformation when it is presented by the user. A comparison with base models shows that instruction-tuning increases reliance on user-provided information, shifting susceptibility from the assistant role to the user role. Furthermore, we explore additional factors influencing misinformation susceptibility, such as the role of the user in prompt structure, misinformation length, and the presence of warnings in the system prompt. Our findings underscore the need for systematic approaches to mitigate unintended consequences of instruction-tuning and enhance the reliability of LLMs in real-world applications.
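The abstract describes probing susceptibility along three axes: which conversational role presents the misinformation, how long the misinformation is, and whether the system prompt carries a warning. The paper's code and exact prompt templates are not reproduced here, so the following is only a minimal sketch of how such a probe might be built, assuming an OpenAI-style chat-message schema; the warning text, function name, and example inputs are hypothetical.

```python
# Minimal sketch (not the authors' implementation): build chat prompts that place a
# misinformation statement in either the "user" or the "assistant" role, with an
# optional warning in the system prompt. The actual model call is left out.
from typing import Literal


def build_messages(
    question: str,
    misinformation: str,
    misinfo_role: Literal["user", "assistant"] = "user",
    warn_in_system: bool = False,
) -> list[dict]:
    """Return a chat-message list with the misinformation injected in the given role."""
    system = "You are a helpful assistant."
    if warn_in_system:
        # Hypothetical warning wording; the paper only reports that such warnings were tested.
        system += " Some statements in the conversation may be inaccurate."
    messages = [{"role": "system", "content": system}]
    if misinfo_role == "user":
        # Misinformation presented as part of the user's own message.
        messages.append({"role": "user", "content": f"{misinformation}\n\n{question}"})
    else:
        # Misinformation presented as a prior assistant turn, followed by the user's question.
        messages.append({"role": "assistant", "content": misinformation})
        messages.append({"role": "user", "content": question})
    return messages


if __name__ == "__main__":
    msgs = build_messages(
        question="In which year did the first human land on the Moon?",
        misinformation="Note: the first crewed Moon landing took place in 1972.",
        misinfo_role="user",
        warn_in_system=True,
    )
    for m in msgs:
        print(m)
```

Running the same message lists through a base model and its instruction-tuned counterpart, and measuring how often each echoes the injected claim, would mirror the comparison the abstract describes.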
Related papers
- Investigating the Effects of Cognitive Biases in Prompts on Large Language Model Outputs [3.7302076138352205]
This paper investigates the influence of cognitive biases on the outputs of Large Language Models (LLMs). Cognitive biases, such as confirmation and availability biases, can distort user inputs through prompts.
arXiv Detail & Related papers (2025-06-14T04:18:34Z)
- From Parameters to Prompts: Understanding and Mitigating the Factuality Gap between Fine-Tuned LLMs [4.447729258258283]
We study the factuality gap that arises when fine-tuning on known versus unknown knowledge. Our results shed light on the interaction between fine-tuning data and test-time prompts.
arXiv Detail & Related papers (2025-05-29T12:59:30Z)
- UIPE: Enhancing LLM Unlearning by Removing Knowledge Related to Forgetting Targets [41.0340052199534]
Large Language Models (LLMs) inevitably acquire harmful information during training on massive datasets. Existing unlearning methods focus on forgetting target data while overlooking the crucial impact of logically related knowledge on the effectiveness of unlearning. We propose Unlearning Improvement via Extrapolation (UIPE), a method that removes knowledge highly correlated with the forgetting targets.
arXiv Detail & Related papers (2025-03-06T18:40:00Z)
- Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering [66.5524727179286]
NOVA is a framework designed to identify high-quality data that aligns well with the learned knowledge to reduce hallucinations. It includes Internal Consistency Probing (ICP) and Semantic Equivalence Identification (SEI) to measure how familiar the LLM is with instruction data. To ensure the quality of selected samples, we introduce an expert-aligned reward model, considering characteristics beyond just familiarity.
arXiv Detail & Related papers (2025-02-11T08:05:56Z)
- Exploring Knowledge Boundaries in Large Language Models for Retrieval Judgment [56.87031484108484]
Large Language Models (LLMs) are increasingly recognized for their practical applications.
Retrieval-Augmented Generation (RAG) addresses the limitations of LLMs' parametric knowledge and has shown a significant impact on LLMs.
By minimizing retrieval requests that yield neutral or harmful results, we can effectively reduce both time and computational costs.
arXiv Detail & Related papers (2024-11-09T15:12:28Z)
- On the Loss of Context-awareness in General Instruction Fine-tuning [101.03941308894191]
We investigate the loss of context awareness after supervised fine-tuning. We find that the performance decline is associated with a bias toward different roles learned during conversational instruction fine-tuning. We propose a metric to identify context-dependent examples from general instruction fine-tuning datasets.
arXiv Detail & Related papers (2024-11-05T00:16:01Z)
- How Susceptible are LLMs to Influence in Prompts? [6.644673474240519]
Large Language Models (LLMs) are highly sensitive to prompts, including additional context provided therein.
We study how an LLM's response to multiple-choice questions changes when the prompt includes a prediction and explanation from another model.
Our findings reveal that models are strongly influenced and, when explanations are provided, are swayed irrespective of the explanation's quality.
arXiv Detail & Related papers (2024-08-17T17:40:52Z)
- LLM In-Context Recall is Prompt Dependent [0.0]
A model's ability to recall information embedded in its prompt significantly influences its practical efficacy and dependability in real-world applications.
This study demonstrates that an LLM's recall capability is not only contingent upon the prompt's content but may also be compromised by biases in its training data.
arXiv Detail & Related papers (2024-04-13T01:13:59Z)
- A Closer Look at the Limitations of Instruction Tuning [52.587607091917214]
We show that Instruction Tuning (IT) fails to enhance knowledge or skills in large language models (LLMs).
We also show that popular methods to improve IT do not lead to performance improvements over a simple LoRA fine-tuned model.
Our findings reveal that responses generated solely from pre-trained knowledge consistently outperform responses by models that learn any form of new knowledge from IT on open-source datasets.
arXiv Detail & Related papers (2024-02-03T04:45:25Z)
- An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning [70.48605869773814]
Catastrophic forgetting (CF) is a phenomenon that occurs in machine learning when a model forgets previously learned information. This study empirically evaluates the forgetting phenomenon in large language models during continual instruction tuning.
arXiv Detail & Related papers (2023-08-17T02:53:23Z)
- On the Risk of Misinformation Pollution with Large Language Models [127.1107824751703]
We investigate the potential misuse of modern Large Language Models (LLMs) for generating credible-sounding misinformation.
Our study reveals that LLMs can act as effective misinformation generators, leading to a significant degradation in the performance of Open-Domain Question Answering (ODQA) systems.
arXiv Detail & Related papers (2023-05-23T04:10:26Z)