GRAIT: Gradient-Driven Refusal-Aware Instruction Tuning for Effective Hallucination Mitigation
- URL: http://arxiv.org/abs/2502.05911v1
- Date: Sun, 09 Feb 2025 14:11:30 GMT
- Title: GRAIT: Gradient-Driven Refusal-Aware Instruction Tuning for Effective Hallucination Mitigation
- Authors: Runchuan Zhu, Zinco Jiang, Jiang Wu, Zhipeng Ma, Jiahe Song, Fengshuo Bai, Dahua Lin, Lijun Wu, Conghui He
- Abstract summary: Refusal-Aware Instruction Tuning (RAIT) aims to enhance Large Language Models (LLMs) by improving their ability to refuse responses to questions beyond their knowledge.
Effective RAIT must address two key challenges: first, effectively rejecting unknown questions to minimize hallucinations; second, avoiding over-refusal so that questions that can be correctly answered are not rejected.
GRAIT (1) employs gradient-driven sample selection to effectively minimize hallucinations and (2) introduces an adaptive weighting mechanism during fine-tuning to reduce the risk of over-refusal.
- Score: 62.63014905981601
- Abstract: Refusal-Aware Instruction Tuning (RAIT) aims to enhance Large Language Models (LLMs) by improving their ability to refuse responses to questions beyond their knowledge, thereby reducing hallucinations and improving reliability. Effective RAIT must address two key challenges: first, effectively rejecting unknown questions to minimize hallucinations; second, avoiding over-refusal so that questions that can be correctly answered are not rejected, thereby maintaining the helpfulness of LLM outputs. In this paper, we address these two challenges by deriving insightful observations from a gradient-based perspective and proposing the Gradient-driven Refusal-Aware Instruction Tuning framework (GRAIT), which (1) employs gradient-driven sample selection to effectively minimize hallucinations and (2) introduces an adaptive weighting mechanism during fine-tuning to reduce the risk of over-refusal, achieving a balance between accurate refusals and maintaining useful responses. Experimental evaluations on open-ended and multiple-choice question answering tasks demonstrate that GRAIT significantly outperforms existing RAIT methods in overall performance. The source code and data will be available at https://github.com/opendatalab/GRAIT.
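The abstract names GRAIT's two ingredients, gradient-driven sample selection and adaptive loss weighting, without giving their exact formulas. The Python sketch below is a minimal, hypothetical illustration of how such a pipeline could be wired together; the cosine-similarity selection criterion, the refusal string, the dataset fields, and the weighting function are all assumptions for illustration, not the authors' actual method.

```python
# Hypothetical sketch of a gradient-driven RAIT-style pipeline (not the GRAIT authors' code).
# Assumptions: a causal LM `model` with a Hugging-Face-style forward(input_ids, labels) -> loss,
# and a dataset of dicts with "input_ids", "labels", and a boolean "correct" field.
import torch
import torch.nn.functional as F

REFUSAL_TEXT = "I don't know."  # assumed refusal target for samples relabeled as "unknown"

def per_sample_grad(model, input_ids, labels):
    """Flattened gradient of the LM loss for a single sample."""
    model.zero_grad()
    loss = model(input_ids=input_ids, labels=labels).loss
    loss.backward()
    return torch.cat([p.grad.reshape(-1) for p in model.parameters() if p.grad is not None])

def select_refusal_samples(model, dataset, threshold=0.0):
    """Gradient-driven selection (assumed criterion): a sample whose gradient points away
    from the mean gradient of correctly answered samples is relabeled as a refusal, i.e.
    its training target would be replaced with REFUSAL_TEXT."""
    known_dir = torch.stack(
        [per_sample_grad(model, ex["input_ids"], ex["labels"]) for ex in dataset if ex["correct"]]
    ).mean(dim=0)
    selected = []
    for ex in dataset:
        g = per_sample_grad(model, ex["input_ids"], ex["labels"])
        sim = F.cosine_similarity(g, known_dir, dim=0)
        selected.append({**ex, "refuse": sim.item() < threshold, "sim": sim.item()})
    return selected

def adaptive_weight(sim, refuse):
    """Adaptive weighting (assumed form): down-weight the refusal loss on samples the model
    plausibly knows (high similarity to the known direction), to reduce over-refusal."""
    return torch.sigmoid(torch.tensor(-sim)) if refuse else torch.tensor(1.0)
```

In this sketch, samples whose gradients disagree with the average "known" gradient direction are relabeled as refusals, while the adaptive weight softens the refusal loss on samples the model may in fact be able to answer; this reflects the hallucination-versus-over-refusal trade-off the abstract describes, under the stated assumptions.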
Related papers
- Utilize the Flow before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning [68.57166425493283]
Refusal-Aware Instruction Tuning (RAIT) enables Large Language Models (LLMs) to refuse to answer unknown questions.
This crude approach can cause LLMs to excessively refuse to answer questions they could have answered correctly.
We introduce Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning (CRaFT) to address this issue.
arXiv Detail & Related papers (2024-10-09T14:12:51Z) - Wrong-of-Thought: An Integrated Reasoning Framework with Multi-Perspective Verification and Wrong Information [14.071887353084126]
Chain-of-Thought (CoT) has become a vital technique for enhancing the performance of Large Language Models (LLMs).
We propose Wrong-of-Thought (WoT), which includes two core modules.
Experiments on 8 popular datasets and 5 LLMs demonstrate that WoT surpasses all previous baselines.
arXiv Detail & Related papers (2024-10-06T12:27:21Z) - PEAR: Position-Embedding-Agnostic Attention Re-weighting Enhances Retrieval-Augmented Generation with Zero Inference Overhead [24.611413814466978]
Large language models (LLMs) enhanced with retrieval-augmented generation (RAG) have introduced a new paradigm for web search.
Existing methods to enhance context awareness are often inefficient, incurring time or memory overhead during inference.
We propose Position-Embedding-Agnostic attention Re-weighting (PEAR) which enhances the context awareness of LLMs with zero inference overhead.
arXiv Detail & Related papers (2024-09-29T15:40:54Z) - Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models [36.119299938503936]
Large vision-language models (LVLMs) have shown promising performance on a variety of vision-language tasks.
They remain susceptible to hallucinations, generating outputs misaligned with visual content or instructions.
We propose reflective instruction tuning, which integrates rationale learning into visual instruction tuning.
arXiv Detail & Related papers (2024-07-16T06:32:45Z) - FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback [16.24562885483636]
We propose an innovative method to align modalities in Large Vision-Language Models (LVLMs) through Fine-Grained Artificial Intelligence Feedback (FGAIF).
Specifically, we first utilize AI tools to predict the types of hallucination for each segment in the response and obtain a collection of fine-grained feedback. Then, based on the collected reward data, three specialized reward models are trained to produce dense rewards. Finally, a novel fine-grained feedback module is integrated into the Proximal Policy Optimization (PPO) algorithm (a minimal, hypothetical sketch of this dense-reward step appears after this list).
arXiv Detail & Related papers (2024-04-07T19:00:45Z) - Debiasing Multimodal Large Language Models [61.6896704217147]
Large Vision-Language Models (LVLMs) have become indispensable tools in computer vision and natural language processing.
Our investigation reveals a noteworthy bias in the generated content, where the output is primarily influenced by the underlying Large Language Models (LLMs) prior to the input image.
To rectify these biases and redirect the model's focus toward vision information, we introduce two simple, training-free strategies.
arXiv Detail & Related papers (2024-03-08T12:35:07Z) - Do Compressed LLMs Forget Knowledge? An Experimental Study with Practical Implications [63.29358103217275]
Compressing Large Language Models (LLMs) often leads to reduced performance, especially for knowledge-intensive tasks.
We propose two conjectures on the nature of the damage: one is that certain knowledge is forgotten (or erased) after compression.
We introduce a variant called Inference-time Dynamic Prompting (IDP) that can effectively increase prompt diversity without incurring any inference overhead.
arXiv Detail & Related papers (2023-10-02T03:12:06Z) - Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL [62.824464372594576]
We aim to enhance arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization.
We identify a previously overlooked objective of query dependency in such optimization.
We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
arXiv Detail & Related papers (2023-09-13T01:12:52Z) - Making Large Language Models Better Reasoners with Alignment [57.82176656663245]
Reasoning is a cognitive process of using evidence to reach a sound conclusion.
Recent studies reveal that fine-tuning LLMs on data with the chain-of-thought (CoT) reasoning process can significantly enhance their reasoning capabilities.
We introduce an Alignment Fine-Tuning (AFT) paradigm, which involves three steps.
arXiv Detail & Related papers (2023-09-05T11:32:48Z) - PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine [24.888093229577965]
We propose a simple, universal, and automatic method named PREFER to address the stated limitations.
Our PREFER achieves state-of-the-art performance in multiple types of tasks by a significant margin.
arXiv Detail & Related papers (2023-08-23T09:46:37Z)
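As referenced from the FGAIF entry above, its summary describes a three-step pipeline: segment-level hallucination feedback, specialized reward models, and dense rewards fed into PPO. The sketch below is a hypothetical illustration of the final step only; the three hallucination categories, the `reward_models` scoring interface, and the averaging rule are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of converting segment-level feedback into dense PPO rewards
# (not the FGAIF authors' code; categories, interfaces, and the averaging rule are assumed).
from typing import Dict, List

HALLUCINATION_TYPES = ["object", "attribute", "relation"]  # assumed feedback categories

def dense_rewards(segments: List[str],
                  tokens_per_segment: List[int],
                  reward_models: Dict[str, object]) -> List[float]:
    """Score each response segment with one specialized reward model per hallucination
    type, then broadcast the averaged score to every token of that segment so a PPO
    trainer can consume it as a per-token (dense) reward."""
    rewards: List[float] = []
    for seg, n_tokens in zip(segments, tokens_per_segment):
        score = sum(reward_models[t].score(seg) for t in HALLUCINATION_TYPES) / len(HALLUCINATION_TYPES)
        rewards.extend([score] * n_tokens)
    return rewards
```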