Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
- URL: http://arxiv.org/abs/2503.02846v1
- Date: Tue, 04 Mar 2025 18:20:24 GMT
- Title: Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
- Authors: Yuzhe Gu, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen
- Abstract summary: Large language models (LLMs) exhibit hallucinations (i.e., unfaithful or nonsensical information) when serving as AI assistants in various domains. Previous factuality alignment methods that conduct response-level preference learning inevitably introduce noise during training. This paper proposes a fine-grained factuality alignment method based on Direct Preference Optimization (DPO), called Mask-DPO.
- Score: 56.74916151916208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) exhibit hallucinations (i.e., unfaithful or nonsensical information) when serving as AI assistants in various domains. Since hallucinations always come mixed with truthful content in LLM responses, previous factuality alignment methods that conduct response-level preference learning inevitably introduce noise during training. This paper therefore proposes a fine-grained factuality alignment method based on Direct Preference Optimization (DPO), called Mask-DPO. Incorporating sentence-level factuality as mask signals, Mask-DPO learns only from factually correct sentences in the preferred samples and avoids penalizing factual content in the non-preferred samples, which resolves the ambiguity in preference learning. Extensive experimental results demonstrate that Mask-DPO significantly improves the factuality of LLM responses to questions from both in-domain and out-of-domain datasets, even though these questions and their corresponding topics are unseen during training. Trained only on the ANAH training set, Llama3.1-8B-Instruct improves its score on the ANAH test set from 49.19% to 77.53%, even surpassing Llama3.1-70B-Instruct (53.44%), while its FactScore on the out-of-domain Biography dataset also improves from 30.29% to 39.39%. We further study the generalization property of Mask-DPO under different training-sample scaling strategies and find that scaling the number of topics in the dataset is more effective than scaling the number of questions. Building on the implications of this phenomenon, we offer a hypothesis about what factuality alignment is actually doing to LLMs and conduct proof-of-concept experiments to verify it. We hope the method and the findings pave the way for future research on scaling factuality alignment.
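The core idea of the abstract lends itself to a compact illustration. Below is a minimal PyTorch-style sketch of how a sentence-level factuality mask could be folded into the standard DPO objective. The function name, tensor layout, and the assumption that per-token log-probabilities and 0/1 sentence masks are precomputed are illustrative choices, not the authors' released implementation.

```python
# Minimal sketch: sentence-masked DPO loss (assumed tensor layout; not the official Mask-DPO code).
import torch
import torch.nn.functional as F


def mask_dpo_loss(
    policy_chosen_logps: torch.Tensor,    # (B, T) per-token log-probs of preferred responses under the policy
    policy_rejected_logps: torch.Tensor,  # (B, T) per-token log-probs of dispreferred responses under the policy
    ref_chosen_logps: torch.Tensor,       # (B, T) same quantities under the frozen reference model
    ref_rejected_logps: torch.Tensor,     # (B, T)
    chosen_mask: torch.Tensor,            # (B, T) 1 for tokens in factually correct sentences of the preferred response
    rejected_mask: torch.Tensor,          # (B, T) 1 for tokens in hallucinated sentences of the dispreferred response
    beta: float = 0.1,
) -> torch.Tensor:
    # Only masked-in tokens contribute to the sequence-level log-ratios, so factual
    # sentences in the rejected response are never pushed down and non-factual
    # sentences in the chosen response are never pushed up.
    chosen_logratio = ((policy_chosen_logps - ref_chosen_logps) * chosen_mask).sum(dim=-1)
    rejected_logratio = ((policy_rejected_logps - ref_rejected_logps) * rejected_mask).sum(dim=-1)
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()
```

In practice the masks would come from a sentence-level factuality annotator (e.g., ANAH-style annotations); here they are simply assumed to be given as 0/1 tensors aligned to tokens.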
Related papers
- From Misleading Queries to Accurate Answers: A Three-Stage Fine-Tuning Method for LLMs [5.23164145730825]
Large language models (LLMs) exhibit excellent performance in natural language processing (NLP) tasks.
Existing methods focus on correcting the output, but they often overlook the potential of improving the ability of LLMs to detect and correct misleading content in the input itself.
We propose a novel three-stage fine-tuning method that enhances the ability of LLMs to detect and correct misleading information in the input.
arXiv Detail & Related papers (2025-04-15T15:16:45Z) - Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback [40.01227095901647]
Large language models (LLMs) demonstrate impressive performance but lack the flexibility to adapt to human preferences quickly without retraining. We introduce Test-time Preference Optimization (TPO), a framework that aligns LLM outputs with human preferences during inference. Our findings establish TPO as a practical, lightweight alternative for test-time preference optimization, achieving alignment on the fly.
arXiv Detail & Related papers (2025-01-22T14:15:46Z) - Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key [24.229983103296988]
Hallucination remains a major challenge for Large Vision-Language Models (LVLMs).
We propose the On-Policy Alignment (OPA)-DPO framework, which uniquely leverages expert feedback to correct hallucinated responses.
OPA-DPO achieves an additional reduction in the hallucination rate of LLaVA-1.5-7B: 13.26% on the AMBER benchmark and 5.39% on the Object-Hal benchmark.
arXiv Detail & Related papers (2025-01-16T17:48:03Z) - Dynamic Uncertainty Ranking: Enhancing Retrieval-Augmented In-Context Learning for Long-Tail Knowledge in LLMs [50.29035873837]
Large language models (LLMs) can learn vast amounts of knowledge from diverse domains during pre-training.
Long-tail knowledge from specialized domains is often scarce and underrepresented, rarely appearing in the models' memorization.
We propose a reinforcement learning-based dynamic uncertainty ranking method for ICL that accounts for the varying impact of each retrieved sample on LLM predictions.
arXiv Detail & Related papers (2024-10-31T03:42:17Z) - Reward-Augmented Data Enhances Direct Preference Alignment of LLMs [63.32585910975191]
We introduce reward-conditioned Large Language Models (LLMs) that learn from the entire spectrum of response quality within the dataset. We propose an effective yet simple data relabeling method that conditions the preference pairs on quality scores to construct a reward-augmented dataset.
arXiv Detail & Related papers (2024-10-10T16:01:51Z) - TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights [73.9088920210495]
We propose a token-level importance sampling DPO objective named TIS-DPO that assigns importance weights to each token based on its reward (a minimal sketch of this token-weighting idea appears after this list).
TIS-DPO significantly outperforms various baseline methods on harmlessness and helpfulness alignment and summarization tasks.
arXiv Detail & Related papers (2024-10-06T04:03:00Z) - 3D-Properties: Identifying Challenges in DPO and Charting a Path Forward [17.27880657597116]
We revisit DPO, analyzing its theoretical foundations and empirical performance. We identify three key properties, termed 3D properties, that emerge from DPO's learning process. We propose simple regularization techniques that improve training stability and performance.
arXiv Detail & Related papers (2024-06-11T14:59:24Z) - Spread Preference Annotation: Direct Preference Judgment for Efficient LLM Alignment [72.99676237703099]
We propose a new framework that boosts the alignment of large language models with human preferences.
Our key idea is leveraging the human prior knowledge within the small (seed) data.
We introduce a noise-aware preference learning algorithm to mitigate the risk of low quality in the generated preference data.
arXiv Detail & Related papers (2024-06-06T18:01:02Z) - Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data [102.16105233826917]
Learning from preference labels plays a crucial role in fine-tuning large language models.
There are several distinct approaches for preference fine-tuning, including supervised learning, on-policy reinforcement learning (RL), and contrastive learning.
arXiv Detail & Related papers (2024-04-22T17:20:18Z) - Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs [61.04246774006429]
We introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent.
We observe that our instruction-based prompts generate outputs with 23.7% higher overlap with training data compared to the baseline prefix-suffix measurements.
Our findings show that instruction-tuned models can expose pre-training data as much as their base models, if not more, and that using instructions proposed by other LLMs opens a new avenue for automated attacks.
arXiv Detail & Related papers (2024-03-05T19:32:01Z)
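For contrast with the binary sentence-level mask sketched above, the TIS-DPO entry in the list describes continuous per-token importance weights inside the same DPO objective. Below is a minimal sketch of that token-weighting idea, assuming the per-token weights have already been estimated; the weight-estimation procedure from the paper is not reproduced here, and all names and shapes are illustrative.

```python
# Minimal sketch: token-weighted DPO in the spirit of TIS-DPO (weight estimation not shown).
import torch
import torch.nn.functional as F


def token_weighted_dpo_loss(
    policy_chosen_logps: torch.Tensor,    # (B, T) per-token log-probs of preferred responses under the policy
    policy_rejected_logps: torch.Tensor,  # (B, T) per-token log-probs of dispreferred responses under the policy
    ref_chosen_logps: torch.Tensor,       # (B, T) reference-model counterparts
    ref_rejected_logps: torch.Tensor,     # (B, T)
    chosen_weights: torch.Tensor,         # (B, T) estimated per-token importance weights (assumed given)
    rejected_weights: torch.Tensor,       # (B, T)
    beta: float = 0.1,
) -> torch.Tensor:
    # Each token's log-ratio is scaled by its importance weight before aggregation,
    # so tokens judged more reward-relevant dominate the preference signal, rather
    # than being kept or dropped outright as in the sentence-masked variant above.
    chosen_term = (chosen_weights * (policy_chosen_logps - ref_chosen_logps)).sum(dim=-1)
    rejected_term = (rejected_weights * (policy_rejected_logps - ref_rejected_logps)).sum(dim=-1)
    return -F.logsigmoid(beta * (chosen_term - rejected_term)).mean()
```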
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.