Impact of Preference Noise on the Alignment Performance of Generative Language Models
- URL: http://arxiv.org/abs/2404.09824v1
- Date: Mon, 15 Apr 2024 14:21:53 GMT
- Title: Impact of Preference Noise on the Alignment Performance of Generative Language Models
- Authors: Yang Gao, Dana Alon, Donald Metzler
- Abstract summary: We study the impact of preference noise on the alignment performance in two tasks (summarization and dialogue generation).
We find that the alignment performance can be highly sensitive to the noise rates in the preference data.
To mitigate the impact of noise, confidence-based data filtering shows significant benefit when certain types of noise are present.
- Score: 31.64856885517905
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A key requirement in developing Generative Language Models (GLMs) is to have their values aligned with human values. Preference-based alignment is a widely used paradigm for this purpose, in which preferences over generation pairs are first elicited from human annotators or AI systems and then fed into an alignment technique, e.g., Direct Preference Optimization. However, a substantial proportion (20-40%) of the preference pairs used in GLM alignment are noisy, and it remains unclear how this noise affects alignment performance and how to mitigate its negative impact. In this paper, we propose a framework to inject desirable amounts and types of noise into the preferences, and we systematically study the impact of preference noise on alignment performance in two tasks (summarization and dialogue generation). We find that alignment performance can be highly sensitive to the noise rate in the preference data: e.g., a 10 percentage point (pp) increase in the noise rate can lead to a 30 pp drop in alignment performance (in win rate). To mitigate the impact of noise, confidence-based data filtering shows significant benefit when certain types of noise are present. We hope our work can help the community better understand and mitigate the impact of preference noise in GLM alignment.
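The paper's two main ingredients, controlled noise injection and confidence-based filtering, can be sketched as follows. This is a minimal illustration assuming a symmetric random label-flip noise model; the function names and details are not the authors' exact implementation:

```python
import random

def inject_pairwise_noise(pairs, noise_rate, seed=0):
    """Flip the chosen/rejected roles in a random fraction of preference pairs.

    `pairs` is a list of (chosen, rejected) tuples; `noise_rate` is the
    probability that a pair's label is flipped (symmetric random noise).
    """
    rng = random.Random(seed)
    noisy = []
    for chosen, rejected in pairs:
        if rng.random() < noise_rate:
            noisy.append((rejected, chosen))  # flipped preference label
        else:
            noisy.append((chosen, rejected))
    return noisy

def filter_by_confidence(pairs, confidences, threshold):
    """Keep only pairs whose annotator/model confidence meets the threshold."""
    return [p for p, c in zip(pairs, confidences) if c >= threshold]
```

Filtering trades dataset size for label quality, which is why (per the abstract) it helps only when the noise is of a type that confidence scores can actually detect.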
Related papers
- Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization [45.6430987775264]
This study addresses the challenge of noise in training datasets for Direct Preference Optimization (DPO)
We categorize noise into pointwise noise, which includes low-quality data points, and pairwise noise, which encompasses erroneous data pair associations.
We introduce Distributionally Robustifying DPO, which integrates pairwise robustness by optimizing against worst-case pairwise scenarios.
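For context on what pairwise noise corrupts, the standard DPO objective referenced throughout these papers is a logistic loss on the implicit reward margin. A minimal single-pair sketch (the log-probability arguments are assumed inputs for illustration):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair.

    margin = (log pi(y_w) - log pi_ref(y_w)) - (log pi(y_l) - log pi_ref(y_l));
    the loss is -log(sigmoid(beta * margin)).
    """
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

A flipped preference label swaps the winner and loser, negating the margin and inflating the loss, which is exactly the erroneous pair-association ("pairwise noise") failure mode this paper robustifies against.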
arXiv Detail & Related papers (2024-07-10T17:48:25Z)
- Aligning Large Language Models with Self-generated Preference Data [72.99676237703099]
We propose a new framework that boosts the alignment of large language models (LLMs) with human preferences.
Our key idea is leveraging the human prior knowledge within the small (seed) data.
We introduce a noise-aware preference learning algorithm to mitigate the risk of low quality within generated preference data.
arXiv Detail & Related papers (2024-06-06T18:01:02Z)
- ROPO: Robust Preference Optimization for Large Language Models [59.10763211091664]
We propose an iterative alignment approach that integrates noise-tolerance and filtering of noisy samples without the aid of external models.
Experiments on three widely-used datasets with Mistral-7B and Llama-2-7B demonstrate that ROPO significantly outperforms existing preference alignment methods.
arXiv Detail & Related papers (2024-04-05T13:58:51Z)
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first to comprehensively analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Provably Robust DPO: Aligning Language Models with Noisy Feedback [10.523790076060171]
We introduce a general framework for policy optimization in the presence of random preference flips.
We design a novel loss function that de-biases the effect of noise on average, making a policy trained by minimizing it robust to the noise.
Our experiments on IMDb sentiment generation and Anthropic's helpful-harmless dataset show that rDPO is robust to noise in preference labels compared to vanilla DPO.
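The de-biasing idea can be sketched with the standard unbiased correction for symmetric label flips at a known rate eps. This is an illustrative reconstruction under that assumption, not necessarily the paper's exact loss:

```python
import math

def dpo_pair_loss(margin, beta=0.1):
    """-log(sigmoid(beta * margin)) for one ordered preference pair."""
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

def rdpo_loss(margin, eps, beta=0.1):
    """De-biased loss under symmetric preference flips with known rate eps < 0.5.

    Averaged over random flips (observed margin is +m with prob. 1-eps,
    -m with prob. eps), this recovers the clean loss on the true margin:
    the classic unbiased label-noise correction.
    """
    assert 0.0 <= eps < 0.5
    return ((1 - eps) * dpo_pair_loss(margin, beta)
            - eps * dpo_pair_loss(-margin, beta)) / (1 - 2 * eps)
```

At eps = 0 this reduces to plain DPO; as eps approaches 0.5 the 1/(1-2*eps) factor blows up, reflecting that uniformly random labels carry no signal.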
arXiv Detail & Related papers (2024-03-01T09:55:18Z)
- Adaptive Differential Privacy in Federated Learning: A Priority-Based Approach [0.0]
Federated learning (FL) develops global models without direct access to local datasets.
DP offers a framework that gives a privacy guarantee by adding certain amounts of noise to parameters.
We propose adaptive noise addition in FL which decides the value of injected noise based on features' relative importance.
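A minimal sketch of importance-weighted noise addition, assuming per-parameter importance scores in (0, 1]; the inverse-importance scaling rule here is illustrative, not the paper's calibrated DP mechanism:

```python
import random

def adaptive_gaussian_noise(params, importances, base_sigma, seed=0):
    """Add Gaussian noise scaled inversely to relative feature importance.

    More important parameters (higher `importances`) receive less noise,
    so accuracy-critical features are perturbed least. A sketch only: a
    real DP deployment must calibrate sigma to the sensitivity and the
    (epsilon, delta) budget.
    """
    rng = random.Random(seed)
    noised = []
    for p, imp in zip(params, importances):
        sigma = base_sigma / max(imp, 1e-6)  # guard against zero importance
        noised.append(p + rng.gauss(0.0, sigma))
    return noised
```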
arXiv Detail & Related papers (2024-01-04T03:01:15Z)
- Negative Pre-aware for Noisy Cross-modal Matching [46.5591267410225]
Cross-modal noise-robust learning is a challenging task since noisy correspondence is hard to recognize and rectify.
We present a novel Negative Pre-aware Cross-modal matching solution for large visual-language model fine-tuning on noisy downstream tasks.
arXiv Detail & Related papers (2023-12-10T05:52:36Z)
- Inference and Denoise: Causal Inference-based Neural Speech Enhancement [83.4641575757706]
This study addresses the speech enhancement (SE) task within the causal inference paradigm by modeling the noise presence as an intervention.
The proposed causal inference-based speech enhancement (CISE) separates clean and noisy frames in an intervened noisy speech using a noise detector and assigns both sets of frames to two mask-based enhancement modules (EMs) to perform noise-conditional SE.
arXiv Detail & Related papers (2022-11-02T15:03:50Z)
- Partial Identification with Noisy Covariates: A Robust Optimization Approach [94.10051154390237]
Causal inference from observational datasets often relies on measuring and adjusting for covariates.
We show that this robust optimization approach can extend a wide range of causal adjustment methods to perform partial identification.
Across synthetic and real datasets, we find that this approach provides ATE bounds with a higher coverage probability than existing methods.
arXiv Detail & Related papers (2022-02-22T04:24:26Z)
- On Dynamic Noise Influence in Differentially Private Learning [102.6791870228147]
Private Gradient Descent (PGD) is a commonly used private learning framework, which adds noise according to the differential privacy protocol.
Recent studies show that dynamic privacy schedules can improve utility at the final iteration, yet the theoretical understanding of the effectiveness of such schedules remains limited.
This paper provides comprehensive analysis of noise influence in dynamic privacy schedules to answer these critical questions.
arXiv Detail & Related papers (2021-01-19T02:04:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.