CURATRON: Complete Robust Preference Data for Robust Alignment of Large
Language Models
- URL: http://arxiv.org/abs/2403.02745v1
- Date: Tue, 5 Mar 2024 07:58:12 GMT
- Title: CURATRON: Complete Robust Preference Data for Robust Alignment of Large
Language Models
- Authors: Son The Nguyen, Niranjan Uma Naresh, Theja Tulabandhula
- Abstract summary: This paper addresses the challenges of aligning large language models (LLMs) with human values via preference learning (PL).
We propose a novel method for robustly and completely recalibrating values within these datasets.
Our algorithms handle adversarial noise and unobserved comparisons well in both general and preference dataset settings.
- Score: 1.7849982327883962
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the challenges of aligning large language models (LLMs)
with human values via preference learning (PL), with a focus on the issues of
incomplete and corrupted data in preference datasets. We propose a novel method
for robustly and completely recalibrating values within these datasets to
enhance LLMs' resilience against these issues. In particular, we devise a
guaranteed polynomial time ranking algorithm that robustifies several existing
models, such as the classic Bradley-Terry-Luce (BTL) (Bradley and Terry,
1952) model and certain generalizations of it. To the best of our knowledge,
our present work is the first to propose an algorithm that provably recovers an
ε-optimal ranking with high probability while allowing as many as
O(n) perturbed pairwise comparison results per model response. Furthermore, we
show robust recovery results in the partially observed setting. Our experiments
confirm that our algorithms handle adversarial noise and unobserved comparisons
well in both general and LLM preference dataset settings. This work contributes
to the development and scaling of more reliable and ethically aligned AI models
by equipping the dataset curation pipeline with the ability to handle missing
and maliciously manipulated inputs.
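
For readers unfamiliar with the BTL model named in the abstract, the following is a minimal sketch of the classic model only, not the robust algorithm proposed in the paper. Under BTL, each item i has a positive score w_i, and item i beats item j with probability w_i / (w_i + w_j). The function names (`btl_win_prob`, `fit_btl_mm`) and the toy win counts are hypothetical, and the fitting routine is the standard minorization-maximization iteration for the BTL maximum-likelihood estimate, assuming every item wins at least once and there are no corrupted or missing comparisons.

```python
def btl_win_prob(w_i, w_j):
    """Probability that item i beats item j under the BTL model."""
    return w_i / (w_i + w_j)


def fit_btl_mm(wins, iters=200):
    """Estimate BTL scores from a win-count matrix.

    wins[i][j] is the number of times item i beat item j.
    Uses the standard MM update; assumes each item has at least one win.
    """
    n = len(wins)
    w = [1.0 / n] * n  # start from uniform scores
    for _ in range(iters):
        new_w = []
        for i in range(n):
            total_wins = sum(wins[i])
            # Sum over opponents of (comparisons with j) / (w_i + w_j)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (w[i] + w[j])
                for j in range(n)
                if j != i
            )
            new_w.append(total_wins / denom)
        scale = sum(new_w)
        w = [x / scale for x in new_w]  # normalize so scores sum to 1
    return w


# Toy data (hypothetical): three responses, ten comparisons per pair.
wins = [
    [0, 7, 9],
    [3, 0, 6],
    [1, 4, 0],
]
scores = fit_btl_mm(wins)
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print(ranking)
```

The paper's setting differs from this sketch in that some pairwise results may be adversarially perturbed or unobserved, which a plain maximum-likelihood fit like this one does not account for.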
Related papers
- Entropy Law: The Story Behind Data Compression and LLM Performance [115.70395740286422]
We find that model performance is negatively correlated to the compression ratio of training data, which usually yields a lower training loss.
Based on the findings of the entropy law, we propose a quite efficient and universal data selection method.
We also present an interesting application of entropy law that can detect potential performance risks at the beginning of model training.
arXiv Detail & Related papers (2024-07-09T08:14:29Z)
- Aligning Large Language Models with Self-generated Preference Data [72.99676237703099]
We propose a new framework that boosts the alignment of large language models (LLMs) with human preferences.
Our key idea is leveraging the human prior knowledge within the small (seed) data.
We introduce a noise-aware preference learning algorithm to mitigate the risk of low quality within generated preference data.
arXiv Detail & Related papers (2024-06-06T18:01:02Z)
- Self-Exploring Language Models: Active Preference Elicitation for Online Alignment [90.4820014819937]
We propose a bilevel objective optimistically biased towards potentially high-reward responses to actively explore out-of-distribution regions.
Our experimental results demonstrate that when finetuned on Zephyr-7B-SFT and Llama-3-8B-Instruct models, SELM significantly boosts the performance on instruction-following benchmarks.
arXiv Detail & Related papers (2024-05-29T17:59:07Z)
- CLAIM Your Data: Enhancing Imputation Accuracy with Contextual Large Language Models [0.18416014644193068]
This paper introduces the Contextual Language model for Accurate Imputation Method (CLAIM).
Unlike traditional imputation methods, CLAIM utilizes contextually relevant natural language descriptors to fill missing values.
Our evaluations across diverse datasets and missingness patterns reveal CLAIM's superior performance over existing imputation techniques.
arXiv Detail & Related papers (2024-05-28T00:08:29Z)
- Don't Forget Your Reward Values: Language Model Alignment via Value-based Calibration [26.467379188463028]
We propose a novel Value-based CaliBration (VCB) method to better align Large Language Models with human preferences.
Experimental results demonstrate that VCB surpasses existing alignment methods on AI assistant and summarization datasets.
arXiv Detail & Related papers (2024-02-25T08:45:10Z)
- Enhancing Large Language Model Performance To Answer Questions and Extract Information More Accurately [2.1715455600756646]
Large Language Models (LLMs) generate responses to questions.
Their effectiveness is often hindered by sub-optimal quality of answers and occasional failures to provide accurate responses to questions.
To address these challenges, a fine-tuning process is employed, involving feedback and examples to refine models.
arXiv Detail & Related papers (2024-01-27T00:18:07Z)
- To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
We also examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets [27.562256973255728]
Natural language processing models often exploit spurious correlations between task-independent features and labels in datasets to perform well only within the distributions they are trained on.
We propose to tackle this problem by generating a debiased version of a dataset, which can then be used to train a debiased, off-the-shelf model.
Our approach consists of 1) a method for training data generators to generate high-quality, label-consistent data samples; and 2) a filtering mechanism for removing data points that contribute to spurious correlations.
arXiv Detail & Related papers (2022-03-24T09:08:05Z)
- Nonparametric Estimation in the Dynamic Bradley-Terry Model [69.70604365861121]
We develop a novel estimator that relies on kernel smoothing to pre-process the pairwise comparisons over time.
We derive time-varying oracle bounds for both the estimation error and the excess risk in the model-agnostic setting.
arXiv Detail & Related papers (2020-02-28T21:52:49Z)