Aligner: Efficient Alignment by Learning to Correct
- URL: http://arxiv.org/abs/2402.02416v5
- Date: Sat, 02 Nov 2024 10:01:38 GMT
- Title: Aligner: Efficient Alignment by Learning to Correct
- Authors: Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, Tianyi Qiu, Yaodong Yang
- Abstract summary: We introduce Aligner, a model-agnostic, plug-and-play module that learns the correctional residuals between preferred and dispreferred answers.
It can be applied to various open-source and API-based models with only one-off training, making it suitable for rapid iteration.
Our experiments demonstrate performance improvements by deploying the same Aligner model across 11 different language models.
- Abstract: With the rapid development of large language models (LLMs) and ever-evolving practical requirements, finding an efficient and effective alignment method has never been more critical. However, the tension between the complexity of current alignment methods and the need for rapid iteration in deployment scenarios necessitates the development of a model-agnostic alignment approach that can operate under these constraints. In this paper, we introduce Aligner, a novel and simple alignment paradigm that learns the correctional residuals between preferred and dispreferred answers using a small model. Designed as a model-agnostic, plug-and-play module, Aligner can be directly applied to various open-source and API-based models with only one-off training, making it suitable for rapid iteration. Notably, Aligner can be applied to any powerful, large-scale upstream model. Moreover, it can even iteratively bootstrap the upstream models using corrected responses as synthetic human preference data, breaking through the models' performance ceiling. Our experiments demonstrate performance improvements by deploying the same Aligner model across 11 different LLMs, evaluated on the 3H dimensions (helpfulness, harmlessness, and honesty). Specifically, Aligner-7B has achieved an average improvement of 68.9% in helpfulness and 23.8% in harmlessness across the tested LLMs while also effectively reducing hallucination. On the AlpacaEval 2.0 leaderboard, stacking Aligner-2B on GPT-4 Turbo improved its LC Win Rate from 55.0% to 58.3%, surpassing GPT-4 Omni's 57.5% Win Rate (community report).
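As a rough illustration of the plug-and-play deployment described in the abstract, the sketch below drafts an answer with an arbitrary upstream model and then passes the (question, draft) pair through a small corrector model. The checkpoint names, prompt template, and decoding settings are illustrative assumptions, not the exact artifacts released by the authors.

```python
# Minimal sketch of Aligner-style "learning to correct" at inference time:
# an upstream LLM drafts an answer, then a small corrector model rewrites it.
from transformers import AutoModelForCausalLM, AutoTokenizer


def generate(model, tokenizer, prompt, max_new_tokens=512):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Return only the newly generated continuation, not the echoed prompt.
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)


upstream_name = "meta-llama/Llama-2-7b-chat-hf"  # assumed upstream model; any LLM works
aligner_name = "aligner/aligner-7b-v1.0"         # assumed corrector checkpoint name
upstream_tok = AutoTokenizer.from_pretrained(upstream_name)
upstream = AutoModelForCausalLM.from_pretrained(upstream_name, device_map="auto")
aligner_tok = AutoTokenizer.from_pretrained(aligner_name)
aligner = AutoModelForCausalLM.from_pretrained(aligner_name, device_map="auto")

question = "How should I respond to an angry customer email?"
draft = generate(upstream, upstream_tok, question)

# The corrector conditions on both the question and the draft answer and emits
# an improved answer; the template below is a hypothetical stand-in.
correction_prompt = (
    "Edit the following question-answer pair to make the answer more helpful "
    f"and harmless.\nQuestion: {question}\nAnswer: {draft}\nImproved answer:"
)
print(generate(aligner, aligner_tok, correction_prompt))
```

Because the corrector only consumes and produces text, the same trained module can in principle sit behind any of the 11 upstream models mentioned above, or behind an API-based model, without retraining.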
Related papers
- IRepair: An Intent-Aware Approach to Repair Data-Driven Errors in Large Language Models [11.075423190298686]
Large language models (LLMs) are notoriously vulnerable to biases in their training data, leading to issues such as toxicity.
In this paper, we introduce a novel dynamic slicing-based intent-aware LLM repair strategy, IRepair.
We show that IRepair repairs errors 43.6% more effectively while causing 46% less disruption to general performance.
arXiv Detail & Related papers (2025-02-10T22:07:02Z)
- ARIES: Stimulating Self-Refinement of Large Language Models by Iterative Preference Optimization [34.77238246296517]
A truly intelligent Large Language Model (LLM) should be capable of correcting errors in its responses through external interactions.
We introduce a novel post-training and inference framework, called ARIES: Adaptive Refinement and Iterative Enhancement Structure.
ARIES iteratively performs preference training and self-refinement-based data collection.
arXiv Detail & Related papers (2025-02-08T15:21:55Z)
- From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning [31.95005389919542]
Scaling data and model size has been proven effective for boosting the performance of large language models.
In this work, we introduce Aggregation Fine-Tuning (AFT), a supervised fine-tuning paradigm.
Empirical evaluations on benchmark datasets show that AFT-trained models substantially outperform standard SFT.
arXiv Detail & Related papers (2025-01-21T04:11:59Z)
- Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction [6.624814871290537]
Stream Aligner combines efficiency with enhanced performance in various tasks throughout the generation process.
Compared to Aligner, our experiments demonstrate that Stream Aligner reduces reliance on the capabilities of additional models, enhances the reasoning abilities of LLMs, and decreases latency during user interaction.
arXiv Detail & Related papers (2025-01-09T16:02:51Z)
- Evolving Alignment via Asymmetric Self-Play [52.3079697845254]
We introduce a general open-ended RLHF framework that casts alignment as an asymmetric game between two players.
This framework of Evolving Alignment via Asymmetric Self-Play (eva) results in a simple and efficient approach that can utilize any existing RLHF algorithm for scalable alignment.
arXiv Detail & Related papers (2024-10-31T08:15:32Z)
- Just Say What You Want: Only-prompting Self-rewarding Online Preference Optimization [64.34767799614328]
Current self-rewarding approaches rely heavily on the discriminator's judgment capabilities.
We propose a novel, only-prompting self-rewarding online algorithm that generates preference datasets without relying on judgment capabilities.
arXiv Detail & Related papers (2024-09-26T04:41:08Z)
- Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs [54.05511925104712]
We propose a simple, effective, and data-efficient method called Step-DPO.
Step-DPO treats individual reasoning steps as units for preference optimization rather than evaluating answers holistically.
Our findings demonstrate that as few as 10K preference data pairs and fewer than 500 Step-DPO training steps can yield a nearly 3% gain in accuracy on MATH for models with over 70B parameters.
arXiv Detail & Related papers (2024-06-26T17:43:06Z)
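The Step-DPO entry above names the idea but not the objective; under the assumption that per-step log-probabilities have already been computed, a step-level preference loss could look like the standard DPO loss applied to a single reasoning step rather than a whole answer. The function and tensor names below are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F


def step_dpo_loss(policy_chosen_logp: torch.Tensor,
                  policy_rejected_logp: torch.Tensor,
                  ref_chosen_logp: torch.Tensor,
                  ref_rejected_logp: torch.Tensor,
                  beta: float = 0.1) -> torch.Tensor:
    """DPO objective applied at the level of a single reasoning step.

    Each tensor holds the summed token log-likelihood of a candidate next
    step (preferred or dispreferred), conditioned on the problem statement
    plus the previously accepted steps, under the trainable policy or the
    frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()


# Toy usage with precomputed log-probabilities for a batch of two step pairs.
loss = step_dpo_loss(torch.tensor([-12.3, -9.8]), torch.tensor([-15.1, -11.0]),
                     torch.tensor([-12.0, -10.2]), torch.tensor([-14.0, -10.5]))
print(loss.item())
```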
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion: adaptively setting the label-smoothing value during training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
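Reading the UAL entry above, one plausible instantiation of label smoothing scaled by per-sample uncertainty is sketched below; the linear scaling rule, the 0.2 cap, and the source of the uncertainty scores are assumptions rather than the authors' exact recipe.

```python
import torch
import torch.nn.functional as F


def uncertainty_aware_ce(logits: torch.Tensor,
                         targets: torch.Tensor,
                         uncertainty: torch.Tensor,
                         max_smoothing: float = 0.2) -> torch.Tensor:
    """Cross-entropy with per-sample label smoothing scaled by uncertainty.

    logits: (batch, num_classes); targets: (batch,) class indices;
    uncertainty: (batch,) values in [0, 1], higher = less certain sample.
    """
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    # Confident samples keep sharp targets; uncertain samples get softer ones.
    eps = (uncertainty.clamp(0.0, 1.0) * max_smoothing).unsqueeze(-1)
    one_hot = F.one_hot(targets, num_classes).float()
    soft_targets = one_hot * (1.0 - eps) + eps / num_classes
    return -(soft_targets * log_probs).sum(dim=-1).mean()
```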
- Self-Play Preference Optimization for Language Model Alignment [75.83359213697854]
Recent advancements suggest that directly working with preference probabilities can yield a more accurate reflection of human preferences.
We propose a self-play-based method for language model alignment, which treats the problem as a constant-sum two-player game.
Our approach, dubbed Self-Play Preference Optimization (SPPO), utilizes iterative policy updates to provably approximate the Nash equilibrium.
arXiv Detail & Related papers (2024-05-01T17:59:20Z)
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences [21.5605000515622]
This paper studies post-training large language models (LLMs) using preference feedback from an oracle to help a model iteratively improve over itself.
We introduce Direct Nash Optimization (DNO), a provable and efficient algorithm that marries the simplicity and stability of contrastive learning with theoretical generality from optimizing general preferences.
In our experiments, the resulting 7B-parameter Orca-2.5 model achieves a state-of-the-art win rate of 33% against GPT-4-Turbo on AlpacaEval 2.0 (even after controlling for response length), an absolute gain of 26% (7% to 33%) over the initializing model.
arXiv Detail & Related papers (2024-04-04T17:56:41Z)
- AlpaGasus: Training A Better Alpaca with Fewer Data [93.6949102689243]
We propose a simple and effective data selection strategy that automatically identifies and filters out low-quality data.
We introduce AlpaGasus, which is fine-tuned on only 9k high-quality examples filtered from the 52k Alpaca data.
AlpaGasus significantly outperforms the original Alpaca on multiple test sets and in a controlled human evaluation.
arXiv Detail & Related papers (2023-07-17T17:59:40Z)
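For the AlpaGasus entry above, a minimal sketch of the score-then-filter idea is given below. The `rate_quality` judge and the 4.5 threshold are stand-ins: the blurb only says that low-quality examples are automatically identified and filtered out, so any grading function could be plugged in.

```python
import json
from typing import Callable


def filter_instruction_data(examples: list[dict],
                            rate_quality: Callable[[dict], float],
                            threshold: float = 4.5) -> list[dict]:
    """Keep only examples whose judged quality meets the threshold.

    `rate_quality` is a placeholder for an external judge (for example, a
    strong LLM prompted to grade each (instruction, input, response) triple
    on a fixed scale); it is not an API from the AlpaGasus paper.
    """
    return [ex for ex in examples if rate_quality(ex) >= threshold]


# Toy usage with a dummy judge that favors longer responses.
data = [
    {"instruction": "Summarize the text.", "input": "...", "output": "Short."},
    {"instruction": "Explain photosynthesis.", "input": "",
     "output": "Plants convert light energy into chemical energy stored in glucose."},
]
kept = filter_instruction_data(data, rate_quality=lambda ex: 5.0 if len(ex["output"]) > 20 else 3.0)
print(json.dumps(kept, indent=2))
```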