Hybrid Alignment Training for Large Language Models
- URL: http://arxiv.org/abs/2406.15178v1
- Date: Fri, 21 Jun 2024 14:23:57 GMT
- Title: Hybrid Alignment Training for Large Language Models
- Authors: Chenglong Wang, Hang Zhou, Kaiyan Chang, Bei Li, Yongyu Mu, Tong Xiao, Tongran Liu, Jingbo Zhu
- Abstract summary: Alignment training is crucial for enabling large language models to cater to human intentions and preferences.
We propose a Hybrid Alignment Training (Hbat) approach, based on alternating alignment and modified elastic weight consolidation methods.
Experimental results show that the proposed Hbat can significantly outperform all baselines.
- Score: 60.46220684809339
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Alignment training is crucial for enabling large language models (LLMs) to cater to human intentions and preferences. It is typically performed in two stages with different objectives: instruction-following alignment and human-preference alignment. However, aligning LLMs with these objectives in sequence suffers from an inherent problem: the objectives may conflict, and LLMs cannot be guaranteed to align well with both the instructions and human preferences simultaneously. In response, we propose a Hybrid Alignment Training (Hbat) approach based on alternating alignment and modified elastic weight consolidation methods. The basic idea is to alternate between the different objectives during alignment training, so that better collaboration can be achieved between the two alignment tasks. We experiment with Hbat on summarization and dialogue tasks. Experimental results show that the proposed Hbat significantly outperforms all baselines. Notably, Hbat yields consistent performance gains over traditional two-stage alignment training when using both proximal policy optimization and direct preference optimization.
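As a rough illustration of the alternating idea, here is a minimal PyTorch sketch that interleaves two objectives while an elastic-weight-consolidation penalty anchors the parameters. The `sft_loss`/`pref_loss` placeholders, the toy linear model, and the trivially initialized Fisher terms are all assumptions for illustration; the paper's modified EWC and actual alternation schedule are not reproduced here.

```python
import torch
import torch.nn.functional as F

# Toy stand-in for an LLM: any module with per-objective losses works here.
model = torch.nn.Linear(16, 16)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

# EWC anchor and Fisher-diagonal estimate. Both are initialized trivially
# here; in practice the Fisher terms are estimated from gradients on the
# previously optimized objective's data.
anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}

def ewc_penalty(model, lam=0.1):
    """Quadratic penalty that discourages drifting from the anchor parameters."""
    return lam * sum(
        (fisher[n] * (p - anchor[n]) ** 2).sum()
        for n, p in model.named_parameters()
    )

def sft_loss(model, batch):   # placeholder instruction-following objective
    x, y = batch
    return F.mse_loss(model(x), y)

def pref_loss(model, batch):  # placeholder human-preference objective
    x, y = batch
    return F.mse_loss(model(x), y)

sft_data = [(torch.randn(8, 16), torch.randn(8, 16)) for _ in range(4)]
pref_data = [(torch.randn(8, 16), torch.randn(8, 16)) for _ in range(4)]

for step, (sft_batch, pref_batch) in enumerate(zip(sft_data, pref_data)):
    # Alternate between the two alignment objectives; the EWC penalty keeps
    # the parameters close to what the other objective has already learned.
    loss_fn, batch = (sft_loss, sft_batch) if step % 2 == 0 else (pref_loss, pref_batch)
    loss = loss_fn(model, batch) + ewc_penalty(model)
    opt.zero_grad()
    loss.backward()
    opt.step()
```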
Related papers
- Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models [19.559468441956714]
Reinforcement Learning from Human Feedback has emerged as a powerful technique for aligning large language models with human preferences.
We frame human value alignment as a multi-objective optimization problem, aiming to maximize a set of potentially conflicting objectives.
We introduce Gradient-Adaptive Policy Optimization (GAPO), a novel fine-tuning paradigm that employs multiple-gradient descent to align LLMs with diverse preference distributions.
arXiv Detail & Related papers (2025-07-02T17:25:26Z)
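GAPO builds on multiple-gradient descent; the classic two-objective building block (MGDA's min-norm point in the convex hull of the gradients) can be sketched as follows. This is a generic illustration, not GAPO's gradient-adaptation scheme, and the toy gradients are invented.

```python
import torch

def min_norm_direction(g1: torch.Tensor, g2: torch.Tensor) -> torch.Tensor:
    """Two-objective MGDA: the min-norm point in the convex hull of {g1, g2}.

    Stepping along -d (when d != 0) decreases both objectives;
    d == 0 indicates Pareto stationarity.
    """
    diff = g1 - g2
    denom = diff.dot(diff)
    if denom.item() == 0.0:  # gradients coincide
        return g1
    alpha = torch.clamp((g2 - g1).dot(g2) / denom, 0.0, 1.0)
    return alpha * g1 + (1.0 - alpha) * g2

# Usage with two conflicting toy gradients:
g_helpful = torch.tensor([1.0, 0.2])
g_harmless = torch.tensor([-0.5, 1.0])
d = min_norm_direction(g_helpful, g_harmless)
print(d)  # both g_helpful.dot(d) and g_harmless.dot(d) are positive
```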
- ComPO: Preference Alignment via Comparison Oracles [36.81379432115315]
We propose a new preference alignment method based on comparison oracles and provide a convergence guarantee for its basic scheme.
A highlight of our work is that we demonstrate the importance of designing specialized methods for preference pairs with distinct likelihood margins.
arXiv Detail & Related papers (2025-05-08T17:56:57Z)
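A comparison oracle answers only "which of two candidates is preferred", never by how much. Below is a minimal sketch of optimization from such feedback, with a synthetic oracle standing in for human preference judgments; ComPO's actual scheme for LLM preference pairs is more involved.

```python
import numpy as np

rng = np.random.default_rng(0)

def comparison_oracle(theta_a, theta_b):
    """Synthetic stand-in: prefers the point with lower hidden objective value."""
    hidden = lambda t: np.sum((t - 1.0) ** 2)
    return hidden(theta_a) < hidden(theta_b)

theta = np.zeros(5)
delta, lr = 0.1, 0.05
for _ in range(500):
    u = rng.standard_normal(theta.shape)
    u /= np.linalg.norm(u)
    # The oracle only reports which perturbation is better, never a value.
    if comparison_oracle(theta + delta * u, theta - delta * u):
        theta += lr * u
    else:
        theta -= lr * u
print(theta)  # roughly approaches the hidden optimum at 1.0
```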
- Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment [74.25832963097658]
Multi-Objective Alignment (MOA) aims to align responses with multiple human preference objectives.
We find that DPO-based MOA approaches suffer from widespread preference conflicts in the data.
arXiv Detail & Related papers (2025-02-20T08:27:00Z)
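The conflicts arise because DPO optimizes a fixed chosen/rejected margin per pair. Below is the standard DPO loss, with invented log-probabilities showing how two objectives can label the same responses in opposite directions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    logp_*     : policy log-prob of the chosen (w) / rejected (l) response
    ref_logp_* : the same under the frozen reference model
    """
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -F.logsigmoid(beta * margin)

# A response chosen for helpfulness may be the rejected one in a
# harmlessness pair, so the two losses pull in opposite directions:
loss_helpful = dpo_loss(torch.tensor(-5.0), torch.tensor(-7.0),
                        torch.tensor(-6.0), torch.tensor(-6.5))
loss_harmless = dpo_loss(torch.tensor(-7.0), torch.tensor(-5.0),
                         torch.tensor(-6.5), torch.tensor(-6.0))
print(loss_helpful.item(), loss_harmless.item())
```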
- On-the-fly Preference Alignment via Principle-Guided Decoding [27.50204023448716]
We introduce On-the-fly Preference Alignment via Principle-Guided Decoding (OPAD) to align model outputs with human preferences during inference.
OPAD achieves competitive or superior performance in both general and personalized alignment tasks.
arXiv Detail & Related papers (2025-02-20T02:23:09Z)
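One way to bias decoding toward a principle at inference time is to contrast principle-conditioned logits with plain logits, contrastive-decoding style. The sketch below illustrates that general idea only; OPAD's actual reward-based formulation differs, and `alpha` and the toy logits are assumptions.

```python
import torch

def principle_guided_logits(base_logits, principled_logits, alpha=1.0):
    """Shift next-token logits toward the behavior the principle induces.

    base_logits       : logits for the plain prompt
    principled_logits : logits when the guiding principle is prepended
    alpha             : strength of the principle's influence
    """
    return principled_logits + alpha * (principled_logits - base_logits)

# Toy vocabulary of 6 tokens; in practice these come from two forward passes.
base = torch.randn(6)
principled = torch.randn(6)
probs = torch.softmax(principle_guided_logits(base, principled), dim=-1)
next_token = torch.multinomial(probs, 1)
```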
- Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models [54.381650481255235]
We introduce a new tuning-free approach for self-alignment, Dynamic Rewarding with Prompt Optimization (DRPO).
Our approach leverages a search-based optimization framework that allows LLMs to iteratively self-improve and craft the optimal alignment instructions.
Empirical evaluations on eight recent LLMs, both open- and closed-source, demonstrate that DRPO significantly enhances alignment performance.
arXiv Detail & Related papers (2024-11-13T16:15:38Z)
- MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time [50.41806216615488]
Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from large text corpora.
To make LLMs more usable, aligning them with human preferences is essential.
We propose an effective method, MetaAlign, which aims to help LLMs dynamically align with various explicit or implicit preferences specified at inference time.
arXiv Detail & Related papers (2024-10-18T05:31:13Z)
- Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback [58.049113055986375]
We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF) to train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments on alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z)
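A single-stage objective can mix an imitation term on demonstrations with a Bradley-Terry preference term. The sketch below shows one such combination; the mixing weight `lam` and the log-probability inputs are illustrative, and AIHF's integrated reward-learning formulation is richer than this.

```python
import torch
import torch.nn.functional as F

def joint_alignment_loss(logp_demo, logp_chosen, logp_rejected, lam=0.5):
    """Single-stage objective mixing demonstrations and preferences.

    logp_demo            : log-likelihood of expert demonstrations
    logp_chosen/rejected : policy log-probs on preference pairs
    lam                  : trade-off between the two feedback sources
    """
    demo_loss = -logp_demo.mean()                                   # imitation
    pref_loss = -F.logsigmoid(logp_chosen - logp_rejected).mean()   # Bradley-Terry
    return lam * demo_loss + (1.0 - lam) * pref_loss

loss = joint_alignment_loss(
    logp_demo=torch.tensor([-3.2, -2.8]),
    logp_chosen=torch.tensor([-4.0, -3.5]),
    logp_rejected=torch.tensor([-5.0, -4.1]),
)
```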
- AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability [26.181345324220743]
Multimodal Large Language Models (MLLMs) are widely regarded as crucial in the exploration of Artificial General Intelligence (AGI).
The core of MLLMs lies in their capability to achieve cross-modal alignment.
Despite their success, these models fall short in how they model alignment capabilities.
arXiv Detail & Related papers (2024-05-23T03:07:56Z)
- Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment [103.12563033438715]
Alignment in artificial intelligence pursues consistency between model responses and human preferences as well as values.
Existing alignment techniques are mostly unidirectional, leading to suboptimal trade-offs and poor flexibility across various objectives.
We introduce controllable preference optimization (CPO), which explicitly specifies preference scores for different objectives.
arXiv Detail & Related papers (2024-02-29T12:12:30Z)
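Explicitly specifying per-objective preference scores can be as simple as conditioning generation on control markers. The marker syntax below is entirely invented for illustration; CPO defines its own conditioning scheme.

```python
def format_with_preference_scores(prompt: str, scores: dict) -> str:
    """Prepend per-objective preference scores as control markers
    (hypothetical format, for illustration only)."""
    control = " ".join(f"<{k}:{v}>" for k, v in sorted(scores.items()))
    return f"{control} {prompt}"

print(format_with_preference_scores(
    "Explain how vaccines work.",
    {"helpfulness": 5, "verbosity": 2, "harmlessness": 5},
))
# <harmlessness:5> <helpfulness:5> <verbosity:2> Explain how vaccines work.
```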
- Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback [70.32795295142648]
Linear alignment is a novel algorithm that aligns language models with human preferences in a single inference step.
Experiments on both general and personalized preference datasets demonstrate that linear alignment significantly enhances the performance and efficiency of LLM alignment.
arXiv Detail & Related papers (2024-01-21T10:46:23Z)
- Supervised Contrastive Learning as Multi-Objective Optimization for Fine-Tuning Large Pre-trained Language Models [3.759936323189417]
Supervised Contrastive Learning (SCL) has been shown to achieve excellent performance in most classification tasks.
In this work, we formulate the SCL problem as a multi-objective optimization problem for the fine-tuning phase of the RoBERTa language model.
arXiv Detail & Related papers (2022-09-28T15:13:58Z)
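Treating fine-tuning as multi-objective means trading off the cross-entropy and supervised contrastive losses. The sketch below implements the standard SupCon loss and the simplest possible scalarization with a fixed weight; the paper studies a proper multi-objective treatment rather than this fixed mix, and the toy features are invented.

```python
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, tau=0.1):
    """Supervised contrastive loss over L2-normalized features (Khosla et al.)."""
    f = F.normalize(features, dim=1)
    sim = f @ f.T / tau
    n = f.size(0)
    not_self = ~torch.eye(n, dtype=torch.bool)
    pos = (labels[:, None] == labels[None, :]) & not_self
    # log-softmax over all non-self pairs
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(~not_self, float("-inf")), dim=1, keepdim=True
    )
    # average over positive pairs per anchor (anchors without positives contribute 0)
    mean_log_prob_pos = (pos * log_prob).sum(1) / pos.sum(1).clamp(min=1)
    return -mean_log_prob_pos.mean()

features = torch.randn(8, 32, requires_grad=True)
logits = torch.randn(8, 4, requires_grad=True)
labels = torch.randint(0, 4, (8,))

# Fixed-weight scalarization of the two objectives:
loss = 0.5 * F.cross_entropy(logits, labels) + 0.5 * supcon_loss(features, labels)
loss.backward()
```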
- Using Optimal Transport as Alignment Objective for fine-tuning Multilingual Contextualized Embeddings [7.026476782041066]
We propose using Optimal Transport (OT) as an alignment objective during fine-tuning to improve multilingual contextualized representations.
This approach does not require word-alignment pairs prior to fine-tuning and instead learns the word alignments within context in an unsupervised manner.
arXiv Detail & Related papers (2021-10-06T16:13:45Z)
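An OT alignment objective scores how cheaply one sentence's contextual embeddings transport onto the other's. Below is a generic entropic-OT (Sinkhorn) sketch with toy embeddings and uniform marginals; the paper's exact objective and marginal choices may differ.

```python
import torch

def sinkhorn(cost, eps=0.1, iters=50):
    """Entropy-regularized OT plan between two uniform point clouds."""
    n, m = cost.shape
    K = torch.exp(-cost / eps)
    u = torch.full((n,), 1.0 / n)  # uniform source marginal
    v = torch.full((m,), 1.0 / m)  # uniform target marginal
    a, b = u.clone(), v.clone()
    for _ in range(iters):        # alternate scaling to match the marginals
        a = u / (K @ b)
        b = v / (K.T @ a)
    return a[:, None] * K * b[None, :]  # transport plan

# Toy contextual embeddings for a source/target sentence pair:
src = torch.randn(5, 16)  # 5 source tokens
tgt = torch.randn(7, 16)  # 7 target tokens
cost = torch.cdist(src, tgt) ** 2
plan = sinkhorn(cost)
ot_loss = (plan * cost).sum()  # alignment objective to minimize in fine-tuning
```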
This list is automatically generated from the titles and abstracts of the papers on this site.