Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models
- URL: http://arxiv.org/abs/2507.01915v1
- Date: Wed, 02 Jul 2025 17:25:26 GMT
- Title: Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models
- Authors: Chengao Li, Hanyu Zhang, Yunkun Xu, Hongyan Xue, Xiang Ao, Qing He,
- Abstract summary: Reinforcement Learning from Human Feedback has emerged as a powerful technique for aligning large language models with human preferences.<n>We frame human value alignment as a multi-objective optimization problem, aiming to maximize a set of potentially conflicting objectives.<n>We introduce Gradient-Adaptive Policy Optimization (GAPO), a novel fine-tuning paradigm that employs multiple-gradient descent to align LLMs with diverse preference distributions.
- Score: 19.559468441956714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning from Human Feedback (RLHF) has emerged as a powerful technique for aligning large language models (LLMs) with human preferences. However, effectively aligning LLMs with diverse human preferences remains a significant challenge, particularly when they are conflict. To address this issue, we frame human value alignment as a multi-objective optimization problem, aiming to maximize a set of potentially conflicting objectives. We introduce Gradient-Adaptive Policy Optimization (GAPO), a novel fine-tuning paradigm that employs multiple-gradient descent to align LLMs with diverse preference distributions. GAPO adaptively rescales the gradients for each objective to determine an update direction that optimally balances the trade-offs between objectives. Additionally, we introduce P-GAPO, which incorporates user preferences across different objectives and achieves Pareto solutions that better align with the user's specific needs. Our theoretical analysis demonstrates that GAPO converges towards a Pareto optimal solution for multiple objectives. Empirical results on Mistral-7B show that GAPO outperforms current state-of-the-art methods, achieving superior performance in both helpfulness and harmlessness.
Related papers
- Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models [15.799929216215672]
We introduce the Multi-Objective Preference Optimization (MOPO) algorithm, which frames alignment as a constrained KL-regularized optimization.<n>Unlike prior work, MOPO operates directly on pairwise preference data, requires no point-wise reward assumption, and avoids prompt-context engineering.
arXiv Detail & Related papers (2025-05-16T05:58:26Z) - Leveraging Robust Optimization for LLM Alignment under Distribution Shifts [52.983390470606146]
Preference alignment methods are increasingly critical for steering large language models to generate outputs consistent with human values.<n>We propose a novel distribution-aware optimization framework that improves preference alignment despite such shifts.
arXiv Detail & Related papers (2025-04-08T09:14:38Z) - Optimizing Safe and Aligned Language Generation: A Multi-Objective GRPO Approach [2.8626097661711394]
Reinforcement Learning from Human Feedback has achieved notable success in steering models, but is complex and can be unstable.<n>Recent approaches such as Direct Preference Optimization (DPO) simplify preference-based fine-tuning but may introduce bias or trade-off certain objectives.<n>We propose a Group Relative Policy Optimization framework with a multi-label reward regression model to achieve safe and aligned language generation.
arXiv Detail & Related papers (2025-03-26T05:50:33Z) - Preference-Guided Diffusion for Multi-Objective Offline Optimization [64.08326521234228]
We propose a preference-guided diffusion model for offline multi-objective optimization.<n>Our guidance is a preference model trained to predict the probability that one design dominates another.<n>Our results highlight the effectiveness of classifier-guided diffusion models in generating diverse and high-quality solutions.
arXiv Detail & Related papers (2025-03-21T16:49:38Z) - Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment [74.25832963097658]
Multi-Objective Alignment (MOA) aims to align responses with multiple human preference objectives.<n>We find that DPO-based MOA approaches suffer from widespread preference conflicts in the data.
arXiv Detail & Related papers (2025-02-20T08:27:00Z) - Deep Pareto Reinforcement Learning for Multi-Objective Recommender Systems [60.91599969408029]
optimizing multiple objectives simultaneously is an important task for recommendation platforms.
Existing multi-objective recommender systems do not systematically consider such dynamic relationships.
arXiv Detail & Related papers (2024-07-04T02:19:49Z) - Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment [103.12563033438715]
Alignment in artificial intelligence pursues consistency between model responses and human preferences as well as values.
Existing alignment techniques are mostly unidirectional, leading to suboptimal trade-offs and poor flexibility over various objectives.
We introduce controllable preference optimization (CPO), which explicitly specifies preference scores for different objectives.
arXiv Detail & Related papers (2024-02-29T12:12:30Z) - Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization [76.09576643028362]
We present Multi-Objective Direct Preference Optimization (MODPO) for multiple alignment objectives.
MODPO folds language modeling directly into reward modeling, training language models as implicit collective reward models.
It theoretically yields the same optimal solutions as MORLHF but is practically more stable and efficient.
arXiv Detail & Related papers (2023-10-05T17:35:26Z) - Interactive Hyperparameter Optimization in Multi-Objective Problems via
Preference Learning [65.51668094117802]
We propose a human-centered interactive HPO approach tailored towards multi-objective machine learning (ML)
Instead of relying on the user guessing the most suitable indicator for their needs, our approach automatically learns an appropriate indicator.
arXiv Detail & Related papers (2023-09-07T09:22:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.