Projection Optimization: A General Framework for Multi-Objective and Multi-Group RLHF
- URL: http://arxiv.org/abs/2502.15145v2
- Date: Mon, 24 Feb 2025 06:06:04 GMT
- Title: Projection Optimization: A General Framework for Multi-Objective and Multi-Group RLHF
- Authors: Nuoya Xiong, Aarti Singh
- Abstract summary: Reinforcement Learning with Human Feedback (RLHF) is a widely used fine-tuning approach that aligns machine learning models with human preferences. In this work, we transform the non-linear aggregation problem into a series of sub-problems and extend our framework to handle multi-group scenarios. We demonstrate that our algorithmic framework achieves sublinear regret and can be easily adapted to a reward-free algorithm.
- Score: 13.612504157832708
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning with Human Feedback (RLHF) is a widely used fine-tuning approach that aligns machine learning models, particularly Language Models (LMs), with human preferences. There are typically multiple objectives driving the preference, hence humans find it easier to express per-objective comparisons rather than a global preference between two choices. Multi-Objective RLHF (MORLHF) aims to use per-objective preference feedback and achieve Pareto optimality among these objectives by aggregating them into a single unified objective for optimization. However, nearly all prior works rely on linear aggregation, which rules out policies that favor specific objectives, such as the worst one. The only existing approach using non-linear aggregation is computationally expensive due to its reward-based nature and the need for retraining whenever the aggregation parameters change. In this work, we address this limitation by transforming the non-linear aggregation maximization problem into a series of sub-problems. Each sub-problem involves only linear aggregation, making it computationally efficient to solve. We further extend our framework to handle multi-group scenarios, where each group has distinct weights for the objectives. Our method enables reaching consensus or maximizing the aggregated objective across all groups. Theoretically, we demonstrate that our algorithmic framework achieves sublinear regret and can be easily adapted to a reward-free algorithm. Empirically, leveraging our theoretical insights, we propose a nearly training-free algorithm once the optimal policies for individual objectives are obtained.
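The core reduction lends itself to a short illustration. Below is a minimal sketch (not the paper's algorithm) of how a concave, non-linear aggregation of per-objective values can be maximized through a sequence of linear-aggregation sub-problems, assuming a Frank-Wolfe-style outer loop, a differentiable aggregation gradient `agg_grad`, and a hypothetical `linear_oracle` that returns the value vector of a policy optimized for a given weight vector; all names here are illustrative.

```python
import numpy as np

def maximize_concave_aggregation(agg_grad, linear_oracle, n_objectives, n_iters=50):
    """Frank-Wolfe-style sketch: maximize a concave aggregation g(J(pi)) of
    per-objective values by repeatedly solving *linear* aggregation sub-problems.

    agg_grad(v)      -- gradient of the aggregation g at value vector v
                        (hypothetical interface, not from the paper).
    linear_oracle(w) -- value vector of a policy optimized for the linear
                        aggregation w . J(pi); in practice a standard
                        linearly-scalarized RLHF problem.
    """
    # start from the policy optimizing the uniform linear aggregation
    v = linear_oracle(np.ones(n_objectives) / n_objectives)
    for t in range(1, n_iters + 1):
        w = agg_grad(v)                     # linearize g around the current value vector
        v_new = linear_oracle(w)            # solve the linear sub-problem
        step = 2.0 / (t + 2.0)              # standard Frank-Wolfe step size
        v = (1 - step) * v + step * v_new   # value vector of the mixture policy
    return v

# Toy check: two objectives, worst-case aggregation smoothed by log-sum-exp.
def smoothed_min_grad(v, tau=0.05):
    e = np.exp(-(v - v.min()) / tau)        # gradient of -tau*logsumexp(-v/tau)
    return e / e.sum()

def toy_oracle(w):
    v = np.zeros_like(w)                    # pretend the achievable set is the simplex:
    v[np.argmax(w)] = 1.0                   # a linear objective is maximized at a vertex
    return v

print(maximize_concave_aggregation(smoothed_min_grad, toy_oracle, n_objectives=2))
```

Each oracle call is a linearly aggregated problem, which is what makes a nearly training-free regime plausible once per-objective optimal policies have already been obtained.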
Related papers
- Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align Large Language Models (LLMs).
Controlled Decoding provides a mechanism for aligning a model at inference time without retraining.
We propose a mixture of agent-based decoding strategies leveraging existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z) - MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment [14.541973333460149]
Mixing Preference Optimization (MPO) is a post-processing framework for aggregating single-objective policies.
MPO achieves balanced performance across diverse preferences, outperforming existing models with significantly reduced computational costs.
arXiv Detail & Related papers (2025-02-25T23:22:12Z) - Intersectional Fairness in Reinforcement Learning with Large State and Constraint Spaces [16.400288624027375]
In many real-world settings, it is important to optimize over multiple objectives simultaneously. We consider a multi-objective optimization problem in which each objective is defined by a state-based reweighting of a single scalar reward function. We provide oracle-efficient algorithms to solve these multi-objective RL problems even when the number of objectives is exponentially large.
arXiv Detail & Related papers (2025-02-17T14:25:33Z) - Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications. Ensuring their alignment with the diverse preferences of individual users has become a critical challenge. We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z) - Rethinking Multi-Objective Learning through Goal-Conditioned Supervised Learning [8.593384839118658]
Multi-objective learning aims to optimize multiple objectives simultaneously with a single model. It suffers from the difficulty of formalizing and carrying out the exact learning process. We propose a general framework for automatically learning to achieve multiple objectives based on existing sequential data.
arXiv Detail & Related papers (2024-12-12T03:47:40Z) - Decoding-Time Language Model Alignment with Multiple Objectives [116.42095026960598]
Existing methods primarily focus on optimizing LMs for a single reward function, limiting their adaptability to varied objectives.
Here, we propose $\textbf{multi-objective decoding (MOD)}$, a decoding-time algorithm that outputs the next token from a linear combination of predictions.
We show why existing approaches can be sub-optimal even in natural settings and obtain optimality guarantees for our method.
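As a rough illustration of decoding-time combination (the exact MOD rule is derived differently in that paper), one can mix the next-token distributions of per-objective aligned models with user-chosen weights; the sketch below assumes HuggingFace-style models exposing a `.logits` output and is illustrative only.

```python
import torch

@torch.no_grad()
def mixed_next_token(models, weights, input_ids, temperature=1.0):
    """Sample a next token from a weighted combination of per-objective models'
    log-probabilities (illustrative sketch, not the MOD combination rule)."""
    logprobs = []
    for model in models:                                   # each model aligned to one objective
        logits = model(input_ids).logits[:, -1, :]         # HuggingFace-style output assumed
        logprobs.append(torch.log_softmax(logits / temperature, dim=-1))
    mixed = sum(w * lp for w, lp in zip(weights, logprobs))  # linear combination in log space
    probs = torch.softmax(mixed, dim=-1)                      # renormalize
    return torch.multinomial(probs, num_samples=1)            # sampled token id
```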
arXiv Detail & Related papers (2024-06-27T02:46:30Z) - Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment [103.12563033438715]
Alignment in artificial intelligence pursues consistency between model responses and human preferences as well as values.
Existing alignment techniques are mostly unidirectional, leading to suboptimal trade-offs and poor flexibility over various objectives.
We introduce controllable preference optimization (CPO), which explicitly specifies preference scores for different objectives.
arXiv Detail & Related papers (2024-02-29T12:12:30Z) - Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization [76.09576643028362]
We present Multi-Objective Direct Preference Optimization (MODPO) for multiple alignment objectives.
MODPO folds language modeling directly into reward modeling, training language models as implicit collective reward models.
It theoretically yields the same optimal solutions as MORLHF but is practically more stable and efficient.
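The phrase "implicit collective reward model" builds on the DPO identity that a fine-tuned LM induces a reward r(x, y) = beta * log(pi(y|x) / pi_ref(y|x)). The snippet below sketches a weighted collective version of that implicit reward as an illustration of the idea; it is not MODPO's actual training objective.

```python
import torch

def implicit_reward(logp_policy, logp_ref, beta=0.1):
    """DPO-style implicit reward of a response under one aligned policy:
    r(x, y) = beta * (log pi(y|x) - log pi_ref(y|x))."""
    return beta * (logp_policy - logp_ref)

def collective_implicit_reward(logps_per_objective, logp_ref, weights, beta=0.1):
    """Weighted combination of per-objective implicit rewards -- an illustration
    of reading several aligned LMs as one collective reward model, not the MODPO loss."""
    rewards = [implicit_reward(lp, logp_ref, beta) for lp in logps_per_objective]
    return sum(w * r for w, r in zip(weights, rewards))

# Toy usage with scalar sequence log-likelihoods for one (prompt, response) pair.
logps = [torch.tensor(-12.3), torch.tensor(-11.8)]   # two per-objective policies
print(collective_implicit_reward(logps, torch.tensor(-13.0), weights=[0.7, 0.3]))
```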
arXiv Detail & Related papers (2023-10-05T17:35:26Z) - PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm [0.18416014644193063]
We propose a novel MORL algorithm that trains a single universal network covering the entire preference space and scaling to continuous robotic tasks.
PD-MORL achieves up to 25% larger hypervolume for challenging continuous control tasks and uses an order of magnitude fewer trainable parameters compared to prior approaches.
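The "single universal network" idea amounts to conditioning the policy on the preference vector itself. A minimal sketch of such a preference-conditioned network follows (a generic illustration of the idea, not the PD-MORL architecture).

```python
import torch
import torch.nn as nn

class PreferenceConditionedPolicy(nn.Module):
    """One network covering the whole preference space by taking the preference
    weights as an extra input (generic sketch, not the PD-MORL architecture)."""
    def __init__(self, obs_dim, pref_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + pref_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, pref):
        # concatenating the preference vector lets one set of weights serve
        # every trade-off between objectives
        return self.net(torch.cat([obs, pref], dim=-1))

# Usage: policy = PreferenceConditionedPolicy(obs_dim=17, pref_dim=3, act_dim=6)
```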
arXiv Detail & Related papers (2022-08-16T19:23:02Z) - An Approach to Ordering Objectives and Pareto Efficient Solutions [0.0]
Solutions to multi-objective optimization problems can generally not be compared or ordered.
Decision-makers are often made to believe that scaled objectives can be compared.
We present a method that uses the probability integral transform in order to map the objectives of a problem into scores that all share the same range.
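The probability integral transform puts each objective on a common [0, 1] scale by evaluating its cumulative distribution function at the observed value; below is a minimal sketch using empirical CDFs, assuming samples of each objective are available (the paper's estimator may differ).

```python
import numpy as np

def pit_scores(objective_samples, points):
    """Map objective values to comparable [0, 1] scores via the empirical
    probability integral transform.

    objective_samples -- array (n_samples, n_objectives) of observed values
    points            -- array (n_points, n_objectives) to score
    """
    objective_samples = np.asarray(objective_samples, dtype=float)
    points = np.asarray(points, dtype=float)
    n, m = objective_samples.shape
    scores = np.empty_like(points)
    for j in range(m):
        sorted_j = np.sort(objective_samples[:, j])
        # empirical CDF of objective j evaluated at each point
        scores[:, j] = np.searchsorted(sorted_j, points[:, j], side="right") / n
    return scores

# Example: two objectives on very different scales become directly comparable.
samples = np.column_stack([np.random.normal(0, 1, 1000), np.random.normal(50, 10, 1000)])
print(pit_scores(samples, [[0.0, 50.0], [1.0, 60.0]]))
```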
arXiv Detail & Related papers (2022-05-30T17:55:53Z) - Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single-policy MORL, which learns an optimal policy given a preference over the objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm called model-based envelope value iteration (EVI), which generalizes the enveloped multi-objective $Q$-learning algorithm.
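For context, the enveloped multi-objective Q-learning update that EVI generalizes maintains a vector-valued Q(s, a, w) and backs up through the best (action, preference) pair under the current preference. Below is a tabular sketch of that backup over a finite preference grid (an illustration of the envelope operator, not the model-based EVI algorithm itself).

```python
import numpy as np

def envelope_backup(Q, rewards, P, W, gamma=0.99):
    """One sweep of the envelope optimality operator from enveloped
    multi-objective Q-learning (tabular sketch).

    Q       -- (S, A, K, d) vector-valued Q, one slice per preference in W
    rewards -- (S, A, d) vector rewards
    P       -- (S, A, S) transition probabilities
    W       -- (K, d) finite grid of preference vectors
    """
    S, A, K, d = Q.shape
    Q_new = np.empty_like(Q)
    for k in range(K):
        w = W[k]
        scal = np.tensordot(Q, w, axes=([3], [0]))             # (S, A, K) scalarized values
        best = scal.reshape(S, A * K).argmax(axis=1)           # best (a', w') per next state
        best_vec = Q.reshape(S, A * K, d)[np.arange(S), best]  # (S, d) envelope target vectors
        exp_next = np.tensordot(P, best_vec, axes=([2], [0]))  # (S, A, d) expected next value
        Q_new[:, :, k, :] = rewards + gamma * exp_next
    return Q_new
```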
arXiv Detail & Related papers (2020-11-19T22:35:31Z)