Robust Multi-Objective Controlled Decoding of Large Language Models
- URL: http://arxiv.org/abs/2503.08796v1
- Date: Tue, 11 Mar 2025 18:15:26 GMT
- Title: Robust Multi-Objective Controlled Decoding of Large Language Models
- Authors: Seongho Son, William Bankes, Sangwoong Yoon, Shyam Sundhar Ramesh, Xiaohang Tang, Ilija Bogunovic,
- Abstract summary: We introduce Robust Multi-Objective Decoding (RMOD), a novel inference-time algorithm that optimize for improving worst-case rewards.<n>RMOD formalizes the robust decoding problem as a maximin two-player game between reward weights and the sampling policy.<n>We show that the game reduces to a convex optimization problem to find the worst-case weights, while the best response policy can be computed analytically.
- Score: 14.58153072993207
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Test-time alignment of Large Language Models (LLMs) to human preferences offers a flexible way to generate responses aligned to diverse objectives without extensive retraining of LLMs. Existing methods achieve alignment to multiple objectives simultaneously (e.g., instruction-following, helpfulness, conciseness) by optimizing their corresponding reward functions. However, they often rely on predefined weights or optimize for averages, sacrificing one objective for another and leading to unbalanced outcomes. To address this, we introduce Robust Multi-Objective Decoding (RMOD), a novel inference-time algorithm that optimizes for improving worst-case rewards. RMOD formalizes the robust decoding problem as a maximin two-player game between reward weights and the sampling policy, solving for the Nash equilibrium. We show that the game reduces to a convex optimization problem to find the worst-case weights, while the best response policy can be computed analytically. We also introduce a practical RMOD variant designed for efficient decoding with contemporary LLMs, incurring minimal computational overhead compared to non-robust Multi-Objective Decoding (MOD) methods. Our experimental results showcase the effectiveness of RMOD in generating responses equitably aligned with diverse objectives, outperforming baselines up to 20%.
Related papers
- Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align Large Language models.
Controlled Decoding provides a mechanism for aligning a model at inference time without retraining.
We propose a mixture of agent-based decoding strategies leveraging the existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z) - Scalable Best-of-N Selection for Large Language Models via Self-Certainty [65.31658824274894]
Best-of-N selection is a key technique for improving the reasoning performance of Large Language Models.
We propose self-certainty, a novel and efficient metric to estimate response quality without requiring external reward models.
Our findings establish self-certainty as a practical and efficient way for improving LLM reasoning capabilities.
arXiv Detail & Related papers (2025-02-25T19:08:07Z) - Projection Optimization: A General Framework for Multi-Objective and Multi-Group RLHF [13.612504157832708]
Reinforcement Learning with Human Feedback (RLHF) is a widely used fine-tuning approach that aligns machine learning model with human preferences.<n>In this work, we transform the non-linear aggregation problem into a series of sub-problems and extend our framework to handle multi-group scenarios.<n>We demonstrate that our algorithmic framework achieves sublinear regret and can be easily adapted to a reward-free algorithm.
arXiv Detail & Related papers (2025-02-21T01:56:52Z) - Reward-Guided Speculative Decoding for Efficient LLM Reasoning [80.55186052123196]
We introduce Reward-Guided Speculative Decoding (RSD), a novel framework aimed at improving the efficiency of inference in large language models (LLMs)
RSD incorporates a controlled bias to prioritize high-reward outputs, in contrast to existing speculative decoding methods that enforce strict unbiasedness.
RSD delivers significant efficiency gains against decoding with the target model only, while achieving significant better accuracy than parallel decoding method on average.
arXiv Detail & Related papers (2025-01-31T17:19:57Z) - IterIS: Iterative Inference-Solving Alignment for LoRA Merging [14.263218227928729]
Low-rank adaptations (LoRAs) are widely used to fine-tune large models across various domains for specific downstream tasks.
LoRA merging presents an effective solution by combining multiple LoRAs into a unified adapter while maintaining data privacy.
arXiv Detail & Related papers (2024-11-21T19:04:02Z) - Decoding-Time Language Model Alignment with Multiple Objectives [116.42095026960598]
Existing methods primarily focus on optimizing LMs for a single reward function, limiting their adaptability to varied objectives.
Here, we propose $textbfmulti-objective decoding (MOD)$, a decoding-time algorithm that outputs the next token from a linear combination of predictions.
We show why existing approaches can be sub-optimal even in natural settings and obtain optimality guarantees for our method.
arXiv Detail & Related papers (2024-06-27T02:46:30Z) - Self-Exploring Language Models: Active Preference Elicitation for Online Alignment [88.56809269990625]
We propose a bilevel objective optimistically biased towards potentially high-reward responses to actively explore out-of-distribution regions.
Our experimental results demonstrate that when fine-tuned on Zephyr-7B-SFT and Llama-3-8B-Instruct models, Self-Exploring Language Models (SELM) significantly boosts the performance on instruction-following benchmarks.
arXiv Detail & Related papers (2024-05-29T17:59:07Z) - UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL) agents are tasked with optimising decision-making behaviours.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
arXiv Detail & Related papers (2024-05-01T09:34:42Z) - Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization [76.09576643028362]
We present Multi-Objective Direct Preference Optimization (MODPO) for multiple alignment objectives.
MODPO folds language modeling directly into reward modeling, training language models as implicit collective reward models.
It theoretically yields the same optimal solutions as MORLHF but is practically more stable and efficient.
arXiv Detail & Related papers (2023-10-05T17:35:26Z) - Symmetric Replay Training: Enhancing Sample Efficiency in Deep Reinforcement Learning for Combinatorial Optimization [42.92248233465095]
We propose a simple but effective method, called symmetric replay training (SRT), which can be easily integrated into various Deep reinforcement learning (DRL) methods.
Our method leverages high-reward samples to encourage exploration of symmetric regions without additional online interactions - free.
Experimental results demonstrate the consistent improvement of our method in sample efficiency across diverse DRL methods applied to real-world tasks.
arXiv Detail & Related papers (2023-06-02T05:34:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.