Related papers: OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment

OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment

URL: http://arxiv.org/abs/2509.24610v2
Date: Tue, 30 Sep 2025 02:49:29 GMT
Title: OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment
Authors: Liang Lin, Zhihao Xu, Junhao Dong, Jian Zhao, Yuchen Yuan, Guibin Zhang, Miao Yu, Yiming Zhang, Zhengtao Yao, Huahui Yi, Dongrui Liu, Xinfeng Li, Kun Wang,
Abstract summary: Large language model (LLM) alignment faces a critical dilemma when addressing multiple human preferences.<n>We present OrthAlign, an innovative approach to resolve gradient-level conflicts in preference alignment.<n>We show that OrthAlign achieves maximum single-preference improvements ranging from 34.61% to 50.89% after multiple-objective alignment.
Score: 61.02595549125661
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language model (LLM) alignment faces a critical dilemma when addressing multiple human preferences: improvements in one dimension frequently come at the expense of others, creating unavoidable trade-offs between competing objectives like helpfulness and harmlessness. While prior work mainly focuses on constraint-based optimization algorithms and data selection strategies to mitigate conflicts, these approaches overlook the fundamental issue of resolving conflicts directly at the parameter level. In this paper, we present OrthAlign, an innovative approach that pioneers a new paradigm by leveraging orthogonal subspace decomposition to fundamentally resolve gradient-level conflicts in multi-objective preference alignment. OrthAlign strategically decomposes parameter update spaces into orthogonal subspaces, ensuring that optimization toward different preferences occurs in mathematically non-interfering directions. Building upon this, we provide theoretical guarantees demonstrating that when parameter increments satisfy both orthogonal subspace constraints and spectral norm bounds, the resulting updates exhibit linear Lipschitz growth rather than exponential instability, ensuring stable convergence across all preference dimensions. Extensive experiments show that: I. OrthAlign achieves maximum single-preference improvements ranging from 34.61% to 50.89% after multiple-objective alignment across helpful, harmless, and truthful dimensions. II. With an average overall reward improvement of 13.96%.

Related papers

Provable Last-Iterate Convergence for Multi-Objective Safe LLM Alignment via Optimistic Primal-Dual [26.51548597257528]
We introduce an optimistic primal-dual (OPD) algorithm that incorporates predictive updates for both primal and dual variables to stabilize saddle-point dynamics.<n>Our analysis reveals that optimism plays a crucial role in mitigating oscillations inherent to constrained alignment objectives.
arXiv Detail & Related papers (2026-02-25T17:54:52Z)
Reward-free Alignment for Conflicting Objectives [12.275610380458119]
We propose a Reward-free Alignment framework for Conflicted Objectives (RACO)<n>RACO directly leverages pairwise preference data and resolves gradient conflicts via a novel clipped variant of conflict-averse gradient descent.<n>We provide convergence guarantees to Pareto-critical points that respect user-specified objective weights, and further show that clipping can strictly improve convergence rate in the two-objective setting.
arXiv Detail & Related papers (2026-02-02T18:59:52Z)
Towards a Unified Analysis of Neural Networks in Nonparametric Instrumental Variable Regression: Optimization and Generalization [66.08522228989634]
We establish the first global convergence result of neural networks for two stage least squares (2SLS) approach in nonparametric instrumental variable regression (NPIV)<n>This is achieved by adopting a lifted perspective through mean-field Langevin dynamics (MFLD)
arXiv Detail & Related papers (2025-11-18T17:51:17Z)
Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time [52.230936493691985]
We propose SITAlign, an inference framework that addresses the multifaceted nature of alignment by maximizing a primary objective while satisfying threshold-based constraints on secondary criteria.<n>We provide theoretical insights by deriving sub-optimality bounds of our satisficing based inference alignment approach.
arXiv Detail & Related papers (2025-05-29T17:56:05Z)
Multi-start Optimization Method via Scalarization based on Target Point-based Tchebycheff Distance for Multi-objective Optimization [2.9248680865344348]
Multi-objective optimization is crucial in scientific and industrial applications where solutions must balance trade-offs among conflicting objectives.<n>State-of-the-art methods, such as NSGA-III and MOEA/D, can handle many objectives but struggle with coverage issues.<n>We propose a novel multi-start optimization method that addresses these challenges.
arXiv Detail & Related papers (2025-05-01T02:27:25Z)
Generalization Bounds of Surrogate Policies for Combinatorial Optimization Problems [53.03951222945921]
We analyze smoothed (perturbed) policies, adding controlled random perturbations to the direction used by the linear oracle.<n>Our main contribution is a generalization bound that decomposes the excess risk into perturbation bias, statistical estimation error, and optimization error.<n>We illustrate the scope of the results on applications such as vehicle scheduling, highlighting how smoothing enables both tractable training and controlled generalization.
arXiv Detail & Related papers (2024-07-24T12:00:30Z)
Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment [103.12563033438715]
Alignment in artificial intelligence pursues consistency between model responses and human preferences as well as values. Existing alignment techniques are mostly unidirectional, leading to suboptimal trade-offs and poor flexibility over various objectives. We introduce controllable preference optimization (CPO), which explicitly specifies preference scores for different objectives.
arXiv Detail & Related papers (2024-02-29T12:12:30Z)
Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning [132.7040981721302]
We study the Constrained Convex Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure. Design algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z)
PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback [106.63518036538163]
We present a novel unified bilevel optimization-based framework, textsfPARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning. Our framework addressed these concerns by explicitly parameterizing the distribution of the upper alignment objective (reward design) by the lower optimal variable. Our empirical results substantiate that the proposed textsfPARL can address the alignment concerns in RL by showing significant improvements.
arXiv Detail & Related papers (2023-08-03T18:03:44Z)
Smoothing the Edges: Smooth Optimization for Sparse Regularization using Hadamard Overparametrization [10.009748368458409]
We present a framework for smooth optimization of explicitly regularized objectives for (structured) sparsity. Our method enables fully differentiable approximation-free optimization and is thus compatible with the ubiquitous gradient descent paradigm in deep learning.
arXiv Detail & Related papers (2023-07-07T13:06:12Z)
Primal-Dual Sequential Subspace Optimization for Saddle-point Problems [3.9582154141918964]
We introduce a new sequential subspace optimization method for large-scale saddle-point problems. It solves auxiliary saddle-point problems in low-dimensional subspaces, spanned by directions derived from first-order information over the primal emphand dual variables. Experimental results demonstrate significantly better convergence relative to popular first-order methods.
arXiv Detail & Related papers (2020-08-20T18:19:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.