Fugu-MT Paper Translation (Summary): Towards Disentangled Preference Optimization Dynamics Beyond Likelihood Displacement

Paper summary: Towards Disentangled Preference Optimization Dynamics Beyond Likelihood Displacement

arxiv url: http://arxiv.org/abs/2604.18239v2
Date: Sat, 25 Apr 2026 16:04:01 GMT
Status: Translation complete
Last updated in system: 2026-04-28 17:12:06.893908
Title: Towards Disentangled Preference Optimization Dynamics Beyond Likelihood Displacement
Authors: Wei Chen, Yubing Wu, Junmei Yang, Delu Zeng, Qibin Zhao, John Paisley, Min Chen, Zhou Wang
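Abstract summary: This paper presents a unified incentive-score decomposition of preference optimization. It identifies the disentanglement band (DB), a simple, testable condition that characterizes when training can avoid likelihood displacement. Building on the DB, it proposes a plug-and-play reward calibration (RC) that adaptively rebalances chosen versus rejected updates to satisfy the DB and mitigate likelihood displacement.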
Reference score (in-house attention metric): 33.80669933764735
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Preference optimization is widely used to align large language models (LLMs) with human preferences. However, many margin-based objectives suppress the chosen response along with the rejected one, a phenomenon known as likelihood displacement, and no general mechanism currently prevents this across objectives. We bridge this gap by presenting a unified incentive-score decomposition of preference optimization, revealing that diverse objectives share identical local update directions and differ only in their scalar weighting coefficients. Building on this decomposition, by analyzing the dynamics of the chosen/rejected likelihoods, we identify the disentanglement band (DB), a simple, testable condition that characterizes when training can avoid likelihood displacement by realizing the preferred pathway: suppressing the loser while maintaining the winner, possibly after an initial transient. Leveraging the DB, we propose a plug-and-play reward calibration (RC) that adaptively rebalances chosen versus rejected updates to satisfy the DB and mitigate likelihood displacement, without redesigning the base objective. Empirical results show that RC steers training toward more disentangled dynamics and often improves downstream performance across a range of objectives. Our code is available at https://github.com/IceyWuu/DisentangledPreferenceOptimization.
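To make the "same direction, different scalar weights" idea concrete, here is a minimal first-order sketch in Python. It assumes a single preference pair whose descent step moves the parameters along a_w·∇log π(y_w) − a_l·∇log π(y_l). The DPO and SimPO coefficients follow from differentiating those objectives' standard losses; `ratio_band` and `calibrate` are hypothetical helpers built from a first-order Taylor expansion and are not the paper's exact DB condition or RC rule (see the linked repository for those). The gradients are toy vectors, not real model gradients.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]


def dot(u: Vec, v: Vec) -> float:
    return sum(x * y for x, y in zip(u, v))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dpo_weights(beta: float, logp_w: float, logp_l: float,
                ref_w: float, ref_l: float) -> Tuple[float, float]:
    # DPO loss -log sigmoid(beta * (log-ratio_w - log-ratio_l)):
    # the descent direction is a * (g_w - g_l), i.e. a_w == a_l.
    margin = beta * ((logp_w - ref_w) - (logp_l - ref_l))
    a = beta * sigmoid(-margin)
    return a, a


def simpo_weights(beta: float, gamma: float, logp_w: float, logp_l: float,
                  len_w: int, len_l: int) -> Tuple[float, float]:
    # SimPO-style length normalisation makes the two coefficients differ.
    margin = beta * logp_w / len_w - beta * logp_l / len_l - gamma
    s = sigmoid(-margin)
    return s * beta / len_w, s * beta / len_l


def first_order_changes(a_w: float, a_l: float, g_w: Vec, g_l: Vec,
                        lr: float = 1.0) -> Tuple[float, float]:
    # First-order change of log pi(y_w) and log pi(y_l) after the step
    # theta <- theta + lr * (a_w * g_w - a_l * g_l).
    c = dot(g_w, g_l)
    d_w = lr * (a_w * dot(g_w, g_w) - a_l * c)
    d_l = lr * (a_w * c - a_l * dot(g_l, g_l))
    return d_w, d_l


def ratio_band(g_w: Vec, g_l: Vec) -> Tuple[float, float]:
    # Toy first-order band: open interval of a_w / a_l giving d_w > 0 and d_l < 0.
    # If g_w and g_l are parallel and point the same way, lo == hi (empty band).
    c = dot(g_w, g_l)
    if c <= 0:
        return 0.0, math.inf
    return c / dot(g_w, g_w), dot(g_l, g_l) / c


def calibrate(a_w: float, a_l: float, g_w: Vec, g_l: Vec) -> Tuple[float, float]:
    # Hypothetical rebalancing: keep a_l, move a_w / a_l into the band if needed.
    lo, hi = ratio_band(g_w, g_l)
    if lo < a_w / a_l < hi:
        return a_w, a_l
    # ||g_l|| / ||g_w|| is the geometric mean of the band edges when c > 0.
    target = math.sqrt(dot(g_l, g_l) / dot(g_w, g_w))
    return a_l * target, a_l


if __name__ == "__main__":
    g_w, g_l = [1.0, 0.0], [2.0, 1.0]  # toy gradients with a large overlap
    a_w, a_l = dpo_weights(0.1, -10.0, -12.0, -10.0, -12.0)
    print("DPO:", first_order_changes(a_w, a_l, g_w, g_l))
    cw, cl = calibrate(a_w, a_l, g_w, g_l)
    print("calibrated:", first_order_changes(cw, cl, g_w, g_l))
```

With these toy gradients the band is (2, 2.5). DPO's equal weights give a ratio of 1, so both log-likelihoods fall to first order, which is the displacement pattern. Rescaling a_w by ‖g_l‖/‖g_w‖ ≈ 2.236 lands inside the band, so the chosen likelihood rises while the rejected one still falls. The design choice of adjusting only a_w is arbitrary here; a real calibration could rescale either coefficient.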


List of related papers