Fugu-MT Paper Translation (Summary): A-IPO: Adaptive Intent-driven Preference Optimization

Paper summary: A-IPO: Adaptive Intent-driven Preference Optimization

arxiv url: http://arxiv.org/abs/2510.10077v1
Date: Sat, 11 Oct 2025 07:29:11 GMT
Status: Translation complete
Last updated in system: 2025-10-14 18:06:29.769564
Title: A-IPO: Adaptive Intent-driven Preference Optimization
Title (reference translation): A-IPO: Adaptive Intent-Driven Preference Optimization
Authors: Wenqing Wang, Muhammad Asif Ali, Ali Shoker, Ruohan Yang, Junyang Chen, Ying Sha, Huan Wang
Abstract summary: We introduce Adaptive Intent-driven Preference Optimization (A-IPO). A-IPO introduces an intention module that infers the latent intent behind each user prompt and explicitly incorporates this inferred intent into the reward function.
Reference score (independently computed attention score): 14.221471110333828
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Human preferences are diverse and dynamic, shaped by regional, cultural, and social factors. Existing alignment methods like Direct Preference Optimization (DPO) and its variants often default to majority views, overlooking minority opinions and failing to capture latent user intentions in prompts. To address these limitations, we introduce Adaptive Intent-driven Preference Optimization (A-IPO). Specifically, A-IPO introduces an intention module that infers the latent intent behind each user prompt and explicitly incorporates this inferred intent into the reward function, encouraging stronger alignment between the preferred model's responses and the user's underlying intentions. We demonstrate, both theoretically and empirically, that incorporating an intention-response similarity term increases the preference margin (by a positive shift of $\lambda\,\Delta\mathrm{sim}$ in the log-odds), resulting in clearer separation between preferred and dispreferred responses compared to DPO. For evaluation, we introduce two new benchmarks, Real-pref and Attack-pref, along with an extended version of an existing dataset, GlobalOpinionQA-Ext, to assess real-world and adversarial preference alignment. Through explicit modeling of diverse user intents, A-IPO facilitates pluralistic preference optimization while simultaneously enhancing adversarial robustness in preference alignment. Comprehensive empirical evaluation demonstrates that A-IPO consistently surpasses existing baselines, yielding substantial improvements across key metrics: up to +24.8 win-rate and +45.6 Response-Intention Consistency on Real-pref; up to +38.6 Response Similarity and +52.2 Defense Success Rate on Attack-pref; and up to +54.6 Intention Consistency Score on GlobalOpinionQA-Ext.
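Abstract (reference translation): Human preferences are diverse and dynamic, shaped by regional, cultural, and social factors. Existing alignment methods such as Direct Preference Optimization (DPO) and its variants often default to majority views, because they overlook minority opinions and fail to capture the user intentions latent in prompts. To address these limitations, we introduce Adaptive Intent-driven Preference Optimization (A-IPO). Specifically, A-IPO introduces an intention module that infers the latent intent behind each user prompt and explicitly incorporates this inferred intent into the reward function, promoting stronger alignment between the preferred model's responses and the user's underlying intentions. We show, both theoretically and empirically, that incorporating an intention-response similarity term increases the preference margin (by a positive shift of $\lambda\,\Delta\mathrm{sim}$ in the log-odds), giving a clearer separation between preferred and dispreferred responses than DPO. For evaluation, we introduce two new benchmarks, Real-pref and Attack-pref, and GlobalOpinionQA-Ext, an extended version of an existing dataset, to assess real-world and adversarial preference alignment. Through explicit modeling of diverse user intents, A-IPO facilitates pluralistic preference optimization while simultaneously strengthening adversarial robustness in preference alignment. Comprehensive empirical evaluation shows that A-IPO consistently surpasses existing baselines, with substantial improvements on key metrics: up to +24.8 win-rate and +45.6 Response-Intention Consistency on Real-pref; up to +38.6 Response Similarity and +52.2 Defense Success Rate on Attack-pref; and up to +54.6 Intention Consistency Score on GlobalOpinionQA-Ext.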
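
The margin claim in the abstract can be illustrated with a small numeric sketch. The abstract states only that an intention-response similarity term shifts the log-odds by $\lambda\,\Delta\mathrm{sim}$; it does not give the full objective. The code below therefore assumes the standard DPO log-odds, $\beta\,[(\log\pi(y_w|x)-\log\pi_{\mathrm{ref}}(y_w|x))-(\log\pi(y_l|x)-\log\pi_{\mathrm{ref}}(y_l|x))]$, and adds the shift on top of it. The function names, the similarity scores, and all numeric values are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function: maps log-odds to a preference probability."""
    return 1.0 / (1.0 + math.exp(-x))


def dpo_margin(beta: float, logp_w: float, logp_l: float,
               ref_logp_w: float, ref_logp_l: float) -> float:
    """Standard DPO log-odds that the preferred response (w) beats the
    dispreferred response (l), relative to a reference policy."""
    return beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))


def shifted_margin(base_margin: float, lam: float,
                   sim_w: float, sim_l: float) -> float:
    """Assumed A-IPO-style log-odds: the base margin plus lambda * delta_sim,
    where delta_sim = sim(intent, y_w) - sim(intent, y_l)."""
    return base_margin + lam * (sim_w - sim_l)


# Hypothetical sequence log-probabilities and similarity scores.
m_dpo = dpo_margin(beta=0.1, logp_w=-12.0, logp_l=-14.0,
                   ref_logp_w=-13.0, ref_logp_l=-13.5)
m_shifted = shifted_margin(m_dpo, lam=0.5, sim_w=0.8, sim_l=0.3)

print(f"DPO log-odds:      {m_dpo:.3f}")
print(f"Shifted log-odds:  {m_shifted:.3f}")
print(f"P(w beats l), DPO:     {sigmoid(m_dpo):.3f}")
print(f"P(w beats l), shifted: {sigmoid(m_shifted):.3f}")
```

Under these assumptions, the shift is positive whenever the preferred response is more similar to the inferred intent than the dispreferred one, so the preference probability rises; if $\Delta\mathrm{sim}$ were negative, the same term would shrink the margin.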

List of related papers