Fugu-MT paper translation (abstract): Preference is More Than Comparisons: Rethinking Dueling Bandits with Augmented Human Feedback

Paper overview: Preference is More Than Comparisons: Rethinking Dueling Bandits with Augmented Human Feedback

arxiv url: http://arxiv.org/abs/2511.09047v1
Date: Thu, 13 Nov 2025 01:28:17 GMT
Status: Translation complete
Last updated in system: 2025-11-13 22:34:54.378176
Title: Preference is More Than Comparisons: Rethinking Dueling Bandits with Augmented Human Feedback
Title (reference translation): Preference Is More Than Comparisons: Rethinking Dueling Bandits with Augmented Human Feedback
Authors: Shengbo Wang, Hong Sun, Ke Li
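Abstract summary: Interactive preference elicitation (IPE) aims to substantially reduce human effort while acquiring human preferences across a wide range of personalization systems. Dueling bandit (DB) algorithms enable optimal decision-making in IPE based on pairwise comparisons. The paper introduces an alternative perspective based on feedback augmentation and makes critical improvements to the model-free DB framework. The proposed algorithm achieves competitive performance across several IPE benchmarks, including recommendation, multi-objective optimization, and response optimization for large language models.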
Reference score (independently computed attention score): 17.459431876117176
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Interactive preference elicitation (IPE) aims to substantially reduce human effort while acquiring human preferences in wide personalization systems. Dueling bandit (DB) algorithms enable optimal decision-making in IPE building on pairwise comparisons. However, they remain inefficient when human feedback is sparse. Existing methods address sparsity by heavily relying on parametric reward models, whose rigid assumptions are vulnerable to misspecification. In contrast, we explore an alternative perspective based on feedback augmentation, and introduce critical improvements to the model-free DB framework. Specifically, we introduce augmented confidence bounds to integrate augmented human feedback under generalized concentration properties, and analyze the multi-factored performance trade-off via regret analysis. Our prototype algorithm achieves competitive performance across several IPE benchmarks, including recommendation, multi-objective optimization, and response optimization for large language models, demonstrating the potential of our approach for provably efficient IPE in broader applications.
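Abstract (reference translation): Interactive preference elicitation (IPE) aims to substantially reduce human effort while acquiring human preferences across a wide range of personalization systems. Dueling bandit (DB) algorithms enable optimal decision-making in IPE based on pairwise comparisons. However, they remain inefficient when human feedback is sparse. Existing methods address sparsity by relying heavily on parametric reward models, whose rigid assumptions are vulnerable to misspecification. In contrast, the paper explores an alternative perspective based on feedback augmentation and introduces critical improvements to the model-free DB framework. Specifically, it introduces augmented confidence bounds that integrate augmented human feedback under generalized concentration properties, and it analyzes the multi-factored performance trade-off via regret analysis. The proposed algorithm achieves competitive performance across several IPE benchmarks, including recommendation, multi-objective optimization, and response optimization for large language models.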

Related papers