Fugu-MT Paper Translation (Abstract): Follow-Your-Preference++: Rethinking Preference Alignment for Image Inpainting

Paper abstract: Follow-Your-Preference++: Rethinking Preference Alignment for Image Inpainting

arxiv url: http://arxiv.org/abs/2606.03216v1
Date: Tue, 02 Jun 2026 06:22:07 GMT
Status: Translation complete
Last updated in system: 2026-06-03 22:00:04.803757
Title: Follow-Your-Preference++: Rethinking Preference Alignment for Image Inpainting
Title (reference translation): Follow-Your-Preference++: Rethinking Preference Alignment for Image Inpainting
Authors: Junkun Yuan, Yutao Shen, Toru Aonishi, Hideki Nakayama, Yue Ma,
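Abstract summary: We adopt the widely used direct preference optimization framework and construct preference training data with publicly available reward models. A simple ensemble of reward models mitigates reward-model biases and yields robust, generalizable performance. Our models substantially outperform prior state-of-the-art models on standard metrics, large vision-language model evaluations, and human assessments.
Reference score (independently computed attention score): 17.648992293002088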
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study preference alignment for image inpainting. Rather than proposing yet another method, we revisit the problem from first principles and reassess its core challenges. We adopt the widely used direct preference optimization framework and construct preference training data with publicly available reward models. Our empirical study spans nine reward models, two benchmarks, and two baseline inpainting models that differ in architecture and generative mechanism. Our main findings are: (1) Most reward models provide valid signals for preference data construction, although some are unreliable as evaluators. (2) Across models and benchmarks, preference data exhibits consistent trends under both candidate and sample scaling. (3) Reward models display pronounced biases, particularly in brightness, composition, and color scheme, that make them prone to inducing reward hacking. (4) A simple ensemble of reward models mitigates such biases and yields robust, generalizable performance. (5) Preference alignment is transferable to the object removal task, where the goal shifts from open-ended creative generation to coherent background completion. (6) Further analysis reveals that a calibrated ensemble method further mitigates hacking and improves robustness. Without modifying model architectures or introducing additional datasets, our models substantially outperform prior state-of-the-art models on standard metrics, large vision-language model evaluations, and human assessments. Our code is available at: https://github.com/shenytzzz/Follow-Your-Preference.
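Abstract (reference translation): We examine preference alignment for image inpainting. Instead of proposing a new method, we re-examine the problem from first principles and reassess its core challenges. We adopt the widely used direct preference optimization framework and build preference training data using publicly available reward models. Our empirical study covers nine reward models, two benchmarks, and two baseline inpainting models that differ in architecture and generative mechanism. The main findings are: (1) Most reward models provide valid signals for preference data construction, although some are unreliable as evaluators. (2) Across models and benchmarks, preference data shows consistent trends under both candidate scaling and sample scaling. (3) Reward models show pronounced biases, especially in brightness, composition, and color scheme, and therefore tend to induce reward hacking. (4) A simple ensemble of reward models mitigates such biases and yields robust, generalizable performance. (5) Preference alignment transfers to the object removal task, where the goal shifts from open-ended creative generation to coherent background completion. (6) Further analysis shows that a calibrated ensemble method further mitigates hacking and improves robustness. Without changing model architectures or introducing additional datasets, our models substantially outperform prior state-of-the-art models on standard metrics, large vision-language model evaluations, and human assessments. Our code is available at https://github.com/shenytzzz/Follow-Your-Preference.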
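
As a rough illustration of the idea in findings (3) and (4), the sketch below builds one preference pair from several candidate inpaintings by averaging scores from multiple reward models. It is only a minimal sketch under stated assumptions: the abstract does not say how the ensemble is formed, so the per-model z-score normalization, the plain mean, the best-versus-worst pairing rule, and all names (`zscore`, `build_preference_pair`, `reward_a`, and so on) are hypothetical choices made here, not the authors' implementation.

```python
from statistics import mean, pstdev


def zscore(scores):
    """Standardize one reward model's scores across the candidates.

    Assumption: normalizing per model keeps a reward model with a large
    numeric range from dominating the ensemble average.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All candidates tied under this model: it contributes no signal.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def build_preference_pair(candidates, reward_scores):
    """Return (chosen, rejected, ensemble_scores) for one inpainting input.

    candidates: list of candidate identifiers for the same masked image.
    reward_scores: dict mapping a (hypothetical) reward-model name to a
        list of raw scores, one per candidate, in the same order.
    """
    normalized = [zscore(scores) for scores in reward_scores.values()]
    # Average the normalized scores candidate by candidate.
    ensemble = [mean(column) for column in zip(*normalized)]
    best = max(range(len(candidates)), key=ensemble.__getitem__)
    worst = min(range(len(candidates)), key=ensemble.__getitem__)
    return candidates[best], candidates[worst], ensemble


# Toy scores (made up for illustration): reward_c alone would prefer
# cand_2, but the ensemble follows the agreement of the other two models.
candidates = ["cand_0", "cand_1", "cand_2", "cand_3"]
reward_scores = {
    "reward_a": [0.2, 0.9, 0.4, 0.1],
    "reward_b": [10.0, 30.0, 20.0, 5.0],
    "reward_c": [0.5, 0.6, 0.9, 0.3],
}
chosen, rejected, ensemble = build_preference_pair(candidates, reward_scores)
print(chosen, rejected)
```

The resulting (chosen, rejected) pair is the kind of record a direct preference optimization trainer would consume. Averaging over several scorers means that a single reward model's preference, for example for brighter outputs, is less likely to decide the pair on its own.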

Related papers