Fugu-MT Paper Translation (Abstract): AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment

Paper overview: AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment

arxiv url: http://arxiv.org/abs/2605.17602v1
Date: Sun, 17 May 2026 19:00:44 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:48.225801
Title: AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment
Authors: Kuei-Chun Kao, Daixuan Huo, Yuanhao Ban, Cho-Jui Hsieh
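Abstract summary: AutoRubric-T2I is the first rubric learning framework in T2I that automatically synthesizes and selects explicit rubrics for guiding VLM judges. The paper shows that AutoRubric-T2I produces high-quality, interpretable reward signals using less than 0.01% of the annotated preference data.
Reference score (attention metric computed by the site): 44.851672394450105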
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Aligning Text-to-Image (T2I) generation models with human preferences increasingly relies on image reward models that score or rank generated images according to prompt alignment and perceptual quality. Existing reward models are commonly trained as Bradley-Terry (BT) preference models on large-scale human preference corpora, making them costly to train, difficult to adapt, and opaque in their evaluation criteria. Meanwhile, Vision-Language Model (VLM) judges can provide more fine-grained assessments through textual rubrics, but their manually designed or heuristically generated scoring rules may fail to reliably reflect human preferences. In this paper, we propose AutoRubric-T2I, the first rubric learning framework in T2I that automatically synthesizes and selects explicit rubrics for guiding VLM judges. AutoRubric-T2I first synthesizes reasoning traces from preference pairs into candidate rubrics, then uses a VLM judge to score paired images under each rubric, producing pairwise rubric-score differences for preference learning. To remove noisy and redundant rules, we further employ an $\ell_1$-Regularized Logistic Regression Refiner, which selects the Top-$N$ most discriminative rubrics. Extensive evaluations show that AutoRubric-T2I produces high-quality, interpretable reward signals using less than 0.01% of the annotated preference data, substantially reducing the need for large-scale reward-model training. On image reward benchmarks such as MMRB2, AutoRubric-T2I outperforms strong reward model baselines. We further validate AutoRubric-T2I as an RL reward on downstream T2I tasks, including TIIF and UniGenBench++, where it improves generation quality over scalar reward models using the Flow-GRPO pipeline on diffusion models.
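The refinement step described in the abstract (pairwise rubric-score differences fed to an $\ell_1$-regularized logistic regression, followed by Top-$N$ selection) can be illustrated with a small sketch. This is not the authors' implementation: the proximal-gradient solver, the omission of an intercept, the toy data, and every name and hyperparameter below (`fit_l1_logistic`, `select_top_n`, `lam`, `lr`, `steps`) are assumptions made for illustration, and the VLM judge is replaced by synthetic score differences.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(x, t):
    # Proximal operator of t * |x|: shrinks x toward zero by t.
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def fit_l1_logistic(diffs, labels, lam=0.05, lr=0.5, steps=500):
    """Proximal gradient descent (ISTA) for L1-regularized logistic regression.

    diffs[i][k]: score of image A minus score of image B under rubric k.
    labels[i]:   1 if image A was preferred, else 0.
    Assumption: no intercept, so swapping A and B flips the prediction.
    """
    n, d = len(diffs), len(diffs[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for x, y in zip(diffs, labels):
            err = sigmoid(sum(wk * xk for wk, xk in zip(w, x))) - y
            for k in range(d):
                grad[k] += err * x[k] / n
        # Gradient step on the logistic loss, then shrinkage for the L1 term.
        w = [soft_threshold(wk - lr * gk, lr * lam) for wk, gk in zip(w, grad)]
    return w


def select_top_n(weights, n):
    # Rank rubrics by |weight| and keep at most n with non-zero weight.
    ranked = sorted(range(len(weights)), key=lambda k: abs(weights[k]), reverse=True)
    return [k for k in ranked if weights[k] != 0.0][:n]


def make_toy_data(num_pairs=300, num_rubrics=5, seed=0):
    # Synthetic stand-in for VLM-judge score differences (hypothetical data).
    rng = random.Random(seed)
    diffs, labels = [], []
    for _ in range(num_pairs):
        x = [rng.uniform(-1.0, 1.0) for _ in range(num_rubrics)]
        # Toy ground truth: only rubrics 0 and 2 determine the preference.
        labels.append(1 if x[0] + x[2] > 0 else 0)
        diffs.append(x)
    return diffs, labels


diffs, labels = make_toy_data()
weights = fit_l1_logistic(diffs, labels)
selected = select_top_n(weights, n=2)
print("weights:", [round(w, 3) for w in weights])
print("selected rubrics:", selected)
```

In this toy setup the penalty `lam` controls how aggressively uninformative rubrics are pushed to zero weight; a larger value keeps fewer rubrics, which is the sparsity property that motivates an $\ell_1$ penalty over an $\ell_2$ one for rule selection.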
