Fugu-MT paper translation (abstract): Exploring the Design Space of Reward Backpropagation for Flow Matching

Paper abstract: Exploring the Design Space of Reward Backpropagation for Flow Matching

arxiv url: http://arxiv.org/abs/2606.11075v1
Date: Tue, 09 Jun 2026 16:36:54 GMT
Status: Translation complete
Last updated in system: 2026-06-10 15:40:58.615335
Title: Exploring the Design Space of Reward Backpropagation for Flow Matching
Title (reference translation): Exploring the Design Space of Reward Backpropagation for Flow Matching
Authors: Ruoyu Wang, Boye Niu, Xiangxin Zhou, Yushi Huang, Tongliang Liu, Chi Zhang,
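Abstract summary: FlowBP is a unified surrogate-trajectory framework that treats the backward trajectory itself as the design object. It is instantiated in three variants: FlowBP-Sparse, FlowBP-Bridge, and FlowBP-Lagrange. All three bound memory by the active-set size and limit gradient chaining to at most one Jacobian factor.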
Reference score (independently computed attention score): 47.80328464705813
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Aligning text-to-image flow matching models with human preferences via direct reward backpropagation is sample-efficient but hampered by two well-known pathologies: activations cannot be stored across the full sampling trajectory at modern model scale, and chained Jacobian products across steps inflate the reward gradient as it travels back to early indices. Connector-based methods, such as LeapAlign, address these issues by replacing the full backward trajectory with a short pinned path, highlighting a useful decoupling between sampling and optimization. However, the quality of the resulting gradient depends on how accurately this short path approximates the full rollout, especially over long intervals. We propose FlowBP, a unified surrogate-trajectory framework that treats the backward trajectory itself as the design object. FlowBP keeps a no-gradient cached rollout for sampling, then builds a lightweight backward surrogate from cached and selectively re-forwarded velocities. This view separates four choices: the reward-model input, active set, integration weights, and bridge coupling, and recovers prior direct-gradient methods as particular settings. Within this framework, we instantiate three variants: FlowBP-Sparse uses sparse Euler reconstruction, FlowBP-Bridge adds controlled bridge coupling, and FlowBP-Lagrange raises the order of leap quadrature. All three bound memory by the active-set size and limit gradient chaining to at most one Jacobian factor. Across SD3.5-M, FLUX.1-dev, and FLUX.2-Klein-base on preference, quality, and compositional metrics, the three variants improve over direct-gradient baselines on most metrics.
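Abstract (reference translation): Aligning text-to-image flow matching models with human preferences through direct reward backpropagation is sample-efficient, but it suffers from two well-known pathologies: activations cannot be stored across the full sampling trajectory at modern model scale, and chained Jacobian products across steps inflate the reward gradient as it propagates back to early indices. Connector-based methods such as LeapAlign address these problems by replacing the full backward trajectory with a short pinned path, which highlights a useful decoupling between sampling and optimization. However, the quality of the resulting gradient depends on how accurately this short path approximates the full rollout, especially over long intervals. This paper proposes FlowBP, a unified surrogate-trajectory framework that treats the backward trajectory itself as the design object. FlowBP keeps a gradient-free cached rollout for sampling and builds a lightweight backward surrogate from cached and selectively re-forwarded velocities. This view separates four choices (the reward-model input, the active set, the integration weights, and the bridge coupling) and recovers prior direct-gradient methods as particular settings. Three variants are instantiated within the framework: FlowBP-Sparse uses sparse Euler reconstruction, FlowBP-Bridge adds controlled bridge coupling, and FlowBP-Lagrange raises the order of the leap quadrature. All three bound memory by the active-set size and limit gradient chaining to at most one Jacobian factor. Across SD3.5-M, FLUX.1-dev, and FLUX.2-Klein-base, on preference, quality, and compositional metrics, the three variants improve over direct-gradient baselines on most metrics.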
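
The surrogate-trajectory idea in the abstract can be loosely illustrated with a toy example. The sketch below is not the paper's algorithm: the 1-D linear velocity field, the quadratic reward, and every function name are assumptions made for illustration only. It caches an Euler rollout without gradients, then estimates the reward gradient from a small active set of re-forwarded steps while treating cached states as constants. In this simplified form no Jacobian factors are chained at all, whereas the abstract states that the actual variants allow at most one.

```python
def velocity(theta, x):
    """Hypothetical toy velocity field v_theta(x) = theta * x."""
    return theta * x


def reward_grad(x, target):
    """Derivative of the assumed toy reward r(x) = -(x - target)^2."""
    return -2.0 * (x - target)


def cached_rollout(theta, x0, num_steps):
    """Euler sampling with no gradient tracking; only states are cached."""
    h = 1.0 / num_steps
    states = [x0]
    x = x0
    for _ in range(num_steps):
        x = x + h * velocity(theta, x)
        states.append(x)
    return states


def surrogate_gradient(theta, x0, num_steps, active_set, target):
    """Reward gradient through a sparse Euler surrogate (illustrative only).

    The surrogate endpoint is x0 + h * sum_k u_k, where u_k is re-forwarded
    (differentiable in theta) for k in active_set and a cached constant
    otherwise. Cached states are constants, so d u_k / d theta = x_k here.
    Memory scales with len(active_set), not with num_steps.
    """
    h = 1.0 / num_steps
    states = cached_rollout(theta, x0, num_steps)
    d_endpoint = sum(h * states[k] for k in active_set)
    return reward_grad(states[-1], target) * d_endpoint


def full_gradient(theta, x0, num_steps, target):
    """Exact gradient through the whole Euler chain, for comparison."""
    h = 1.0 / num_steps
    x, dx = x0, 0.0
    for _ in range(num_steps):
        # d x_{k+1} / d theta = (1 + h * theta) * d x_k / d theta + h * x_k
        dx = (1.0 + h * theta) * dx + h * x
        x = x + h * velocity(theta, x)
    return reward_grad(x, target) * dx


if __name__ == "__main__":
    print(surrogate_gradient(0.5, 2.0, 10, [0, 5, 9], 1.0))
    print(full_gradient(0.5, 2.0, 10, 1.0))
```

In `full_gradient`, each step multiplies the running derivative by the factor `(1 + h * theta)`, which is the chained Jacobian product the abstract describes as inflating the gradient. The surrogate drops that chain, so the two estimates agree in sign for this toy setup but generally differ in magnitude.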

Related papers