Fugu-MT Paper Translation (Abstract): Fisher Decorator: Refining Flow Policy via A Local Transport Map

Paper overview: Fisher Decorator: Refining Flow Policy via A Local Transport Map

arxiv url: http://arxiv.org/abs/2604.17919v1
Date: Mon, 20 Apr 2026 07:54:36 GMT
Status: Translation complete
Last updated in system: 2026-04-21 21:52:52.754389
Title: Fisher Decorator: Refining Flow Policy via A Local Transport Map
Authors: Xiaoyuan Cheng, Haoyu Wang, Wenxuan Yuan, Ziyan Wang, Zonghao Chen, Li Zeng, Zhuo Sun
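Abstract summary: Flow-based offline reinforcement learning (RL) achieves strong performance by parameterizing policies via flow matching. Existing flow policies interpret $L_2$ regularization as an upper bound on the 2-Wasserstein distance ($W_2$). The behavioral policy manifold is inherently anisotropic, whereas $L_2$ regularization is isotropic and density-insensitive. By analyzing the induced density transformation, the paper derives a local quadratic approximation of the KL-constrained objective governed by the Fisher information matrix.
Reference score (independently computed attention score): 22.885775277923106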
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in flow-based offline reinforcement learning (RL) have achieved strong performance by parameterizing policies via flow matching. However, they still face critical trade-offs among expressiveness, optimality, and efficiency. In particular, existing flow policies interpret the $L_2$ regularization as an upper bound of the 2-Wasserstein distance ($W_2$), which can be problematic in offline settings. This issue stems from a fundamental geometric mismatch: the behavioral policy manifold is inherently anisotropic, whereas the $L_2$ (or upper bound of $W_2$) regularization is isotropic and density-insensitive, leading to systematically misaligned optimization directions. To address this, we revisit offline RL from a geometric perspective and show that policy refinement can be formulated as a local transport map: an initial flow policy augmented by a residual displacement. By analyzing the induced density transformation, we derive a local quadratic approximation of the KL-constrained objective governed by the Fisher information matrix, enabling a tractable anisotropic optimization formulation. By leveraging the score function embedded in the flow velocity, we obtain a corresponding quadratic constraint for efficient optimization. Our results reveal that the optimality gap in prior methods arises from their isotropic approximation. In contrast, our framework achieves a controllable approximation error within a provable neighborhood of the optimal solution. Extensive experiments demonstrate state-of-the-art performance across diverse offline RL benchmarks. See project page: https://github.com/ARC0127/Fisher-Decorator.
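The geometric mismatch described in the abstract can be illustrated with a toy case. The sketch below is not the paper's method. It assumes a hypothetical 2-D Gaussian behavior distribution with diagonal covariance and a constant displacement `delta`; for such a pure shift the KL divergence is exactly 0.5 * delta^T Sigma^{-1} delta, where Sigma^{-1} is the Fisher information of the Gaussian location family. The names `gaussian_shift_kl`, `l2_penalty`, `variances`, `along_wide`, and `along_narrow` are illustrative only.

```python
import math


def gaussian_shift_kl(delta, variances):
    """KL( N(mu, Sigma) || N(mu + delta, Sigma) ) for a diagonal Sigma.

    For a pure shift this equals 0.5 * delta^T Sigma^{-1} delta, and
    Sigma^{-1} is the Fisher information of the location family.
    """
    return 0.5 * sum(d * d / v for d, v in zip(delta, variances))


def l2_penalty(delta):
    """Isotropic squared L2 norm of the displacement."""
    return sum(d * d for d in delta)


# Assumed anisotropic behavior distribution: wide along axis 0, narrow along axis 1.
variances = (1.0, 0.01)

# Two displacements with the same L2 norm but different directions.
along_wide = (0.1, 0.0)
along_narrow = (0.0, 0.1)

for name, delta in (("wide axis", along_wide), ("narrow axis", along_narrow)):
    print(name, "L2:", l2_penalty(delta), "KL:", gaussian_shift_kl(delta, variances))
```

Both displacements receive the same $L_2$ penalty, yet the shift along the narrow axis costs roughly 100 times more in KL. An isotropic penalty cannot see this difference, whereas a Fisher-weighted quadratic form can.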
