Fugu-MT Paper Translation (Summary): Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance

Paper summary: Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance

arxiv url: http://arxiv.org/abs/2508.21016v1
Date: Thu, 28 Aug 2025 17:18:31 GMT
Status: Translation complete
Last updated (in system): 2025-08-29 18:12:02.531032
Title: Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance
Authors: Luozhijie Jin, Zijie Qiu, Jie Liu, Zijie Diao, Lifeng Qiao, Ning Ding, Alex Lamb, Xipeng Qiu
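Abstract summary: This paper introduces Reinforcement Learning Guidance (RLG), an inference-time method that adapts Classifier-Free Guidance (CFG). RLG consistently improves the performance of RL fine-tuned models across various RL algorithms and downstream tasks, including human preferences, compositional control, compressibility, and text rendering. The proposed method provides a practical and theoretically sound solution for enhancing and controlling diffusion model alignment at inference.
Reference score (site-computed attention score): 46.06527859746679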
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Denoising-based generative models, particularly diffusion and flow matching algorithms, have achieved remarkable success. However, aligning their output distributions with complex downstream objectives, such as human preferences, compositional accuracy, or data compressibility, remains challenging. While reinforcement learning (RL) fine-tuning methods, inspired by advances in RL from human feedback (RLHF) for large language models, have been adapted to these generative frameworks, current RL approaches are suboptimal for diffusion models and offer limited flexibility in controlling alignment strength after fine-tuning. In this work, we reinterpret RL fine-tuning for diffusion models through the lens of stochastic differential equations and implicit reward conditioning. We introduce Reinforcement Learning Guidance (RLG), an inference-time method that adapts Classifier-Free Guidance (CFG) by combining the outputs of the base and RL fine-tuned models via a geometric average. Our theoretical analysis shows that RLG's guidance scale is mathematically equivalent to adjusting the KL-regularization coefficient in standard RL objectives, enabling dynamic control over the alignment-quality trade-off without further training. Extensive experiments demonstrate that RLG consistently improves the performance of RL fine-tuned models across various architectures, RL algorithms, and downstream tasks, including human preferences, compositional control, compressibility, and text rendering. Furthermore, RLG supports both interpolation and extrapolation, thereby offering unprecedented flexibility in controlling generative alignment. Our approach provides a practical and theoretically sound solution for enhancing and controlling diffusion model alignment at inference. The source code for RLG is publicly available on GitHub: https://github.com/jinluo12345/Reinforcement-learning-guidance.
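The abstract's two technical claims (a CFG-style combination of base and RL outputs, and a guidance scale that acts like a KL coefficient) can be illustrated with a small toy sketch. It assumes, as is standard for KL-regularized RL, that the fine-tuned model equals the closed-form optimum p_RL(x) ∝ p_base(x) exp(r(x)/β); real RL fine-tuned models only approximate this. Under that assumption the normalized geometric average p_base^(1-w) p_RL^w is proportional to p_base(x) exp(w r(x)/β), which is the optimum for coefficient β/w. The helper names (rlg_combine, kl_optimal_policy, geometric_mix), the toy distribution and the rewards are hypothetical and are not taken from the RLG repository. rlg_combine assumes both models predict the same quantity (noise, score, or velocity) at the same timestep.

```python
import math
from typing import List, Sequence


def rlg_combine(out_base: Sequence[float], out_rl: Sequence[float], w: float) -> List[float]:
    """Hypothetical CFG-style guidance on per-step predictions: base + w * (rl - base).

    In score space this equals (1 - w) * s_base + w * s_rl, i.e. the score of the
    geometric average p_base^(1-w) * p_rl^w. w = 0 gives the base model, w = 1 the
    RL model, 0 < w < 1 interpolates and w > 1 extrapolates.
    """
    return [b + w * (r - b) for b, r in zip(out_base, out_rl)]


def normalize_log(log_p: Sequence[float]) -> List[float]:
    """Convert unnormalized log-probabilities into a probability vector (stable softmax)."""
    m = max(log_p)
    unnorm = [math.exp(v - m) for v in log_p]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def kl_optimal_policy(p_ref: Sequence[float], reward: Sequence[float], beta: float) -> List[float]:
    """Closed-form maximizer of E_p[r] - beta * KL(p || p_ref) over a finite set:
    p*(x) ∝ p_ref(x) * exp(r(x) / beta)."""
    return normalize_log([math.log(p) + r / beta for p, r in zip(p_ref, reward)])


def geometric_mix(p_base: Sequence[float], p_rl: Sequence[float], w: float) -> List[float]:
    """Normalized geometric average p_base^(1-w) * p_rl^w."""
    return normalize_log(
        [(1 - w) * math.log(b) + w * math.log(r) for b, r in zip(p_base, p_rl)]
    )


if __name__ == "__main__":
    p_base = [0.1, 0.2, 0.3, 0.4]   # toy "base model" over 4 outcomes (hypothetical)
    reward = [1.0, -0.5, 0.0, 2.0]  # toy reward per outcome (hypothetical)
    beta, w = 0.5, 2.0

    p_rl = kl_optimal_policy(p_base, reward, beta)         # idealized RL fine-tuned model
    mixed = geometric_mix(p_base, p_rl, w)                 # RLG-style mixture with scale w
    target = kl_optimal_policy(p_base, reward, beta / w)   # optimum for KL coefficient beta / w

    # The two distributions agree up to floating-point rounding.
    print(max(abs(a - b) for a, b in zip(mixed, target)))
```

In this toy setting, w between 0 and 1 corresponds to a larger effective coefficient β/w (weaker alignment, closer to the base model), while w > 1 corresponds to a smaller one (stronger alignment), which mirrors the interpolation and extrapolation behaviour described in the abstract.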

Related papers