Fugu-MT Paper Translation (Summary): Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models

Paper summary: Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models

arxiv url: http://arxiv.org/abs/2505.18547v1
Date: Sat, 24 May 2025 06:27:55 GMT
Status: Translation complete
Last updated in system: 2025-05-27 16:58:42.498559
Title: Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models
Title (reference translation): Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models
Authors: Min Cheng, Fatemeh Doudi, Dileep Kalathil, Mohammad Ghavamzadeh, Panganamala R. Kumar
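Abstract summary: Diffusion Blend is a new method for solving inference-time multi-preference alignment. The approach is instantiated with two algorithms: DB-MPA for multi-reward alignment and DB-KLA for KL regularization control.
Reference score (independently computed attention measure): 25.59542599768357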
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement learning (RL) algorithms have been used recently to align diffusion models with downstream objectives such as aesthetic quality and text-image consistency by fine-tuning them to maximize a single reward function under a fixed KL regularization. However, this approach is inherently restrictive in practice, where alignment must balance multiple, often conflicting objectives. Moreover, user preferences vary across prompts, individuals, and deployment contexts, with varying tolerances for deviation from a pre-trained base model. We address the problem of inference-time multi-preference alignment: given a set of basis reward functions and a reference KL regularization strength, can we design a fine-tuning procedure so that, at inference time, it can generate images aligned with any user-specified linear combination of rewards and regularization, without requiring additional fine-tuning? We propose Diffusion Blend, a novel approach to solve inference-time multi-preference alignment by blending backward diffusion processes associated with fine-tuned models, and we instantiate this approach with two algorithms: DB-MPA for multi-reward alignment and DB-KLA for KL regularization control. Extensive experiments show that Diffusion Blend algorithms consistently outperform relevant baselines and closely match or exceed the performance of individually fine-tuned models, enabling efficient, user-driven alignment at inference time. The code is available at https://github.com/bluewoods127/DB-2025.
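Abstract (reference translation): Reinforcement learning (RL) algorithms have recently been used to align diffusion models with downstream objectives such as aesthetic quality and text-image consistency, by fine-tuning them to maximize a single reward function under a fixed KL regularization. However, this approach is inherently restrictive, because alignment must balance multiple, often conflicting objectives. Moreover, user preferences vary across prompts, individuals, and deployment contexts, and tolerances for deviation from the pre-trained base model vary as well. Given a set of basis reward functions and a reference KL regularization strength, can we design a fine-tuning procedure that, at inference time, generates images aligned with any user-specified linear combination of rewards and regularization without additional fine-tuning? Diffusion Blend is a new method that solves inference-time multi-preference alignment by blending the backward diffusion processes associated with fine-tuned models, and it is instantiated with two algorithms: DB-MPA for multi-reward alignment and DB-KLA for KL regularization control. Extensive experiments show that Diffusion Blend algorithms consistently outperform relevant baselines and closely match the performance of individually fine-tuned models, enabling efficient, user-driven alignment at inference time. The code is available at https://github.com/bluewoods127/DB-2025.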
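
Illustrative sketch (not from the paper): the abstract says only that Diffusion Blend blends the backward diffusion processes of fine-tuned models and gives no formula. The toy example below shows one possible reading, assuming the blend is a convex combination of per-model backward drifts using user-chosen weights. The functions blend_drifts and blended_sample and the toy models are hypothetical names invented here; they are not taken from the paper or its repository, and the actual DB-MPA and DB-KLA algorithms may differ.

```python
import numpy as np


def blend_drifts(drifts, weights):
    """Return the weighted sum of per-model drift arrays (assumed blend rule)."""
    w = np.asarray(weights, dtype=float)
    if w.ndim != 1 or len(w) != len(drifts):
        raise ValueError("need exactly one weight per model")
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(wi * d for wi, d in zip(w, drifts))


def blended_sample(models, weights, x_init, n_steps=50, noise_scale=0.0, seed=0):
    """Euler-Maruyama integration of a toy backward process with a blended drift.

    models: callables (x, t) -> drift array; hypothetical stand-ins for
            models fine-tuned on different rewards.
    weights: user preference weights, chosen at inference time.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x_init, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = 1.0 - k * dt  # time runs from 1 down toward 0
        drift = blend_drifts([m(x, t) for m in models], weights)
        noise = rng.standard_normal(x.shape)
        x = x + drift * dt + noise_scale * np.sqrt(dt) * noise
    return x


def model_a(x, t):
    """Toy model: pulls samples toward +1 (stands in for reward A)."""
    return 5.0 * (1.0 - x)


def model_b(x, t):
    """Toy model: pulls samples toward -1 (stands in for reward B)."""
    return 5.0 * (-1.0 - x)


if __name__ == "__main__":
    for w in ([1.0, 0.0], [0.75, 0.25], [0.5, 0.5]):
        print(w, blended_sample([model_a, model_b], w, np.zeros(3)))
```

In this toy setting, changing the weights moves the end point of sampling between the two models' targets without retraining either model, which is the kind of inference-time control the abstract describes.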
