Fugu-MT Paper Translation (Abstract): Guiding a Diffusion Model by Swapping Its Tokens

Paper abstract: Guiding a Diffusion Model by Swapping Its Tokens

arxiv url: http://arxiv.org/abs/2604.08048v1
Date: Thu, 09 Apr 2026 09:54:49 GMT
Status: Translation complete
System update date: 2026-04-10 18:34:05.85327
Title: Guiding a Diffusion Model by Swapping Its Tokens
Title (reference translation): Guiding a Diffusion Model by Swapping Its Tokens
Authors: Weijia Zhang, Yuehao Liu, Shanyan Guan, Wu Ran, Yanhao Ge, Wei Li, Chao Ma
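Abstract summary: This work proposes a simple method that enables CFG-like guidance for both conditional and unconditional generation. The key idea is to generate a perturbed prediction through simple token swap operations. The proposed approach selectively exchanges and recomposes token latents, allowing finer control over the perturbation.
Reference score (independently computed attention metric): 16.588428780117752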
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Classifier-Free Guidance (CFG) is a widely used inference-time technique to boost the image quality of diffusion models. Yet, its reliance on text conditions prevents its use in unconditional generation. We propose a simple method to enable CFG-like guidance for both conditional and unconditional generation. The key idea is to generate a perturbed prediction via simple token swap operations, and use the direction between it and the clean prediction to steer sampling towards higher-fidelity distributions. In practice, we swap pairs of most semantically dissimilar token latents in either spatial or channel dimensions. Unlike existing methods that apply perturbation in a global or less constrained manner, our approach selectively exchanges and recomposes token latents, allowing finer control over perturbation and its influence on generated samples. Experiments on MS-COCO 2014, MS-COCO 2017, and ImageNet datasets demonstrate that the proposed Self-Swap Guidance (SSG), when applied to popular diffusion models, outperforms previous condition-free methods in image fidelity and prompt alignment under different set-ups. Its fine-grained perturbation granularity also improves robustness, reducing side-effects across a wider range of perturbation strengths. Overall, SSG extends CFG to a broader scope of applications including both conditional and unconditional generation, and can be readily inserted into any diffusion model as a plug-in to gain immediate improvements.
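Abstract (reference translation): Classifier-Free Guidance (CFG) is an inference-time technique widely used to improve the image quality of diffusion models. However, because it depends on text conditions, it cannot be used for unconditional generation. This work proposes a simple method that enables CFG-like guidance for both conditional and unconditional generation. The key idea is to generate a perturbed prediction through simple token swap operations and to use the direction between it and the clean prediction to steer sampling toward higher-fidelity distributions. In practice, pairs of the most semantically dissimilar token latents are swapped along either the spatial or the channel dimension. Unlike existing methods that apply perturbation globally or in a less constrained manner, the approach selectively exchanges and recomposes token latents, allowing finer control over the perturbation and its influence on generated samples. Experiments on the MS-COCO 2014, MS-COCO 2017, and ImageNet datasets show that the proposed Self-Swap Guidance (SSG), applied to popular diffusion models, outperforms previous condition-free methods in image fidelity and prompt alignment under different set-ups. Its fine-grained perturbation granularity also improves robustness, reducing side effects across a wider range of perturbation strengths. Overall, SSG extends CFG to a broader range of applications covering both conditional and unconditional generation, and can be readily inserted into any diffusion model as a plug-in for immediate improvement.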
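
Illustrative sketch (not from the paper): the abstract describes two steps, swapping the most semantically dissimilar token latents to obtain a perturbed prediction, and steering sampling along the direction between the clean and perturbed predictions. The toy Python below shows one possible reading of those steps on plain lists. Everything in it is an assumption made for illustration: the function names `swap_most_dissimilar` and `guide`, the use of cosine similarity as the dissimilarity measure, the `num_pairs` parameter, and the CFG-style update `clean + scale * (clean - perturbed)`. The abstract does not specify where in the network the swap is applied or the exact guidance formula, and only the spatial (token-wise) swap is sketched here, not the channel-wise variant.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def swap_most_dissimilar(tokens, num_pairs=1):
    """Return a copy of `tokens` with the least-similar disjoint pairs swapped.

    Assumption: "most semantically dissimilar" is approximated by the
    lowest cosine similarity between token vectors.
    """
    n = len(tokens)
    # All unordered pairs, sorted from least similar to most similar.
    pairs = sorted(
        (cosine(tokens[i], tokens[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    out = [list(t) for t in tokens]  # copy so the input stays untouched
    used = set()
    swapped = 0
    for _, i, j in pairs:
        if swapped == num_pairs:
            break
        if i in used or j in used:
            continue  # each token takes part in at most one swap
        out[i], out[j] = out[j], out[i]
        used.update((i, j))
        swapped += 1
    return out


def guide(clean, perturbed, scale):
    """Assumed CFG-style extrapolation away from the perturbed prediction."""
    return [c + scale * (c - p) for c, p in zip(clean, perturbed)]


# Toy usage: three 2-D "token latents"; tokens 0 and 2 point in opposite directions.
tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
perturbed_tokens = swap_most_dissimilar(tokens, num_pairs=1)
guided = guide(clean=[1.0, 2.0], perturbed=[0.5, 2.5], scale=2.0)
```

In a real sampler, the clean and perturbed predictions would come from two forward passes of the denoiser (one with swapped token latents); here they are hard-coded numbers so the arithmetic is easy to follow.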

Related papers