Fugu-MT Paper Translation (Summary): Guiding a Diffusion Transformer with the Internal Dynamics of Itself

Paper summary: Guiding a Diffusion Transformer with the Internal Dynamics of Itself

arxiv url: http://arxiv.org/abs/2512.24176v1
Date: Tue, 30 Dec 2025 12:16:46 GMT
Status: Translation complete
System update date: 2026-01-01 23:27:28.377096
Title: Guiding a Diffusion Transformer with the Internal Dynamics of Itself
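Title (reference translation): Guiding a diffusion transformer using internal dynamics
Authors: Xingyu Zhou, Qifan Li, Xiaobin Hu, Hai Chen, Shuhang Gu
Abstract summary: Internal Guidance (IG) brings significant improvements in both training efficiency and generation quality across various baselines. On ImageNet 256x256, SiT-XL/2+IG achieves FID=5.31 and FID=1.75 at 80 and 800 epochs, respectively. LightningDiT-XL/1+IG achieves FID=1.34.
Reference score (independently computed attention score): 29.825583753955485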
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion models have a powerful ability to capture the entire (conditional) data distribution. However, because there is not enough training and data to learn to cover low-probability areas, the model is penalized for failing to generate high-quality images corresponding to these areas. To achieve better generation quality, guidance strategies such as classifier-free guidance (CFG) can guide samples toward high-probability areas during sampling. However, standard CFG often leads to over-simplified or distorted samples. On the other hand, the alternative line of work that guides a diffusion model with a bad version of itself is limited by carefully designed degradation strategies, extra training, and additional sampling steps. In this paper, we propose a simple yet effective strategy, Internal Guidance (IG), which introduces auxiliary supervision on an intermediate layer during training and extrapolates the intermediate and deep layers' outputs to obtain generative results during sampling. This simple strategy yields significant improvements in both training efficiency and generation quality on various baselines. On ImageNet 256x256, SiT-XL/2+IG achieves FID=5.31 and FID=1.75 at 80 and 800 epochs, respectively. More impressively, LightningDiT-XL/1+IG achieves FID=1.34, outperforming all of these methods by a large margin. Combined with CFG, LightningDiT-XL/1+IG achieves the current state-of-the-art FID of 1.19.
Abstract (reference translation): Diffusion models show a powerful ability to capture the entire (conditional) data distribution. However, because sufficient training and data to cover low-probability regions are lacking, the model is penalized for failing to generate high-quality images corresponding to these regions. To achieve better generation quality, guidance strategies such as classifier-free guidance (CFG) can steer samples toward high-probability regions during the sampling stage. However, standard CFG often produces over-simplified or distorted samples. On the other hand, the approach of guiding a diffusion model with a bad version of itself is limited by carefully designed degradation strategies, extra training, and additional sampling steps. In this paper, we propose Internal Guidance (IG), a simple and effective strategy that adds auxiliary supervision to an intermediate layer during training and extrapolates the intermediate- and deep-layer outputs to obtain generated results during sampling. This simple strategy brings significant improvements in both training efficiency and generation quality across various baselines. On ImageNet 256x256, SiT-XL/2+IG achieves FID=5.31 and FID=1.75 at 80 and 800 epochs, respectively. Even more impressively, LightningDiT-XL/1+IG achieves FID=1.34. Combined with CFG, LightningDiT-XL/1+IG achieves a state-of-the-art FID of 1.19.
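The abstract describes IG only at a high level: an auxiliary loss on an intermediate layer during training, and an extrapolation between intermediate and deep outputs during sampling. The following is a minimal NumPy sketch of those two ideas, assuming the sampling rule takes the CFG-style linear form `mid + w * (deep - mid)`. The toy network `ToyDiT`, its heads, `aux_weight`, and `w_ig` are hypothetical illustrations, not the authors' code or hyperparameters.

```python
import numpy as np


class ToyDiT:
    """Hypothetical stand-in for a diffusion transformer: a stack of residual
    blocks with an auxiliary output head attached after an intermediate block."""

    def __init__(self, dim, depth, mid_layer, rng):
        assert 1 <= mid_layer < depth, "intermediate layer must lie inside the stack"
        scale = 1.0 / np.sqrt(dim)
        self.blocks = [rng.standard_normal((dim, dim)) * scale for _ in range(depth)]
        self.mid_layer = mid_layer
        self.mid_head = rng.standard_normal((dim, dim)) * scale  # auxiliary (intermediate) head
        self.out_head = rng.standard_normal((dim, dim)) * scale  # regular (deep) head

    def forward(self, x):
        """Return (deep_output, intermediate_output) for input tokens x."""
        h = x
        mid_out = None
        for i, w in enumerate(self.blocks, start=1):
            h = h + np.tanh(h @ w)  # residual block stand-in
            if i == self.mid_layer:
                mid_out = h @ self.mid_head
        return h @ self.out_head, mid_out


def training_loss(model, x, target, aux_weight=0.5):
    """Main loss plus auxiliary supervision on the intermediate head.
    aux_weight is a hypothetical hyperparameter."""
    deep, mid = model.forward(x)
    main = np.mean((deep - target) ** 2)
    aux = np.mean((mid - target) ** 2)
    return main + aux_weight * aux


def internal_guidance(deep, mid, w):
    """Assumed CFG-style extrapolation: w=1 returns the deep output,
    w>1 pushes the prediction further away from the intermediate one."""
    return mid + w * (deep - mid)


def guided_prediction(model, x, w_ig=1.5):
    """In this sketch one forward pass yields both outputs used for guidance."""
    deep, mid = model.forward(x)
    return internal_guidance(deep, mid, w_ig)


rng = np.random.default_rng(0)
model = ToyDiT(dim=8, depth=6, mid_layer=3, rng=rng)
x = rng.standard_normal((4, 8))  # stand-in for a batch of noisy latent tokens
deep, mid = model.forward(x)
guided = guided_prediction(model, x, w_ig=1.5)
loss = training_loss(model, x, target=np.zeros_like(x))
```

The abstract does not say how IG is composed with CFG, so the sketch leaves that combination out.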


List of related papers