Fugu-MT Paper Translation (Abstract): No Alignment Needed for Generation: Learning Linearly Separable Representations in Diffusion Models

Paper overview: No Alignment Needed for Generation: Learning Linearly Separable Representations in Diffusion Models

arxiv url: http://arxiv.org/abs/2509.21565v1
Date: Thu, 25 Sep 2025 20:46:48 GMT
Status: Translation complete
Last updated in system: 2025-09-29 20:57:54.001865
Title: No Alignment Needed for Generation: Learning Linearly Separable Representations in Diffusion Models
Title (reference translation): Alignment Not Needed for Generation: Learning Linearly Separable Representations in Diffusion Models
Authors: Junno Yun, Yaşar Utku Alçalar, Mehmet Akçakaya
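Abstract summary: This paper proposes an alternative regularization for training, based on promoting the Linear SEParability (LSEP) of intermediate layer representations. The results show substantial improvements in both training efficiency and generation quality on flow-based transformer architectures.
Reference score (independently computed attention score): 4.511561231517167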
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Efficient training strategies for large-scale diffusion models have recently emphasized the importance of improving discriminative feature representations in these models. A central line of work in this direction is representation alignment with features obtained from powerful external encoders, which improves the representation quality as assessed through linear probing. Alignment-based approaches show promise but depend on large pretrained encoders, which are computationally expensive to obtain. In this work, we propose an alternative regularization for training, based on promoting the Linear SEParability (LSEP) of intermediate layer representations. LSEP eliminates the need for an auxiliary encoder and representation alignment, while incorporating linear probing directly into the network's learning dynamics rather than treating it as a simple post-hoc evaluation tool. Our results demonstrate substantial improvements in both training efficiency and generation quality on flow-based transformer architectures such as SiTs, achieving an FID of 1.46 on $256 \times 256$ ImageNet dataset.
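Abstract (reference translation): Efficient training strategies for large-scale diffusion models emphasize the importance of improving the discriminative feature representations in these models. Central to work in this direction is representation alignment with features obtained from powerful external encoders, which improves representation quality as evaluated by linear probing. Alignment-based approaches show promise but depend on large pretrained encoders that are computationally expensive. This work proposes an alternative regularization for training, based on promoting the Linear SEParability (LSEP) of intermediate layer representations. Rather than treating linear probing as a simple post-hoc evaluation tool, LSEP incorporates it directly into the network's learning dynamics and eliminates the need for an auxiliary encoder and representation alignment. As a result, training efficiency and generation quality improve substantially on flow-based transformer architectures such as SiTs, reaching an FID of 1.46 on the $256 \times 256$ ImageNet dataset.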
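The abstract describes folding linear probing into training as a regularizer but does not state the loss itself. The following is only a minimal sketch of that general idea, under the assumption that a linear classifier's cross-entropy on pooled intermediate features is added to the generation loss with a weight. The names `linear_probe_loss`, `regularized_loss`, and `lam` are hypothetical and are not taken from the paper, and this should not be read as the authors' LSEP formulation.

```python
import math


def linear_probe_loss(features, labels, weights, bias):
    """Mean softmax cross-entropy of a linear classifier on feature vectors.

    features: list of feature vectors (e.g. pooled intermediate activations)
    labels:   list of integer class indices
    weights:  one weight vector per class
    bias:     one bias value per class
    """
    total = 0.0
    for x, y in zip(features, labels):
        # Linear probe: one logit per class.
        logits = [
            sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
            for w, b in zip(weights, bias)
        ]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_z - logits[y]
    return total / len(features)


def regularized_loss(generation_loss, features, labels, weights, bias, lam=0.5):
    """Hypothetical combined objective: generation loss plus a weighted probe loss."""
    return generation_loss + lam * linear_probe_loss(features, labels, weights, bias)


# Toy usage: two classes whose features are linearly separable.
feats = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
untrained = linear_probe_loss(feats, labels, [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
separating = linear_probe_loss(feats, labels, [[10.0, 0.0], [0.0, 10.0]], [0.0, 0.0])
print(untrained, separating)
```

In this toy case the probe loss starts at log 2 for an all-zero probe and approaches zero once the probe separates the two classes, which is the behavior a separability-promoting term would reward. In an actual model the features would come from an intermediate layer and the gradient would flow into both the probe and the network; neither detail is specified in the abstract.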

Related papers