Fugu-MT Paper Translation (Abstract): ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge

Paper overview: ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge

arxiv url: http://arxiv.org/abs/2311.14542v2
Date: Sat, 05 Oct 2024 15:16:25 GMT
Status: Translation complete
Last updated in system: 2024-12-05 03:13:10.873205
Title: ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge
Title (reference translation): ToddlerDiffusion: Interactive Structured Image Generation Using a Cascaded Schrödinger Bridge
Authors: Eslam Abdelrahman, Liangbing Zhao, Vincent Tao Hu, Matthieu Cord, Patrick Perez, Mohamed Elhoseiny
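Abstract summary: ToddlerDiffusion is a novel approach that decomposes the complex task of RGB image generation into simpler, interpretable stages. The method cascades modality-specific models, each responsible for generating an intermediate representation. According to the authors' experiments, ToddlerDiffusion consistently outperforms state-of-the-art methods.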
Reference score (independently computed attention score): 63.00793292863
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Diffusion models break down the challenging task of generating data from high-dimensional distributions into a series of easier denoising steps. Inspired by this paradigm, we propose a novel approach that extends the diffusion framework into modality space, decomposing the complex task of RGB image generation into simpler, interpretable stages. Our method, termed ToddlerDiffusion, cascades modality-specific models, each responsible for generating an intermediate representation, such as contours, palettes, and detailed textures, ultimately culminating in a high-quality RGB image. Instead of relying on the naive LDM concatenation conditioning mechanism to connect the different stages together, we employ a Schrödinger Bridge to determine the optimal transport between different modalities. Although employing a cascaded pipeline introduces more stages, which could lead to a more complex architecture, each stage is meticulously formulated for efficiency and accuracy, surpassing Stable-Diffusion (LDM) performance. Modality composition not only enhances overall performance but also enables emerging properties such as consistent editing, interaction capabilities, high-level interpretability, and faster convergence and sampling rate. Extensive experiments on diverse datasets, including LSUN-Churches, ImageNet, CelebHQ, and LAION-Art, demonstrate the efficacy of our approach, consistently outperforming state-of-the-art methods. For instance, ToddlerDiffusion achieves notable efficiency, matching LDM performance on LSUN-Churches while operating 2× faster with a 3× smaller architecture. The project website is available at: https://toddlerdiffusion.github.io/website/
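Abstract (reference translation): Diffusion models split the difficult task of generating data from high-dimensional distributions into a series of easier denoising steps. Inspired by this paradigm, we propose a method that extends the diffusion framework into modality space, decomposing the complex task of RGB image generation into simple, interpretable stages. The proposed method, called ToddlerDiffusion, cascades modality-specific models, each of which generates an intermediate representation such as contours, palettes, or detailed textures, and finally arrives at a high-quality RGB image. To connect the stages, it uses a Schrödinger Bridge to determine the optimal transport between modalities instead of relying on the naive LDM concatenation conditioning mechanism. Adopting a cascaded pipeline introduces more stages and could lead to a more complex architecture, but each stage is carefully formulated for efficiency and accuracy and surpasses Stable-Diffusion (LDM) performance. Modality composition not only improves overall performance but also enables new properties such as consistent editing, interaction capabilities, high-level interpretability, and faster convergence and sampling. Extensive experiments on diverse datasets, including LSUN-Churches, ImageNet, CelebHQ, and LAION-Art, demonstrate the effectiveness of the approach, which consistently outperforms state-of-the-art methods. For example, ToddlerDiffusion matches LDM performance on LSUN-Churches while running 2× faster with a 3× smaller architecture. The project website is available at: https://toddlerdiffusion.github.io/website/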

Related papers