Fugu-MT Paper Translation (Summary): Everything at Every Scale: Scale-Invariant Diffusion with Continuous Super-Resolution

Paper summary: Everything at Every Scale: Scale-Invariant Diffusion with Continuous Super-Resolution

arxiv url: http://arxiv.org/abs/2605.26032v1
Date: Mon, 25 May 2026 17:01:21 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:20.529826
Title: Everything at Every Scale: Scale-Invariant Diffusion with Continuous Super-Resolution
Authors: Zixin Jessie Chen, Zhuo Chen, Archer Wang, Jeff Gore, William T. Freeman, Congyue Deng, Marin Soljačić
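Abstract summary: Creating images from noise is image generation; reconstructing fine details from coarse inputs is super-resolution. We introduce $\textbf{SKILD}$, a model that unifies generation and continuous super-resolution within a single unconditional framework. Empirically, SKILD reaches FID $2.65$ and Inception Score $9.63$ on unconditional CIFAR-10.
Reference score (independently computed attention score): 27.232771005158146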
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Creating images from noise is image generation; reconstructing fine details from coarse inputs is super-resolution. Despite their practical differences, both can be understood as reversing information loss across scales. We introduce $\textbf{SKILD}$, a $\textbf{S}$cale-invariant $\textbf{K}$-Space $\textbf{I}$mage $\textbf{L}$earning $\textbf{D}$iffusion model that unifies generation and continuous super-resolution within a single unconditional framework. Both natural images and critical physical systems exhibit scale invariance, and we leverage it to design a forward process that attenuates image content from fine to coarse scales while injecting spectrum-matched Gaussian noise, making scale an explicit coordinate of the diffusion dynamics. The same trained reverse process performs generation and continuous super-resolution by varying only the starting timestep: $\textit{no task-specific architecture, no conditioning branch, no classifier-free guidance, no retraining per scale factor}$. Empirically, SKILD reaches FID $2.65$ and Inception Score $9.63$ on unconditional CIFAR-10, performs $2\times$--$8\times$ super-resolution on ImageNet from a single unconditional checkpoint while outperforming conditional models across perceptual metrics, and reconstructs critical Ising models whose connected four-point correlations closely track the ground truth.
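The abstract describes the forward process only at a high level, so the following is a minimal sketch of what a k-space, fine-to-coarse forward process *could* look like. It is not the authors' SKILD implementation. Several things here are assumptions chosen for illustration and are not taken from the paper:

* a heat-kernel attenuation factor exp(-|k|^2 t);
* noise with a power-law spectrum |k|^-alpha, with alpha = 2 as a stand-in for "spectrum-matched";
* a variance-preserving noise level for each Fourier mode;
* a hypothetical rule for mapping an upscaling factor to a starting time.

The names `forward_sample` and `start_time_for_upscale`, and the parameters `alpha`, `sigma`, and `eps`, are all illustrative.

```python
import numpy as np


def forward_sample(x0: np.ndarray, t: float, rng: np.random.Generator,
                   alpha: float = 2.0, sigma: float = 1.0) -> np.ndarray:
    """Hypothetical scale-wise forward step on a square n x n image.

    Fine scales (large |k|) are attenuated first; noise is injected with a
    power-law spectrum ~ |k|^(-alpha). All choices here are assumptions.
    """
    n = x0.shape[0]
    assert x0.shape == (n, n), "sketch assumes a square single-channel image"
    f = 2.0 * np.pi * np.fft.fftfreq(n)           # angular frequency per pixel
    kx, ky = np.meshgrid(f, f, indexing="ij")
    k2 = kx**2 + ky**2
    # Assumed heat-kernel attenuation: high frequencies decay fastest as t grows.
    decay = np.exp(-k2 * t)
    # Power-law noise amplitude; the zero mode uses the smallest nonzero |k|.
    k_safe = np.sqrt(np.where(k2 > 0, k2, (2.0 * np.pi / n) ** 2))
    amplitude = k_safe ** (-alpha / 2.0)
    # Assumed variance-preserving schedule: noise grows as each mode's signal decays.
    noise_std = sigma * np.sqrt(1.0 - decay**2)
    white = np.fft.fft2(rng.standard_normal((n, n)))
    xt_hat = decay * np.fft.fft2(x0) + noise_std * amplitude * white
    # All factors depend only on |k|, so Hermitian symmetry (a real output) is kept.
    return np.fft.ifft2(xt_hat).real


def start_time_for_upscale(scale: float, eps: float = 1e-2) -> float:
    """Hypothetical mapping from an upscaling factor to a starting time.

    An input downsampled by `scale` keeps angular frequencies up to pi / scale,
    so start where that cutoff mode has decayed to eps: exp(-(pi/scale)^2 t) = eps.
    """
    k_cut = np.pi / scale
    return float(np.log(1.0 / eps) / k_cut**2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.standard_normal((32, 32))      # stand-in for one image channel
    t0 = start_time_for_upscale(4)          # hypothetical start time for 4x
    xt = forward_sample(x0, t0, rng)
    print(xt.shape)
```

In this sketch each Fourier mode evolves independently, and larger upscaling factors map to later starting times. That mirrors the abstract's idea of running one reverse process and varying only the starting timestep. The actual schedule and noise spectrum used by SKILD are defined in the paper, not here.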

List of related papers