Fugu-MT paper translation (abstract): Efficient One-Step Diffusion Restoration Model with Compact Token Compression and Linear Attention

Paper summary: Efficient One-Step Diffusion Restoration Model with Compact Token Compression and Linear Attention

arxiv url: http://arxiv.org/abs/2605.23451v1
Date: Fri, 22 May 2026 10:07:02 GMT
Status: translation complete
System update date: 2026-05-25 17:29:20.303143
Title: Efficient One-Step Diffusion Restoration Model with Compact Token Compression and Linear Attention
Authors: Bingtian Qiao, Yue Shi, Yingjie Zhou, Yong Guo, Guangtao Zhai, Jiezhang Cao
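Abstract summary: Existing Real-ISR methods inherit dense latent representations and a quadratic-cost global modeling paradigm. The authors argue that the key bottleneck lies in excessive token redundancy and costly token interactions during high-resolution restoration. They introduce a linear-attention DiT with LoRA fine-tuning, enabling efficient high-resolution restoration with linear-complexity token mixing.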
Reference score (independently computed attention score): 66.63806505114263
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Real-world image super-resolution aims to recover high-quality images from complex and unknown real-world degradations. However, existing generative Real-ISR methods largely inherit the dense latent representations and quadratic-cost global modeling paradigm developed for high-resolution image synthesis, causing computation, memory usage, and inference latency to scale unfavorably with resolution and thus limiting practical deployment. We argue that the key bottleneck lies not in insufficient restoration priors, but in excessive token redundancy and costly token interactions during high-resolution restoration. Motivated by this observation, we revisit Real-ISR from the perspectives of compact latent representation and linear-complexity modeling, and propose SANA-SR, an efficient one-step restoration framework. Specifically, SANA-SR employs a deep compression autoencoder with a 32x compression ratio to drastically reduce latent tokens while preserving restoration-relevant structures and textures. On top of this compact latent space, we introduce a linear-attention DiT with LoRA fine-tuning, enabling efficient high-resolution restoration with linear-complexity token mixing. Extensive experiments on all benchmark datasets demonstrate that SANA-SR achieves highly competitive and often superior quantitative performance against existing methods, while restoring clearer and more realistic textures. Moreover, after pruning, the deployed model runs in 0.019s with 407.95G MACs and 344M parameters, highlighting its strong potential for practical mobile deployment.
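The two efficiency ideas named in the abstract, fewer latent tokens and linear-complexity token mixing, can be illustrated with a small NumPy sketch. This is not the SANA-SR implementation: the function names, the ReLU feature map, the patch size of 1, and the 8x baseline compression ratio are all assumptions made for illustration; only the 32x ratio comes from the abstract.

```python
import numpy as np

def latent_tokens(height, width, compression, patch=1):
    """Number of latent tokens for an image, assuming the autoencoder
    downsamples each spatial side by `compression` (patch size is assumed)."""
    return (height // (compression * patch)) * (width // (compression * patch))

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized attention with a ReLU feature map (an assumed choice).
    Cost grows linearly with the token count n: no n x n matrix is formed."""
    q = np.maximum(q, 0.0)
    k = np.maximum(k, 0.0)
    kv = k.T @ v              # (d, d_v) summary of all keys and values
    k_sum = k.sum(axis=0)     # (d,) normalizer summary
    return (q @ kv) / ((q @ k_sum) + eps)[:, None]

def quadratic_reference(q, k, v, eps=1e-6):
    """Same computation written the quadratic way, for comparison only."""
    q = np.maximum(q, 0.0)
    k = np.maximum(k, 0.0)
    scores = q @ k.T          # (n, n) pairwise token interactions
    return (scores @ v) / (scores.sum(axis=1, keepdims=True) + eps)

# Token counts for a 1024x1024 image under the assumed 8x baseline
# and the 32x ratio stated in the abstract.
tokens_8x = latent_tokens(1024, 1024, 8)     # 16384
tokens_32x = latent_tokens(1024, 1024, 32)   # 1024

# Both attention forms give the same output on random inputs.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 16)) for _ in range(3))
same = np.allclose(linear_attention(q, k, v), quadratic_reference(q, k, v))
```

Under these assumptions, moving from 8x to 32x compression cuts the token count by a factor of 16, and reordering the matrix products avoids the n x n score matrix entirely, which is where the quadratic cost comes from.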


Related papers