Fugu-MT Paper Translation (Abstract): Disentangled Textual Priors for Diffusion-based Image Super-Resolution

Paper overview: Disentangled Textual Priors for Diffusion-based Image Super-Resolution

arxiv url: http://arxiv.org/abs/2603.07430v1
Date: Sun, 08 Mar 2026 03:02:55 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:14.504762
Title: Disentangled Textual Priors for Diffusion-based Image Super-Resolution
Title (reference translation): Disentangled Textual Priors for Diffusion-based Image Super-Resolution
Authors: Lei Jiang, Xin Liu, Xinze Tong, Zhiliang Li, Jie Liu, Jie Tang, Gangshan Wu
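Abstract summary: Image Super-Resolution aims to reconstruct high-resolution images from degraded low-resolution inputs. Existing approaches often rely on entangled or coarse-grained priors that mix global layout with local details. DTPSR is a novel diffusion-based SR framework that introduces disentangled textual priors along two complementary dimensions.
Reference score (independently computed attention measure): 41.71306518338786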
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Image Super-Resolution (SR) aims to reconstruct high-resolution images from degraded low-resolution inputs. While diffusion-based SR methods offer powerful generative capabilities, their performance heavily depends on how semantic priors are structured and integrated into the generation process. Existing approaches often rely on entangled or coarse-grained priors that mix global layout with local details, or conflate structural and textural cues, thereby limiting semantic controllability and interpretability. In this work, we propose DTPSR, a novel diffusion-based SR framework that introduces disentangled textual priors along two complementary dimensions: spatial hierarchy (global vs. local) and frequency semantics (low- vs. high-frequency). By explicitly separating these priors, DTPSR enables the model to simultaneously capture scene-level structure and object-specific details with frequency-aware semantic guidance. The corresponding embeddings are injected via specialized cross-attention modules, forming a progressive generation pipeline that reflects the semantic granularity of visual content, from global layout to fine-grained textures. To support this paradigm, we construct DisText-SR, a large-scale dataset containing approximately 95,000 image-text pairs with carefully disentangled global, low-frequency, and high-frequency descriptions. To further enhance controllability and consistency, we adopt a multi-branch classifier-free guidance strategy with frequency-aware negative prompts to suppress hallucinations and semantic drift. Extensive experiments on synthetic and real-world benchmarks show that DTPSR achieves high perceptual quality, competitive fidelity, and strong generalization across diverse degradation scenarios.
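Abstract (reference translation): Image Super-Resolution (SR) aims to reconstruct high-resolution images from degraded low-resolution inputs. Diffusion-based SR methods offer powerful generative capabilities, but their performance depends heavily on how semantic priors are structured and integrated into the generation process. Existing approaches often rely on entangled or coarse-grained priors that mix global layout with local details or conflate structural and textural cues, which limits semantic controllability and interpretability. In this work, we propose DTPSR, a diffusion-based SR framework that introduces disentangled textual priors along two complementary dimensions: spatial hierarchy (global vs. local) and frequency semantics (low- vs. high-frequency). By explicitly separating these priors, DTPSR lets the model capture scene-level structure and object-specific details simultaneously, under frequency-aware semantic guidance. The corresponding embeddings are injected through specialized cross-attention modules, forming a progressive generation pipeline that reflects the semantic granularity of visual content, from global layout to fine-grained textures. To support this paradigm, we construct DisText-SR, a large-scale dataset containing approximately 95,000 image-text pairs. To further improve controllability and consistency, we adopt a multi-branch classifier-free guidance strategy with frequency-aware negative prompts to suppress hallucinations and semantic drift. Experiments on synthetic and real-world benchmarks show that DTPSR achieves high perceptual quality, competitive fidelity, and strong generalization across diverse degradation scenarios.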
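The abstract mentions a multi-branch classifier-free guidance strategy with frequency-aware negative prompts but does not give its formula. The following is only a minimal sketch of one plausible form, assuming three branches (global, low-frequency, high-frequency), a shared unconditional noise prediction, and a per-branch guidance scale. The function name `multi_branch_cfg`, the branch names, and the combination rule `eps_uncond + sum_b w_b * (eps_pos_b - eps_neg_b)` are all assumptions made for illustration, not the authors' method.

```python
from typing import Dict, List

Vector = List[float]


def multi_branch_cfg(
    eps_uncond: Vector,
    eps_pos: Dict[str, Vector],
    eps_neg: Dict[str, Vector],
    scales: Dict[str, float],
) -> Vector:
    """Combine noise predictions from several text branches (hypothetical rule).

    guided = eps_uncond + sum over branches b of
             scales[b] * (eps_pos[b] - eps_neg[b])

    Each branch pushes the prediction toward its positive prompt and away
    from its own negative prompt.
    """
    guided = list(eps_uncond)
    for branch, weight in scales.items():
        positive = eps_pos[branch]
        negative = eps_neg[branch]
        for i in range(len(guided)):
            guided[i] += weight * (positive[i] - negative[i])
    return guided


# Toy two-element "noise predictions"; a real model would return tensors.
eps_uncond = [0.0, 1.0]
eps_pos = {
    "global": [1.0, 1.0],
    "low_freq": [0.5, 2.0],
    "high_freq": [0.0, 3.0],
}
eps_neg = {
    "global": [0.0, 1.0],
    "low_freq": [0.5, 1.0],
    "high_freq": [0.0, 1.0],
}
scales = {"global": 2.0, "low_freq": 1.0, "high_freq": 0.5}

guided = multi_branch_cfg(eps_uncond, eps_pos, eps_neg, scales)
print(guided)  # [2.0, 3.0]
```

With every scale set to zero the function returns the unconditional prediction unchanged, so in this sketch each branch's scale acts as an independent knob for how strongly that prior steers the denoising step.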

Related papers