Fugu-MT Paper Translation (Summary): RAGSR: Regional Attention Guided Diffusion for Image Super-Resolution

Paper summary: RAGSR: Regional Attention Guided Diffusion for Image Super-Resolution

arxiv url: http://arxiv.org/abs/2508.16158v1
Date: Fri, 22 Aug 2025 07:28:34 GMT
Status: Translation complete
Last updated in system: 2025-08-25 16:42:36.292016
Title: RAGSR: Regional Attention Guided Diffusion for Image Super-Resolution
Title (reference translation): RAGSR: Local Attention-Guided Diffusion for Image Super-Resolution
Authors: Haodong He, Yancheng Bai, Rui Lan, Xu Duan, Lei Sun, Xiangxiang Chu, Gui-Song Xia
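Abstract summary: We propose a new method for generating clear and accurate regional details in image super-resolution. The method explicitly extracts localized fine-grained information and encodes it through a novel regional attention mechanism. Experimental results on benchmarks show that the method achieves superior performance in generating perceptually authentic visual details.
Reference score (independently calculated attention score): 38.794214985205045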
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rich textual information of large vision-language models (VLMs) combined with the powerful generative prior of pre-trained text-to-image (T2I) diffusion models has achieved impressive performance in single-image super-resolution (SISR). However, existing methods still face significant challenges in generating clear and accurate regional details, particularly in scenarios involving multiple objects. This challenge primarily stems from a lack of fine-grained regional descriptions and the models' insufficient ability to capture complex prompts. To address these limitations, we propose a Regional Attention Guided Super-Resolution (RAGSR) method that explicitly extracts localized fine-grained information and effectively encodes it through a novel regional attention mechanism, enabling both enhanced detail and overall visually coherent SR results. Specifically, RAGSR localizes object regions in an image and assigns fine-grained caption to each region, which are formatted as region-text pairs as textual priors for T2I models. A regional guided attention is then leveraged to ensure that each region-text pair is properly considered in the attention process while preventing unwanted interactions between unrelated region-text pairs. By leveraging this attention mechanism, our approach offers finer control over the integration of text and image information, thereby effectively overcoming limitations faced by traditional SISR techniques. Experimental results on benchmark datasets demonstrate that our approach exhibits superior performance in generating perceptually authentic visual details while maintaining contextual consistency compared to existing approaches.
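Abstract (reference translation): The rich textual information of large vision-language models (VLMs), combined with the powerful generative prior of pre-trained text-to-image (T2I) diffusion models, has achieved impressive performance in single-image super-resolution (SISR). However, existing methods still face major challenges in generating clear and accurate regional details, particularly in scenarios involving multiple objects. This challenge stems mainly from a lack of fine-grained regional descriptions and an insufficient ability to capture complex prompts. To address these limitations, we propose the RAGSR method, which explicitly extracts localized fine-grained information and encodes it effectively through a novel regional attention mechanism. Specifically, RAGSR localizes object regions in an image and assigns a fine-grained caption to each region. It then uses regionally guided attention to ensure that each region-text pair is properly considered in the attention process while preventing unwanted interactions between unrelated region-text pairs. This attention mechanism gives finer control over the integration of text and image information, effectively overcoming limitations faced by conventional SISR techniques. Experimental results on benchmark data show that the method achieves superior performance in generating perceptual visual details while maintaining contextual consistency compared with existing methods.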
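
The abstract describes attention in which each region-text pair is attended to while unrelated pairs are kept from interacting. One common way to get that behaviour is a masked cross-attention, sketched below in plain Python. This is an illustration only, not the authors' implementation: the function names (`build_region_mask`, `masked_cross_attention`), the box format on a latent grid, and the choice to give zero text guidance to tokens outside every region are all assumptions made for the example.

```python
import math


def build_region_mask(height, width, regions, tokens_per_region):
    """Return mask[q][k]: True if image token q may attend to text token k.

    regions: (x0, y0, x1, y1) boxes on the latent grid, x1/y1 exclusive
             (hypothetical format chosen for this sketch).
    tokens_per_region: number of caption tokens for each region, in order.
    """
    total_text = sum(tokens_per_region)
    mask = [[False] * total_text for _ in range(height * width)]
    start = 0
    for (x0, y0, x1, y1), n_tok in zip(regions, tokens_per_region):
        for y in range(y0, y1):
            for x in range(x0, x1):
                q = y * width + x  # row-major index of the image token
                for k in range(start, start + n_tok):
                    mask[q][k] = True
        start += n_tok
    return mask


def masked_cross_attention(queries, keys, values, mask):
    """Scaled dot-product attention where disallowed pairs get zero weight."""
    dim = len(queries[0])
    out_dim = len(values[0])
    out = []
    for q_vec, allowed in zip(queries, mask):
        scores = [
            sum(a * b for a, b in zip(q_vec, k_vec)) / math.sqrt(dim)
            for k_vec in keys
        ]
        kept = [s for s, ok in zip(scores, allowed) if ok]
        if not kept:
            # Assumption: a token outside every region gets no text guidance.
            out.append([0.0] * out_dim)
            continue
        peak = max(kept)  # subtract the max for numerical stability
        weights = [
            math.exp(s - peak) if ok else 0.0
            for s, ok in zip(scores, allowed)
        ]
        norm = sum(weights)
        out.append([
            sum(w * v[d] for w, v in zip(weights, values)) / norm
            for d in range(out_dim)
        ])
    return out
```

In this sketch, an image token that falls inside two overlapping boxes attends to both captions. A practical system would more likely pass an equivalent boolean mask to a batched attention routine in a tensor library.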

Related papers