Fugu-MT Paper Translation (Summary): SGDFuse: SAM-Guided Diffusion for High-Fidelity Infrared and Visible Image Fusion

Paper summary: SGDFuse: SAM-Guided Diffusion for High-Fidelity Infrared and Visible Image Fusion

arxiv url: http://arxiv.org/abs/2508.05264v1
Date: Thu, 07 Aug 2025 10:58:52 GMT
Status: Translation complete
Last updated in system: 2025-08-08 18:59:39.827833
Title: SGDFuse: SAM-Guided Diffusion for High-Fidelity Infrared and Visible Image Fusion
Authors: Xiaoyang Zhang, Zhen Hua, Yakun Ju, Wei Zhou, Jun Liu, Alex C. Kot
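Abstract summary: This paper proposes SGDFuse, a conditional diffusion model guided by the Segment Anything Model (SAM). The framework operates in two stages: it first performs a preliminary fusion of multi-modal features, and then uses SAM's semantic masks, together with the preliminary fused image, as the condition that drives the diffusion model's coarse-to-fine denoising generation. Experiments show that SGDFuse achieves state-of-the-art performance in both subjective and objective evaluations.
Reference score (site-computed attention score): 38.09521879556221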
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Infrared and visible image fusion (IVIF) aims to combine the thermal radiation information from infrared images with the rich texture details from visible images to enhance perceptual capabilities for downstream visual tasks. However, existing methods often fail to preserve key targets due to a lack of deep semantic understanding of the scene, while the fusion process itself can also introduce artifacts and detail loss, severely compromising both image quality and task performance. To address these issues, this paper proposes SGDFuse, a conditional diffusion model guided by the Segment Anything Model (SAM), to achieve high-fidelity and semantically-aware image fusion. The core of our method is to utilize high-quality semantic masks generated by SAM as explicit priors to guide the optimization of the fusion process via a conditional diffusion model. Specifically, the framework operates in a two-stage process: it first performs a preliminary fusion of multi-modal features, and then utilizes the semantic masks from SAM jointly with the preliminary fused image as a condition to drive the diffusion model's coarse-to-fine denoising generation. This ensures the fusion process not only has explicit semantic directionality but also guarantees the high fidelity of the final result. Extensive experiments demonstrate that SGDFuse achieves state-of-the-art performance in both subjective and objective evaluations, as well as in its adaptability to downstream tasks, providing a powerful solution to the core challenges in image fusion. The code of SGDFuse is available at https://github.com/boshizhang123/SGDFuse.
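The two-stage pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not SGDFuse's implementation (the official code is in the repository linked above); the weighted-average preliminary fusion, the thresholded array standing in for a SAM mask, the linear noise schedule, and the `placeholder_eps` noise predictor are all hypothetical assumptions. The sketch only shows where the condition (the preliminary fused image stacked with a semantic mask) enters every reverse step of a standard DDPM sampler, which iteratively refines pure noise into an image.

```python
import numpy as np


def preliminary_fusion(ir, vis, w_ir=0.5):
    """Stage 1 (hypothetical): pixel-wise weighted average of IR and visible images."""
    return w_ir * ir + (1.0 - w_ir) * vis


def placeholder_eps(x_t, alpha_bar_t, cond):
    """Hypothetical stand-in for a trained conditional noise predictor.

    It builds a clean-image estimate x0_hat from the condition (brightening
    pixels inside the mask) and returns the noise that x0_hat implies for x_t.
    """
    fused, mask = cond[0], cond[1]
    x0_hat = np.clip(fused * (1.0 + 0.2 * mask), 0.0, 1.0)
    return (x_t - np.sqrt(alpha_bar_t) * x0_hat) / np.sqrt(1.0 - alpha_bar_t)


def conditional_ddpm_sample(cond, steps=50, seed=0):
    """Stage 2: DDPM ancestral sampling, conditioned at every step on `cond`."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)  # assumed linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(cond.shape[1:])  # start from pure Gaussian noise
    for t in reversed(range(steps)):
        eps = placeholder_eps(x, alpha_bars[t], cond)
        # DDPM mean of x_{t-1}, computed from x_t and the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean  # no noise is added at the last step
    return x


# Usage with random stand-in data (no real images or SAM model involved).
rng = np.random.default_rng(1)
ir = rng.random((64, 64))
vis = rng.random((64, 64))
mask = (ir > 0.7).astype(float)  # hypothetical stand-in for a SAM mask
fused = preliminary_fusion(ir, vis)
cond = np.stack([fused, mask])  # condition = (preliminary fusion, mask)
out = conditional_ddpm_sample(cond)
```

Because the placeholder predictor returns the noise implied by a fixed clean estimate, the last reverse step recovers that estimate up to floating-point error; in a real system a trained conditional network would take its place and the condition would steer, rather than dictate, the result.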


Related papers