Fugu-MT Paper Translation (Summary): Language-Guided Structure-Aware Network for Camouflaged Object Detection

Paper summary: Language-Guided Structure-Aware Network for Camouflaged Object Detection

arxiv url: http://arxiv.org/abs/2603.24355v1
Date: Wed, 25 Mar 2026 14:37:44 GMT
Status: Translation complete
System update date: 2026-03-26 21:06:11.333229
Title: Language-Guided Structure-Aware Network for Camouflaged Object Detection
Authors: Min Zhang,
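Abstract summary: Camouflaged Object Detection (COD) aims to segment objects that blend closely into the background in color, texture, and structure. Existing methods introduce multi-scale fusion and attention mechanisms to alleviate this difficulty. This paper proposes a Language-Guided Structure-Aware Network (LGSAN).
Reference score (independently calculated attention score): 15.32173600433245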
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Camouflaged Object Detection (COD) aims to segment objects that are highly integrated with the background in terms of color, texture, and structure, making it a highly challenging task in computer vision. Although existing methods introduce multi-scale fusion and attention mechanisms to alleviate the above issues, they generally lack the guidance of textual semantic priors, which limits the model's ability to focus on camouflaged regions in complex scenes. To address this issue, this paper proposes a Language-Guided Structure-Aware Network (LGSAN). Specifically, based on the visual backbone PVT-v2, we introduce CLIP to generate masks from text prompts and RGB images, thereby guiding the multi-scale features extracted by PVT-v2 to focus on potential target regions. On this foundation, we further design a Fourier Edge Enhancement Module (FEEM), which integrates multi-scale features with high-frequency information in the frequency domain to extract edge enhancement features. Furthermore, we propose a Structure-Aware Attention Module (SAAM) to effectively enhance the model's perception of object structures and boundaries. Finally, we introduce a Coarse-Guided Local Refinement Module (CGLRM) to enhance fine-grained reconstruction and boundary integrity of camouflaged object regions. Extensive experiments demonstrate that our method consistently achieves highly competitive performance across multiple COD datasets, validating its effectiveness and robustness.
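The abstract names two ideas that a short sketch can make concrete: re-weighting features with a coarse mask, and isolating high-frequency (edge) content in the frequency domain. The NumPy code below is an illustration only, not the authors' implementation. The helper names `apply_mask_guidance` and `high_pass_fft`, the `cutoff` parameter, and the residual weighting `feature * (1 + mask)` are all assumptions made for the example; the abstract does not specify how LGSAN or FEEM compute these steps.

```python
import numpy as np


def apply_mask_guidance(feature, mask):
    """Re-weight a 2-D feature map with a coarse mask in [0, 1].

    Hypothetical helper: the residual form feature * (1 + mask) is an
    assumption, chosen so that regions outside the mask are kept
    rather than erased. The mask is assumed to be already resized to
    the feature resolution.
    """
    return feature * (1.0 + mask)


def high_pass_fft(feature, cutoff=0.25):
    """Return the high-frequency part of a 2-D map.

    Hypothetical helper: `cutoff` is an assumed parameter giving the
    radius, as a fraction of the Nyquist frequency (0.5 cycles per
    sample), below which frequency bins are zeroed.
    """
    h, w = feature.shape
    # Move to the frequency domain and centre the zero frequency.
    spectrum = np.fft.fftshift(np.fft.fft2(feature))
    # Distance of every frequency bin from the centre, in cycles per sample.
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    # Zero the low frequencies (smooth regions); keep the rest (edges).
    spectrum[radius < cutoff * 0.5] = 0
    # Back to the spatial domain; the input is real, so drop the
    # negligible imaginary part left by floating-point error.
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))


if __name__ == "__main__":
    # Toy input: a vertical step edge and a mask covering the right half.
    feature = np.zeros((16, 16))
    feature[:, 8:] = 1.0
    mask = np.zeros((16, 16))
    mask[:, 8:] = 1.0

    guided = apply_mask_guidance(feature, mask)
    edge_enhanced = guided + high_pass_fft(guided)
    print(edge_enhanced.shape)
```

In this toy setting the high-pass response is strongest next to the step, which is the sense in which high-frequency information carries edge cues. A learned module would typically operate on multi-channel feature tensors and learn how to fuse the two signals, which this sketch does not attempt.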
