Fugu-MT Paper Translation (Summary): Referring Camouflaged Object Detection With Multi-Context Overlapped Windows Cross-Attention

Paper summary: Referring Camouflaged Object Detection With Multi-Context Overlapped Windows Cross-Attention

arxiv url: http://arxiv.org/abs/2511.13249v1
Date: Mon, 17 Nov 2025 11:08:50 GMT
Status: Translation complete
Last updated in system: 2025-11-18 14:36:25.15376
Title: Referring Camouflaged Object Detection With Multi-Context Overlapped Windows Cross-Attention
Title (reference translation): Referring Camouflaged Object Detection With Multi-Context Overlapped Windows Cross-Attention
Authors: Yu Wen, Shuyong Gao, Shuping Zhang, Miao Huang, Lili Tao, Han Yang, Haozhe Xing, Lihe Zhang, Boxue Hou
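Abstract summary: Referring camouflaged object detection (Ref-COD) aims to identify hidden objects by incorporating reference information such as images and text descriptions. This work explores improving performance through multi-context fusion of rich salient image features and camouflaged object features.
Reference score (independently computed attention score): 22.790236918151574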
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Referring camouflaged object detection (Ref-COD) aims to identify hidden objects by incorporating reference information such as images and text descriptions. Previous research has transformed reference images with salient objects into one-dimensional prompts, yielding significant results. We explore ways to enhance performance through multi-context fusion of rich salient image features and camouflaged object features. Therefore, we propose RFMNet, which utilizes features from multiple encoding stages of the reference salient images and performs interactive fusion with the camouflage features at the corresponding encoding stages. Given that the features in salient object images contain abundant object-related detail information, performing feature fusion within local areas is more beneficial for detecting camouflaged objects. Therefore, we propose an Overlapped Windows Cross-attention mechanism to enable the model to focus more attention on the local information matching based on reference features. Besides, we propose the Referring Feature Aggregation (RFA) module to decode and segment the camouflaged objects progressively. Extensive experiments on the Ref-COD benchmark demonstrate that our method achieves state-of-the-art performance.
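Abstract (reference translation): Referring camouflaged object detection (Ref-COD) aims to identify hidden objects by incorporating reference information such as images and text descriptions. Earlier work converted reference images of salient objects into one-dimensional prompts and obtained significant results. This work explores improving performance through multi-context fusion of rich salient image features and camouflaged object features. To that end, the authors propose RFMNet, which takes features from multiple encoding stages of the reference salient images and fuses them interactively with the camouflage features at the corresponding encoding stages. Because the features of salient object images carry abundant object-related detail, fusing features within local regions is more beneficial for detecting camouflaged objects. The authors therefore propose an Overlapped Windows Cross-attention mechanism that makes the model attend more closely to local information matching based on the reference features. In addition, the proposed Referring Feature Aggregation (RFA) module decodes and segments the camouflaged objects progressively. Extensive experiments on the Ref-COD benchmark show that the method achieves state-of-the-art performance.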
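
Illustrative sketch (not from the paper): the abstract names an Overlapped Windows Cross-attention mechanism but gives no implementation details, so the following is only one plausible reading of the idea. It assumes that queries come from non-overlapping windows of the camouflage feature map and that keys and values come from an enlarged, overlapping window of the reference feature map at the same location. The function names, the `win` and `overlap` parameters, and the absence of learned projections and multiple heads are all assumptions made for brevity.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def window(feat, top, left, size):
    """Collect the feature vectors inside a size x size window, clipped to the map."""
    h, w = len(feat), len(feat[0])
    return [feat[r][c]
            for r in range(max(top, 0), min(top + size, h))
            for c in range(max(left, 0), min(left + size, w))]


def overlapped_window_cross_attention(cam, ref, win=2, overlap=1):
    """Hypothetical local cross-attention between two H x W feature grids.

    cam, ref: nested lists, H x W x D. Each query position in a win x win
    window of `cam` attends only to the (win + 2 * overlap)-sized window of
    `ref` centred on the same window, so neighbouring windows share keys.
    """
    h, w, d = len(cam), len(cam[0]), len(cam[0][0])
    out = [[None] * w for _ in range(h)]
    for top in range(0, h, win):
        for left in range(0, w, win):
            # Keys and values: enlarged reference window (assumed design).
            kv = window(ref, top - overlap, left - overlap, win + 2 * overlap)
            for r in range(top, min(top + win, h)):
                for c in range(left, min(left + win, w)):
                    q = cam[r][c]
                    # Scaled dot-product scores against every key in the window.
                    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                              for k in kv]
                    weights = softmax(scores)
                    # Output is the attention-weighted average of the values.
                    out[r][c] = [sum(wt * v[i] for wt, v in zip(weights, kv))
                                 for i in range(d)]
    return out
```

In this sketch a reference feature can influence a query only if it lies within the enlarged window around that query, which is one way to realise the local information matching the abstract describes. A practical model would add learned query, key, and value projections and operate on batched tensors.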
