Referring Camouflaged Object Detection With Multi-Context Overlapped Windows Cross-Attention
- URL: http://arxiv.org/abs/2511.13249v1
- Date: Mon, 17 Nov 2025 11:08:50 GMT
- Title: Referring Camouflaged Object Detection With Multi-Context Overlapped Windows Cross-Attention
- Authors: Yu Wen, Shuyong Gao, Shuping Zhang, Miao Huang, Lili Tao, Han Yang, Haozhe Xing, Lihe Zhang, Boxue Hou,
- Abstract summary: Referring camouflaged object detection (Ref-COD) aims to identify hidden objects by incorporating reference information such as images and text descriptions. We explore ways to enhance performance through multi-context fusion of rich salient image features and camouflaged object features.
- Score: 22.790236918151574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Referring camouflaged object detection (Ref-COD) aims to identify hidden objects by incorporating reference information such as images and text descriptions. Previous research has transformed reference images with salient objects into one-dimensional prompts, yielding significant results. We explore ways to enhance performance through multi-context fusion of rich salient image features and camouflaged object features. Therefore, we propose RFMNet, which utilizes features from multiple encoding stages of the reference salient images and performs interactive fusion with the camouflage features at the corresponding encoding stages. Given that the features in salient object images contain abundant object-related detail information, performing feature fusion within local areas is more beneficial for detecting camouflaged objects. Therefore, we propose an Overlapped Windows Cross-attention mechanism to enable the model to focus more attention on the local information matching based on reference features. Besides, we propose the Referring Feature Aggregation (RFA) module to decode and segment the camouflaged objects progressively. Extensive experiments on the Ref-COD benchmark demonstrate that our method achieves state-of-the-art performance.
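The abstract does not spell out how the Overlapped Windows Cross-attention is computed, so the following is only a minimal sketch of the general idea, not the RFMNet implementation. It assumes that queries come from non-overlapping windows of the camouflaged feature map, while keys and values come from an enlarged window of the reference feature map that overlaps its neighbours. The function names, the `window` and `overlap` parameters, and the omission of learned projections and multiple heads are all illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def window_tokens(feat, top, left, size):
    """Collect the feature vectors inside a size x size window.

    The window is clipped to the borders of the feature map, so windows
    that start at negative coordinates simply contain fewer tokens.
    """
    h, w = len(feat), len(feat[0])
    return [feat[y][x]
            for y in range(max(0, top), min(h, top + size))
            for x in range(max(0, left), min(w, left + size))]


def overlapped_window_cross_attention(cam, ref, window=2, overlap=1):
    """Hypothetical sketch: local cross-attention with overlapping key windows.

    cam, ref: H x W grids of C-dimensional vectors (nested lists of floats).
    Each query in a `window` x `window` block of `cam` attends to the
    reference tokens in the same block enlarged by `overlap` on every side.
    No learned projections are applied (an assumption made for brevity).
    """
    h, w = len(cam), len(cam[0])
    c = len(cam[0][0])
    out = [[None] * w for _ in range(h)]
    for top in range(0, h, window):
        for left in range(0, w, window):
            # Keys and values: enlarged window, so adjacent windows overlap.
            kv = window_tokens(ref, top - overlap, left - overlap,
                               window + 2 * overlap)
            for y in range(top, min(h, top + window)):
                for x in range(left, min(w, left + window)):
                    q = cam[y][x]
                    # Scaled dot-product scores against every key in the window.
                    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(c)
                              for k in kv]
                    weights = softmax(scores)
                    # Attention output: weighted sum of the value vectors.
                    out[y][x] = [sum(wt * v[d] for wt, v in zip(weights, kv))
                                 for d in range(c)]
    return out
```

Calling `overlapped_window_cross_attention(cam, ref)` on two grids of the same size returns a grid of the same shape in which every camouflaged-feature position has been replaced by a locally weighted mix of nearby reference features. A real model would typically add learned query, key and value projections and run this at several encoder stages.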
Related papers
- Object-Aware Video Matting with Cross-Frame Guidance [35.785998735049006]
We present a trimap-free Object-Aware Video Matting (OAVM) framework, which can perceive different objects, enabling joint recognition of foreground objects and refinement of edge details. Specifically, we propose an Object-Guided Correction and Refinement (OGCR) module, which employs cross-frame guidance to aggregate object-level instance information into pixel-level detail features. We also design a Sequential Foreground Merging augmentation strategy to diversify sequential scenarios and enhance the network's capacity for object discrimination.
arXiv Detail & Related papers (2025-03-03T07:40:32Z)
- Hierarchical Graph Interaction Transformer with Dynamic Token Clustering for Camouflaged Object Detection [57.883265488038134]
We propose a hierarchical graph interaction network termed HGINet for camouflaged object detection.
The network is capable of discovering imperceptible objects via effective graph interaction among the hierarchical tokenized features.
Our experiments demonstrate the superior performance of HGINet compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2024-08-27T12:53:25Z)
- Referring Camouflaged Object Detection [88.63744004901655]
Ref-COD aims to segment specified camouflaged objects based on a small set of referring images with salient target objects. We first assemble a large-scale dataset, called R2C7K, which consists of 7K images covering 64 object categories in real-world scenarios.
arXiv Detail & Related papers (2023-06-13T04:15:37Z)
- Detector Guidance for Multi-Object Text-to-Image Generation [61.70018793720616]
Detector Guidance (DG) integrates a latent object detection model to separate different objects during the generation process.
Human evaluations demonstrate that DG provides an 8-22% advantage in preventing the amalgamation of conflicting concepts.
arXiv Detail & Related papers (2023-06-04T02:33:12Z)
- DQnet: Cross-Model Detail Querying for Camouflaged Object Detection [54.82390534024954]
A convolutional neural network (CNN) for camouflaged object detection tends to activate local discriminative regions while ignoring complete object extent.
In this paper, we argue that partial activation is caused by the intrinsic characteristics of CNN.
In order to obtain feature maps that could activate full object extent, a novel framework termed Cross-Model Detail Querying network (DQnet) is proposed.
arXiv Detail & Related papers (2022-12-16T06:23:58Z)
- Feature Aggregation and Propagation Network for Camouflaged Object Detection [42.33180748293329]
Camouflaged object detection (COD) aims to detect/segment camouflaged objects embedded in the environment.
Several COD methods have been developed, but they still suffer from unsatisfactory performance due to intrinsic similarities between foreground objects and background surroundings.
We propose a novel Feature Aggregation and Propagation Network (FAP-Net) for camouflaged object detection.
arXiv Detail & Related papers (2022-12-02T05:54:28Z)
- MFFN: Multi-view Feature Fusion Network for Camouflaged Object Detection [10.04773536815808]
We propose a behavior-inspired framework, called Multi-view Feature Fusion Network (MFFN), which mimics the human behaviors of finding indistinct objects in images.
MFFN captures critical edge and semantic information by comparing and fusing extracted multi-view features.
Our method performs favorably against existing state-of-the-art methods via training with the same data.
arXiv Detail & Related papers (2022-10-12T16:12:58Z)
- ObjectFormer for Image Manipulation Detection and Localization [118.89882740099137]
We propose ObjectFormer to detect and localize image manipulations.
We extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings.
We conduct extensive experiments on various datasets and the results verify the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-03-28T12:27:34Z)
- High-resolution Iterative Feedback Network for Camouflaged Object Detection [128.893782016078]
Spotting camouflaged objects that are visually assimilated into the background is tricky for object detection algorithms.
We aim to extract the high-resolution texture details to avoid the detail degradation that causes blurred vision in edges and boundaries.
We introduce a novel HitNet to refine the low-resolution representations by high-resolution features in an iterative feedback manner.
arXiv Detail & Related papers (2022-03-22T11:20:21Z)