DQnet: Cross-Model Detail Querying for Camouflaged Object Detection
- URL: http://arxiv.org/abs/2212.08296v1
- Date: Fri, 16 Dec 2022 06:23:58 GMT
- Title: DQnet: Cross-Model Detail Querying for Camouflaged Object Detection
- Authors: Wei Sun, Chengao Liu, Linyan Zhang, Yu Li, Pengxu Wei, Chang Liu,
Jialing Zou, Jianbin Jiao, Qixiang Ye
- Abstract summary: A convolutional neural network (CNN) trained for camouflaged object detection tends to activate local discriminative regions while ignoring the complete object extent.
In this paper, we argue that this partial activation is caused by the intrinsic characteristics of CNNs.
To obtain feature maps that activate the full object extent, a novel framework termed the Cross-Model Detail Querying network (DQnet) is proposed.
- Score: 54.82390534024954
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camouflaged objects blend seamlessly into their surroundings, posing a
challenging detection task in computer vision. Optimizing a
convolutional neural network (CNN) for camouflaged object detection (COD) tends
to activate local discriminative regions while ignoring the complete object extent,
causing a partial activation issue that inevitably leads to missing or
redundant object regions. In this paper, we argue that partial activation
is caused by the intrinsic characteristics of CNNs, whose convolution
operations produce local receptive fields and struggle to capture
long-range feature dependencies among image regions. To obtain feature
maps that activate the full object extent while keeping the segmentation results from
being overwhelmed by noisy features, a novel framework termed Cross-Model
Detail Querying network (DQnet) is proposed. It reasons about the relations between
long-range-aware representations and multi-scale local details to make the
enhanced representation fully highlight the object regions and eliminate noise
on non-object regions. Specifically, a vanilla ViT pretrained with
self-supervised learning (SSL) is employed to model long-range dependencies
among image regions. A ResNet is employed to learn fine-grained
spatial local details at multiple scales. Then, to effectively retrieve
object-related details, a Relation-Based Querying (RBQ) module is proposed to
explore window-based interactions between the global representations and the
multi-scale local details. Extensive experiments are conducted on the widely
used COD datasets and show that our DQnet outperforms the current
state-of-the-art methods.
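The Relation-Based Querying (RBQ) module described above performs window-based interactions in which the global (ViT) representation queries object-related details from the multi-scale local (ResNet) features. The paper does not give code, so the following is only a minimal NumPy sketch of one such querying step under simplifying assumptions: single-head scaled dot-product attention, a single scale, non-overlapping windows, and a plain residual connection; the function name `window_cross_attention` and all parameters are hypothetical, not DQnet's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_cross_attention(global_feat, local_feat, win=2):
    """Sketch of a relation-based querying step.

    Within each non-overlapping win x win window, the global
    (ViT-like) features act as queries and the local (CNN-like)
    features as keys/values, so long-range-aware tokens retrieve
    fine-grained details from the same spatial neighborhood.

    global_feat, local_feat: arrays of shape (H, W, C),
    with H and W divisible by win.
    """
    H, W, C = global_feat.shape
    out = np.empty_like(global_feat)
    scale = 1.0 / np.sqrt(C)
    for i in range(0, H, win):
        for j in range(0, W, win):
            q = global_feat[i:i + win, j:j + win].reshape(-1, C)
            kv = local_feat[i:i + win, j:j + win].reshape(-1, C)
            # attention weights over the window's local tokens
            attn = softmax(q @ kv.T * scale, axis=-1)
            # residual: queried details refine the global map
            out[i:i + win, j:j + win] = (q + attn @ kv).reshape(win, win, C)
    return out
```

In a full model this step would be repeated per ResNet scale (with the feature maps resized to a common resolution), which is how the windowed querying could aggregate multi-scale details into the long-range-aware representation.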
Related papers
- United Domain Cognition Network for Salient Object Detection in Optical Remote Sensing Images [21.76732661032257]
We propose a novel United Domain Cognition Network (UDCNet) to jointly explore the global-local information in the frequency and spatial domains.
Experimental results demonstrate the superiority of the proposed UDCNet over 24 state-of-the-art models.
arXiv Detail & Related papers (2024-11-11T04:12:27Z)
- Hierarchical Graph Interaction Transformer with Dynamic Token Clustering for Camouflaged Object Detection [57.883265488038134]
We propose a hierarchical graph interaction network termed HGINet for camouflaged object detection.
The network is capable of discovering imperceptible objects via effective graph interaction among the hierarchical tokenized features.
Our experiments demonstrate the superior performance of HGINet compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2024-08-27T12:53:25Z)
- Frequency Perception Network for Camouflaged Object Detection [51.26386921922031]
We propose a novel learnable and separable frequency perception mechanism driven by the semantic hierarchy in the frequency domain.
Our entire network adopts a two-stage model, including a frequency-guided coarse localization stage and a detail-preserving fine localization stage.
Compared with the currently existing models, our proposed method achieves competitive performance in three popular benchmark datasets.
arXiv Detail & Related papers (2023-08-17T11:30:46Z)
- Feature Aggregation and Propagation Network for Camouflaged Object Detection [42.33180748293329]
Camouflaged object detection (COD) aims to detect/segment camouflaged objects embedded in the environment.
Several COD methods have been developed, but they still suffer from unsatisfactory performance due to intrinsic similarities between foreground objects and background surroundings.
We propose a novel Feature Aggregation and propagation Network (FAP-Net) for camouflaged object detection.
arXiv Detail & Related papers (2022-12-02T05:54:28Z)
- AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z)
- High-resolution Iterative Feedback Network for Camouflaged Object Detection [128.893782016078]
Spotting camouflaged objects that are visually assimilated into the background is tricky for object detection algorithms.
We aim to extract the high-resolution texture details to avoid the detail degradation that causes blurred vision in edges and boundaries.
We introduce a novel HitNet to refine the low-resolution representations by high-resolution features in an iterative feedback manner.
arXiv Detail & Related papers (2022-03-22T11:20:21Z)
- Addressing Multiple Salient Object Detection via Dual-Space Long-Range Dependencies [3.8824028205733017]
Salient object detection plays an important role in many downstream tasks.
We propose a network architecture incorporating non-local feature information in both the spatial and channel spaces.
We show that our approach accurately locates multiple salient regions even in complex scenarios.
arXiv Detail & Related papers (2021-11-04T23:16:53Z)
- Local Context Attention for Salient Object Segmentation [5.542044768017415]
We propose a novel Local Context Attention Network (LCANet) to generate locally reinforced feature maps in a uniform representational architecture.
The proposed network introduces an Attentional Correlation Filter (ACF) module to generate explicit local attention by calculating the correlation feature map between coarse prediction and global context.
Comprehensive experiments are conducted on several salient object segmentation datasets, demonstrating the superior performance of the proposed LCANet against the state-of-the-art methods.
arXiv Detail & Related papers (2020-09-24T09:20:06Z)
- Co-Saliency Spatio-Temporal Interaction Network for Person Re-Identification in Videos [85.6430597108455]
We propose a novel Co-Saliency Spatio-Temporal Interaction Network (CSTNet) for person re-identification in videos.
It captures the common salient foreground regions among video frames and explores the spatial-temporal long-range context interdependency from such regions.
Multiple spatial-temporal interaction modules within CSTNet are proposed, which exploit the spatial and temporal long-range context interdependencies of such features and their spatial-temporal information correlation.
arXiv Detail & Related papers (2020-04-10T10:23:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.