Frequency-Spatial Entanglement Learning for Camouflaged Object Detection
- URL: http://arxiv.org/abs/2409.01686v1
- Date: Tue, 3 Sep 2024 07:58:47 GMT
- Title: Frequency-Spatial Entanglement Learning for Camouflaged Object Detection
- Authors: Yanguang Sun, Chunyan Xu, Jian Yang, Hanyu Xuan, Lei Luo
- Abstract summary: Existing methods attempt to reduce the impact of pixel similarity by maximizing the distinguishing ability of spatial features with complicated design.
We propose a new approach to address this issue by jointly exploring the representation in the frequency and spatial domains, introducing the Frequency-Spatial Entanglement Learning (FSEL) method.
Our experiments demonstrate the superiority of our FSEL over 21 state-of-the-art methods, through comprehensive quantitative and qualitative comparisons in three widely-used datasets.
- Score: 34.426297468968485
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Camouflaged object detection has attracted a lot of attention in computer vision. The main challenge lies in the high degree of similarity between camouflaged objects and their surroundings in the spatial domain, making identification difficult. Existing methods attempt to reduce the impact of pixel similarity by maximizing the distinguishing ability of spatial features with complicated design, but often ignore the sensitivity and locality of features in the spatial domain, leading to sub-optimal results. In this paper, we propose a new approach to address this issue by jointly exploring the representation in the frequency and spatial domains, introducing the Frequency-Spatial Entanglement Learning (FSEL) method. This method consists of a series of well-designed Entanglement Transformer Blocks (ETB) for representation learning, a Joint Domain Perception Module for semantic enhancement, and a Dual-domain Reverse Parser for feature integration in the frequency and spatial domains. Specifically, the ETB utilizes frequency self-attention to effectively characterize the relationship between different frequency bands, while the entanglement feed-forward network facilitates information interaction between features of different domains through entanglement learning. Our extensive experiments demonstrate the superiority of our FSEL over 21 state-of-the-art methods, through comprehensive quantitative and qualitative comparisons in three widely-used datasets. The source code is available at: https://github.com/CSYSI/FSEL.
Related papers
- Hierarchical Graph Interaction Transformer with Dynamic Token Clustering for Camouflaged Object Detection [57.883265488038134]
We propose a hierarchical graph interaction network termed HGINet for camouflaged object detection.
The network is capable of discovering imperceptible objects via effective graph interaction among the hierarchical tokenized features.
Our experiments demonstrate the superior performance of HGINet compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2024-08-27T12:53:25Z)
- Multiple Contexts and Frequencies Aggregation Network for Deepfake Detection [5.65128683992597]
Deepfake detection faces increasing challenges since the fast growth of generative models in developing massive and diverse Deepfake technologies.
Recent advances rely on introducing features from spatial or frequency domains rather than modeling general forgery features within backbones.
We propose an efficient network for face forgery detection named MkfaNet, which consists of two core modules.
arXiv Detail & Related papers (2024-08-03T05:34:53Z)
- DiffuBox: Refining 3D Object Detection with Point Diffusion [74.01759893280774]
We introduce a novel diffusion-based box refinement approach to ensure robust 3D object detection and localization.
We evaluate this approach under various domain adaptation settings, and our results reveal significant improvements across different datasets.
arXiv Detail & Related papers (2024-05-25T03:14:55Z)
- SFFNet: A Wavelet-Based Spatial and Frequency Domain Fusion Network for Remote Sensing Segmentation [9.22384870426709]
We propose the SFFNet (Spatial and Frequency Domain Fusion Network) framework.
The first stage extracts features using spatial methods to obtain features with sufficient spatial details and semantic information.
The second stage maps these features in both spatial and frequency domains.
SFFNet achieves superior performance in terms of mIoU, reaching 84.80% and 87.73% respectively.
arXiv Detail & Related papers (2024-05-03T10:47:56Z)
- Frequency Perception Network for Camouflaged Object Detection [51.26386921922031]
We propose a novel learnable and separable frequency perception mechanism driven by the semantic hierarchy in the frequency domain.
Our entire network adopts a two-stage model, including a frequency-guided coarse localization stage and a detail-preserving fine localization stage.
Compared with the currently existing models, our proposed method achieves competitive performance in three popular benchmark datasets.
arXiv Detail & Related papers (2023-08-17T11:30:46Z)
- Position-Aware Relation Learning for RGB-Thermal Salient Object Detection [3.115635707192086]
We propose a position-aware relation learning network (PRLNet) for RGB-T SOD based on the Swin Transformer.
PRLNet explores the distance and direction relationships between pixels to strengthen intra-class compactness and inter-class separation.
In addition, we construct a pure transformer encoder-decoder network to enhance multispectral feature representation for RGB-T SOD.
arXiv Detail & Related papers (2022-09-21T07:34:30Z)
- Unsupervised Domain Adaptation via Style-Aware Self-intermediate Domain [52.783709712318405]
Unsupervised domain adaptation (UDA) has attracted considerable attention, which transfers knowledge from a label-rich source domain to a related but unlabeled target domain.
We propose a novel style-aware feature fusion method (SAFF) to bridge the large domain gap and transfer knowledge while alleviating the loss of class-discriminative information.
arXiv Detail & Related papers (2022-09-05T10:06:03Z)
- Adaptive Frequency Learning in Two-branch Face Forgery Detection [66.91715092251258]
We propose to adaptively learn frequency information in a two-branch detection framework, dubbed AFD.
We liberate our network from the fixed frequency transforms, and achieve better performance with our data- and task-dependent transform layers.
arXiv Detail & Related papers (2022-03-27T14:25:52Z)
- Deep Frequency Filtering for Domain Generalization [55.66498461438285]
Deep Neural Networks (DNNs) have preferences for some frequency components in the learning process.
We propose Deep Frequency Filtering (DFF) for learning domain-generalizable features.
We show that applying our proposed DFF on a plain baseline outperforms the state-of-the-art methods on different domain generalization tasks.
arXiv Detail & Related papers (2022-03-23T05:19:06Z)
- Learnable Multi-level Frequency Decomposition and Hierarchical Attention Mechanism for Generalized Face Presentation Attack Detection [7.324459578044212]
Face presentation attack detection (PAD) is attracting a lot of attention and playing a key role in securing face recognition systems.
We propose a dual-stream convolution neural networks (CNNs) framework to deal with unseen scenarios.
We validate the design of our proposed PAD solution in a step-wise ablation study.
arXiv Detail & Related papers (2021-09-16T13:06:43Z)