Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object
Detection
- URL: http://arxiv.org/abs/2203.02688v1
- Date: Sat, 5 Mar 2022 09:13:52 GMT
- Title: Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object
Detection
- Authors: Youwei Pang, Xiaoqi Zhao, Tian-Zhu Xiang, Lihe Zhang, Huchuan Lu
- Abstract summary: We propose a mixed-scale triplet network, ZoomNet, which mimics the behavior of humans when observing vague images.
Specifically, our ZoomNet employs the zoom strategy to learn the discriminative mixed-scale semantics by the designed scale integration unit and hierarchical mixed-scale unit.
Our proposed, highly task-friendly model consistently surpasses 23 existing state-of-the-art methods on four public datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recently proposed camouflaged object detection (COD) attempts to segment
objects that are visually blended into their surroundings, which is extremely
complex and difficult in real-world scenarios. Apart from high intrinsic
similarity between the camouflaged objects and their background, the objects
are usually diverse in scale, fuzzy in appearance, and even severely occluded.
To deal with these problems, we propose a mixed-scale triplet network,
\textbf{ZoomNet}, which mimics the behavior of humans when observing vague
images, i.e., zooming in and out. Specifically, our ZoomNet employs the zoom
strategy to learn the discriminative mixed-scale semantics by the designed
scale integration unit and hierarchical mixed-scale unit, which fully explores
imperceptible clues between the candidate objects and background surroundings.
Moreover, considering the uncertainty and ambiguity derived from
indistinguishable textures, we construct a simple yet effective regularization
constraint, uncertainty-aware loss, to promote the model to accurately produce
predictions with higher confidence in candidate regions. Without bells and
whistles, our proposed highly task-friendly model consistently surpasses 23
existing state-of-the-art methods on four public datasets. Besides, the
superior performance over the recent cutting-edge models on the SOD task also
verifies the effectiveness and generality of our model. The code will be
available at \url{https://github.com/lartpang/ZoomNet}.
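The uncertainty-aware loss described above can be illustrated with a minimal sketch: a pixel-wise binary cross-entropy re-weighted toward ambiguous predictions, so the model is pushed to commit with high confidence in candidate regions. The weighting below is an illustrative assumption, not the exact loss defined in the paper:

```python
import numpy as np

def uncertainty_aware_bce(pred, target, eps=1e-7):
    """Pixel-wise BCE re-weighted toward ambiguous predictions
    (values near 0.5 receive larger weight), nudging the model to
    produce confident outputs in candidate regions.
    NOTE: this weighting is an illustrative assumption, not the
    exact uncertainty-aware loss from the ZoomNet paper."""
    p = np.clip(pred, eps, 1.0 - eps)
    bce = -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))
    ambiguity = 1.0 - np.abs(2.0 * p - 1.0)  # 1 at p = 0.5, 0 at p in {0, 1}
    return float(((1.0 + ambiguity) * bce).mean())
```

Under this weighting, an ambiguous prediction (p = 0.5) on a foreground pixel incurs roughly ten times the penalty of a confident one (p = 0.9), matching the stated goal of suppressing low-confidence predictions in candidate regions.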
Related papers
- Glass Segmentation with Multi Scales and Primary Prediction Guiding [2.66512000865131]
Glass-like objects appear everywhere in our daily life, yet they are hard for existing methods to segment.
We propose MGNet, which includes a Fine-Rescaling and Merging module (FRM) to improve the extraction of semantics.
We supervise the model with a novel loss function incorporating an uncertainty-aware loss to produce high-confidence segmentation maps.
arXiv Detail & Related papers (2024-02-13T16:14:32Z) - ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection [70.11264880907652]
Recent camouflaged object detection (COD) attempts to segment objects that are visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios.
We propose an effective unified collaborative pyramid network that mimics human behavior when observing vague images and camouflaged scenes, i.e., zooming in and out.
Our framework consistently outperforms existing state-of-the-art methods on image and video COD benchmarks.
arXiv Detail & Related papers (2023-10-31T06:11:23Z) - Frequency Perception Network for Camouflaged Object Detection [51.26386921922031]
We propose a novel learnable and separable frequency perception mechanism driven by the semantic hierarchy in the frequency domain.
Our entire network adopts a two-stage model, including a frequency-guided coarse localization stage and a detail-preserving fine localization stage.
Compared with the currently existing models, our proposed method achieves competitive performance in three popular benchmark datasets.
arXiv Detail & Related papers (2023-08-17T11:30:46Z) - A bioinspired three-stage model for camouflaged object detection [8.11866601771984]
We propose a three-stage model that enables coarse-to-fine segmentation in a single iteration.
Our model employs three decoders to sequentially process subsampled features, cropped features, and high-resolution original features.
Our network surpasses state-of-the-art CNN-based counterparts without unnecessary complexities.
arXiv Detail & Related papers (2023-05-22T02:01:48Z) - CamDiff: Camouflage Image Augmentation via Diffusion Model [83.35960536063857]
CamDiff is a novel approach that leverages a latent diffusion model to synthesize salient objects in camouflaged scenes.
Our approach enables flexible editing and efficient large-scale dataset generation at a low cost.
arXiv Detail & Related papers (2023-04-11T19:37:47Z) - High-resolution Iterative Feedback Network for Camouflaged Object
Detection [128.893782016078]
Spotting camouflaged objects that are visually assimilated into the background is tricky for object detection algorithms.
We aim to extract the high-resolution texture details to avoid the detail degradation that causes blurred vision in edges and boundaries.
We introduce a novel HitNet to refine the low-resolution representations by high-resolution features in an iterative feedback manner.
arXiv Detail & Related papers (2022-03-22T11:20:21Z) - Fast Camouflaged Object Detection via Edge-based Reversible
Re-calibration Network [17.538512222905087]
This paper proposes a novel edge-based reversible re-calibration network called ERRNet.
Our model is characterized by two innovative designs, namely the Selective Edge Aggregation (SEA) and the Reversible Re-calibration Unit (RRU).
Experimental results show that ERRNet outperforms existing cutting-edge baselines on three COD datasets and five medical image segmentation datasets.
arXiv Detail & Related papers (2021-11-05T02:03:54Z) - Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z) - Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently used VGG feature-matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
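As a side note on the dilated-convolution design in the last entry, the receptive-field gain from stacking dilated layers follows a simple closed form; the helper below is a generic illustration, not code from any of the papers above:

```python
def stacked_receptive_field(kernel_size, dilations):
    """Effective receptive field of stride-1 convolutions stacked in
    sequence: each layer with dilation d extends the field by
    (kernel_size - 1) * d. A generic formula, not tied to one model."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf
```

Three 3x3 layers with dilations 1, 2, 4 already cover a 15x15 window, which is why dense dilated combinations yield "larger and more effective receptive fields" at low parameter cost.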
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.