ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection
- URL: http://arxiv.org/abs/2310.20208v4
- Date: Sun, 14 Jul 2024 09:02:22 GMT
- Title: ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection
- Authors: Youwei Pang, Xiaoqi Zhao, Tian-Zhu Xiang, Lihe Zhang, Huchuan Lu,
- Abstract summary: Recent object (COD) attempts to segment objects visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios.
We propose an effective unified collaborative pyramid network that mimics human behavior when observing vague images and camouflaged zooming in and out.
Our framework consistently outperforms existing state-of-the-art methods in image and video COD benchmarks.
- Score: 70.11264880907652
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent camouflaged object detection (COD) attempts to segment objects visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios. Apart from the high intrinsic similarity between camouflaged objects and their background, objects are usually diverse in scale, fuzzy in appearance, and even severely occluded. To this end, we propose an effective unified collaborative pyramid network that mimics human behavior when observing vague images and videos, \ie zooming in and out. Specifically, our approach employs the zooming strategy to learn discriminative mixed-scale semantics by the multi-head scale integration and rich granularity perception units, which are designed to fully explore imperceptible clues between candidate objects and background surroundings. The former's intrinsic multi-head aggregation provides more diverse visual patterns. The latter's routing mechanism can effectively propagate inter-frame differences in spatiotemporal scenarios and be adaptively deactivated and output all-zero results for static representations. They provide a solid foundation for realizing a unified architecture for static and dynamic COD. Moreover, considering the uncertainty and ambiguity derived from indistinguishable textures, we construct a simple yet effective regularization, uncertainty awareness loss, to encourage predictions with higher confidence in candidate regions. Our highly task-friendly framework consistently outperforms existing state-of-the-art methods in image and video COD benchmarks. Our code can be found at {https://github.com/lartpang/ZoomNeXt}.
Related papers
- HEAP: Unsupervised Object Discovery and Localization with Contrastive
Grouping [29.678756772610797]
Unsupervised object discovery and localization aims to detect or segment objects in an image without any supervision.
Recent efforts have demonstrated a notable potential to identify salient foreground objects by utilizing self-supervised transformer features.
To address these problems, we introduce Hierarchical mErging framework via contrAstive grouPing (HEAP)
arXiv Detail & Related papers (2023-12-29T06:46:37Z) - COMICS: End-to-end Bi-grained Contrastive Learning for Multi-face Forgery Detection [56.7599217711363]
Face forgery recognition methods can only process one face at a time.
Most face forgery recognition methods can only process one face at a time.
We propose COMICS, an end-to-end framework for multi-face forgery detection.
arXiv Detail & Related papers (2023-08-03T03:37:13Z) - A bioinspired three-stage model for camouflaged object detection [8.11866601771984]
We propose a three-stage model that enables coarse-to-fine segmentation in a single iteration.
Our model employs three decoders to sequentially process subsampled features, cropped features, and high-resolution original features.
Our network surpasses state-of-the-art CNN-based counterparts without unnecessary complexities.
arXiv Detail & Related papers (2023-05-22T02:01:48Z) - CamDiff: Camouflage Image Augmentation via Diffusion Model [83.35960536063857]
CamDiff is a novel approach to synthesize salient objects in camouflaged scenes.
We leverage the latent diffusion model to synthesize salient objects in camouflaged scenes.
Our approach enables flexible editing and efficient large-scale dataset generation at a low cost.
arXiv Detail & Related papers (2023-04-11T19:37:47Z) - High-resolution Iterative Feedback Network for Camouflaged Object
Detection [128.893782016078]
Spotting camouflaged objects that are visually assimilated into the background is tricky for object detection algorithms.
We aim to extract the high-resolution texture details to avoid the detail degradation that causes blurred vision in edges and boundaries.
We introduce a novel HitNet to refine the low-resolution representations by high-resolution features in an iterative feedback manner.
arXiv Detail & Related papers (2022-03-22T11:20:21Z) - Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z) - Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object
Detection [0.0]
We propose a mixed-scale triplet network, bf ZoomNet, which mimics the behavior of humans when observing vague images.
Specifically, our ZoomNet employs the zoom strategy to learn the discriminative mixed-scale semantics by the designed scale integration unit and hierarchical mixed-scale unit.
Our proposed highly task-friendly model consistently surpasses the existing 23 state-of-the-art methods on four public datasets.
arXiv Detail & Related papers (2022-03-05T09:13:52Z) - RICE: Refining Instance Masks in Cluttered Environments with Graph
Neural Networks [53.15260967235835]
We propose a novel framework that refines the output of such methods by utilizing a graph-based representation of instance masks.
We train deep networks capable of sampling smart perturbations to the segmentations, and a graph neural network, which can encode relations between objects, to evaluate the segmentations.
We demonstrate an application that uses uncertainty estimates generated by our method to guide a manipulator, leading to efficient understanding of cluttered scenes.
arXiv Detail & Related papers (2021-06-29T20:29:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.