A Simple yet Effective Network based on Vision Transformer for
Camouflaged Object and Salient Object Detection
- URL: http://arxiv.org/abs/2402.18922v1
- Date: Thu, 29 Feb 2024 07:29:28 GMT
- Title: A Simple yet Effective Network based on Vision Transformer for
Camouflaged Object and Salient Object Detection
- Authors: Chao Hao, Zitong Yu, Xin Liu, Jun Xu, Huanjing Yue, Jingyu Yang
- Abstract summary: We propose a simple yet effective network (SENet) based on the vision Transformer (ViT).
To enhance the Transformer's ability to model local information, we propose a local information capture module (LICM).
We also propose a dynamic weighted loss (DW loss) based on Binary Cross-Entropy (BCE) and Intersection over Union (IoU) loss, which guides the network to pay more attention to smaller and more difficult-to-find target objects.
- Score: 33.30644598646274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camouflaged object detection (COD) and salient object detection (SOD) are two
distinct yet closely-related computer vision tasks widely studied during the
past decades. Though the two tasks share the same purpose of segmenting an
image into binary foreground and background regions, they differ in that COD
focuses on concealed objects hidden in the image, while SOD concentrates on
the most prominent objects in the image. Previous works achieved good
performance by stacking various hand-designed modules and multi-scale
features. However, these carefully-designed complex networks often performed
well on one task but not on the other. In this work, we propose a simple yet
effective network (SENet) based on the vision Transformer (ViT). By employing
a simple asymmetric ViT-based encoder-decoder design, we yield competitive
results on both tasks, exhibiting greater versatility than meticulously
crafted ones. Furthermore, to enhance the Transformer's ability to
model local information, which is important for pixel-level binary segmentation
tasks, we propose a local information capture module (LICM). We also propose a
dynamic weighted loss (DW loss) based on Binary Cross-Entropy (BCE) and
Intersection over Union (IoU) loss, which guides the network to pay more
attention to those smaller and more difficult-to-find target objects according
to their size. Moreover, we explore the issue of joint training of SOD and COD,
and propose a preliminary solution to the conflict in joint training, further
improving the performance of SOD. Extensive experiments on multiple benchmark
datasets demonstrate the effectiveness of our method. The code is available at
https://github.com/linuxsino/SENet.
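Illustration: the DW loss combines per-sample BCE and IoU terms under a size-dependent weight, so that smaller, harder-to-find objects contribute more to the objective. The sketch below is a minimal PyTorch rendering of that idea, not the authors' exact formulation; the inverse-area weighting and the function name `dynamic_weighted_loss` are assumptions (the official implementation is in the linked repository).

```python
import torch
import torch.nn.functional as F

def dynamic_weighted_loss(pred_logits, target, eps=1e-6):
    # pred_logits: (B, 1, H, W) raw network outputs.
    # target:      (B, 1, H, W) float binary ground-truth masks.
    b = target.size(0)

    # Fraction of pixels covered by the target object, per sample.
    area = target.view(b, -1).mean(dim=1)            # (B,)

    # Hypothetical weighting: smaller objects get larger weights,
    # normalized so the weights average to 1 across the batch.
    w = 1.0 / (area + eps)
    w = w * (b / w.sum())

    # Per-sample BCE term.
    bce = F.binary_cross_entropy_with_logits(
        pred_logits, target, reduction="none"
    ).view(b, -1).mean(dim=1)                        # (B,)

    # Per-sample soft-IoU term.
    prob = torch.sigmoid(pred_logits).view(b, -1)
    tgt = target.view(b, -1)
    inter = (prob * tgt).sum(dim=1)
    union = prob.sum(dim=1) + tgt.sum(dim=1) - inter
    iou = 1.0 - (inter + eps) / (union + eps)        # (B,)

    return (w * (bce + iou)).mean()
```

Weighting each sample by inverse foreground area is one simple way to realize "more attention to smaller objects"; the paper's actual dynamic scheme may differ.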
Related papers
- SCLNet: A Scale-Robust Complementary Learning Network for Object Detection in UAV Images [0.0]
This paper introduces a scale-robust complementary learning network (SCLNet) to address scale challenges in object detection for UAV images.
One implementation is based on our proposed scale-complementary decoder and scale-complementary loss function.
Another implementation is based on our proposed contrastive complement network and contrastive complement loss function.
arXiv Detail & Related papers (2024-09-11T05:39:25Z)
- Camouflaged Object Detection with Feature Grafting and Distractor Aware [9.791590363932519]
We propose a novel Feature Grafting and Distractor Aware network (FDNet) to handle the Camouflaged Object Detection task.
Specifically, we use a CNN and a Transformer to encode multi-scale images in parallel.
A Distractor Aware Module is designed to explicitly model the two possible distractors in the COD task to refine the coarse camouflage map.
arXiv Detail & Related papers (2023-07-08T09:37:08Z)
- De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects.
We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding.
We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z)
- Unleash the Potential of Image Branch for Cross-modal 3D Object Detection [67.94357336206136]
We present a new cross-modal 3D object detector, namely UPIDet, which aims to unleash the potential of the image branch from two aspects.
First, UPIDet introduces a new 2D auxiliary task called normalized local coordinate map estimation.
Second, we discover that the representational capability of the point cloud backbone can be enhanced through the gradients backpropagated from the training objectives of the image branch.
arXiv Detail & Related papers (2023-01-22T08:26:58Z)
- Visual Transformer for Object Detection [0.0]
We consider the use of self-attention for a discriminative visual task, object detection, as an alternative to convolutions.
Our model leads to consistent improvements in object detection on COCO across many different models and scales.
arXiv Detail & Related papers (2022-06-01T06:13:09Z)
- Multitask AET with Orthogonal Tangent Regularity for Dark Object Detection [84.52197307286681]
We propose a novel multitask auto encoding transformation (MAET) model to enhance object detection in a dark environment.
In a self-supervision manner, the MAET learns the intrinsic visual structure by encoding and decoding the realistic illumination-degrading transformation.
We achieve state-of-the-art performance using synthetic and real-world datasets.
arXiv Detail & Related papers (2022-05-06T16:27:14Z)
- An Extendable, Efficient and Effective Transformer-based Object Detector [95.06044204961009]
We integrate Vision and Detection Transformers (ViDT) to construct an effective and efficient object detector.
ViDT introduces a reconfigured attention module to extend the recent Swin Transformer to be a standalone object detector.
We extend it to ViDT+ to support joint-task learning for object detection and instance segmentation.
arXiv Detail & Related papers (2022-04-17T09:27:45Z)
- A Unified Transformer Framework for Group-based Segmentation: Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection [59.21990697929617]
Humans tend to mine objects by learning from a group of images or several frames of video since we live in a dynamic world.
Previous approaches design different networks on similar tasks separately, and they are difficult to apply to each other.
We introduce a unified framework, termed UFO, to tackle these issues.
arXiv Detail & Related papers (2022-03-09T13:35:19Z)
- CoSformer: Detecting Co-Salient Object with Transformers [2.3148470932285665]
Co-Salient Object Detection (CoSOD) aims at simulating the human visual system to discover the common and salient objects from a group of relevant images.
We propose the Co-Salient Object Detection Transformer (CoSformer) network to capture both salient and common visual patterns from multiple images.
arXiv Detail & Related papers (2021-04-30T02:39:12Z)
- Suppress and Balance: A Simple Gated Network for Salient Object Detection [89.88222217065858]
We propose a simple gated network (GateNet) to solve both issues at once.
With the help of multilevel gate units, the valuable context information from the encoder can be optimally transmitted to the decoder.
In addition, we adopt the atrous spatial pyramid pooling based on the proposed "Fold" operation (Fold-ASPP) to accurately localize salient objects of various scales.
arXiv Detail & Related papers (2020-07-16T02:00:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.