Co-Salient Object Detection with Semantic-Level Consensus Extraction and
Dispersion
- URL: http://arxiv.org/abs/2309.07753v1
- Date: Thu, 14 Sep 2023 14:39:07 GMT
- Title: Co-Salient Object Detection with Semantic-Level Consensus Extraction and
Dispersion
- Authors: Peiran Xu, Yadong Mu
- Abstract summary: Co-salient object detection aims to highlight the common salient object in each image.
We propose a hierarchical Transformer module for extracting semantic-level consensus.
A Transformer-based dispersion module takes into account the variation of the co-salient object in different scenes.
- Score: 27.120768849942145
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a group of images, co-salient object detection (CoSOD) aims to
highlight the common salient object in each image. There are two factors
closely related to the success of this task, namely consensus extraction and
the dispersion of consensus to each image. Most previous works represent the
group consensus using local features, while we instead utilize a hierarchical
Transformer module for extracting semantic-level consensus. Therefore, it can
obtain a more comprehensive representation of the common object category, and
exclude interference from other objects that share local similarities with the
target object. In addition, we propose a Transformer-based dispersion module
that takes into account the variation of the co-salient object in different
scenes. It distributes the consensus to the image feature maps in an
image-specific way while making full use of interactions within the group.
These two modules are integrated with a ViT encoder and an FPN-like decoder to
form an end-to-end trainable network, without additional branches or auxiliary
losses. The proposed method is evaluated on three commonly used CoSOD datasets
and achieves state-of-the-art performance.
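The general consensus-extraction-and-dispersion idea can be illustrated with a minimal NumPy sketch. This is not the authors' hierarchical Transformer architecture; it is a simplified stand-in in which the group consensus is a pooled feature vector and dispersion is an image-specific attention step. All function names and shapes here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def extract_consensus(features):
    # features: (N, HW, C) token features for a group of N images.
    # Crude stand-in for semantic-level consensus: average all tokens
    # across the whole group into a single (C,) vector.
    return features.reshape(-1, features.shape[-1]).mean(axis=0)

def disperse_consensus(features, consensus):
    # Image-specific dispersion: each token is weighted by its
    # similarity to the consensus, and the weighted consensus is
    # added back to the token features.
    N, HW, C = features.shape
    scores = features @ consensus / np.sqrt(C)   # (N, HW) similarities
    attn = softmax(scores, axis=1)               # per-image attention weights
    return features + attn[..., None] * consensus  # (N, HW, C)
```

Under this reading, tokens that resemble the group consensus are reinforced more strongly, while the per-image softmax keeps the distribution image-specific, mirroring the role the paper assigns to its dispersion module.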
Related papers
- Towards Open-World Co-Salient Object Detection with Generative
Uncertainty-aware Group Selective Exchange-Masking [23.60044777118441]
We introduce a group selective exchange-masking (GSEM) approach for enhancing the robustness of the CoSOD model.
GSEM selects a subset of images from each group using a novel learning-based strategy, then the selected images are exchanged.
To simultaneously consider the uncertainty introduced by irrelevant images and the consensus features of the remaining relevant images in the group, we designed a latent variable generator branch and CoSOD transformer branch.
arXiv Detail & Related papers (2023-10-16T10:40:40Z) - De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects.
We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding.
We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z) - Adaptive Graph Convolution Module for Salient Object Detection [7.278033100480174]
We propose an adaptive graph convolution module (AGCM) to deal with complex scenes.
Prototype features are extracted from the input image using a learnable region generation layer.
The proposed AGCM dramatically improves the SOD performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-03-17T07:07:17Z) - Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-12-28T03:45:56Z) - Multi-Projection Fusion and Refinement Network for Salient Object
Detection in 360° Omnidirectional Image [141.10227079090419]
We propose a Multi-Projection Fusion and Refinement Network (MPFR-Net) to detect the salient objects in 360° omnidirectional images.
MPFR-Net uses the equirectangular projection image and four corresponding cube-unfolding images as inputs.
Experimental results on two omnidirectional datasets demonstrate that the proposed approach outperforms the state-of-the-art methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-12-23T14:50:40Z) - Global-and-Local Collaborative Learning for Co-Salient Object Detection [162.62642867056385]
The goal of co-salient object detection (CoSOD) is to discover salient objects that commonly appear in a query group containing two or more relevant images.
We propose a global-and-local collaborative learning architecture, which includes a global correspondence modeling (GCM) and a local correspondence modeling (LCM).
The proposed GLNet is evaluated on three prevailing CoSOD benchmark datasets, demonstrating that our model trained on a small dataset (about 3k images) still outperforms eleven state-of-the-art competitors trained on some large datasets (about 8k-200k images).
arXiv Detail & Related papers (2022-04-19T14:32:41Z) - Relationformer: A Unified Framework for Image-to-Graph Generation [18.832626244362075]
This work proposes a unified one-stage transformer-based framework, namely Relationformer, that jointly predicts objects and their relations.
We leverage direct set-based object prediction and incorporate the interaction among the objects to learn an object-relation representation jointly.
We achieve state-of-the-art performance on multiple, diverse and multi-domain datasets.
arXiv Detail & Related papers (2022-03-19T00:36:59Z) - A Unified Transformer Framework for Group-based Segmentation:
Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection [59.21990697929617]
Humans tend to mine objects by learning from a group of images or several frames of video since we live in a dynamic world.
Previous approaches design different networks on similar tasks separately, and they are difficult to apply to each other.
We introduce a unified framework, termed UFO, to tackle these issues.
arXiv Detail & Related papers (2022-03-09T13:35:19Z) - CoADNet: Collaborative Aggregation-and-Distribution Networks for
Co-Salient Object Detection [91.91911418421086]
Co-Salient Object Detection (CoSOD) aims at discovering salient objects that repeatedly appear in a given query group containing two or more relevant images.
One challenging issue is how to effectively capture co-saliency cues by modeling and exploiting inter-image relationships.
We present an end-to-end collaborative aggregation-and-distribution network (CoADNet) to capture both salient and repetitive visual patterns from multiple images.
arXiv Detail & Related papers (2020-11-10T04:28:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content above (including all information) and is not responsible for any consequences.