Discriminative Co-Saliency and Background Mining Transformer for
Co-Salient Object Detection
- URL: http://arxiv.org/abs/2305.00514v2
- Date: Sat, 6 May 2023 01:59:52 GMT
- Title: Discriminative Co-Saliency and Background Mining Transformer for
Co-Salient Object Detection
- Authors: Long Li, Junwei Han, Ni Zhang, Nian Liu, Salman Khan, Hisham
Cholakkal, Rao Muhammad Anwer, and Fahad Shahbaz Khan
- Abstract summary: We propose a Discriminative co-saliency and background Mining Transformer framework (DMT).
We use two types of pre-defined tokens to mine co-saliency and background information via our proposed contrast-induced pixel-to-token correlation and co-saliency token-to-token correlation modules.
Experimental results on three benchmark datasets demonstrate the effectiveness of our proposed method.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most previous co-salient object detection works mainly focus on extracting
co-salient cues via mining the consistency relations across images while
ignoring explicit exploration of background regions. In this paper, we propose
a Discriminative co-saliency and background Mining Transformer framework (DMT)
based on several economical multi-grained correlation modules to explicitly
mine both co-saliency and background information and effectively model their
discrimination. Specifically, we first propose a region-to-region correlation
module for introducing inter-image relations to pixel-wise segmentation
features while maintaining computational efficiency. Then, we use two types of
pre-defined tokens to mine co-saliency and background information via our
proposed contrast-induced pixel-to-token correlation and co-saliency
token-to-token correlation modules. We also design a token-guided feature
refinement module to enhance the discriminability of the segmentation features
under the guidance of the learned tokens. We perform iterative mutual promotion
for the segmentation feature extraction and token construction. Experimental
results on three benchmark datasets demonstrate the effectiveness of our
proposed method. The source code is available at:
https://github.com/dragonlee258079/DMT.
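The contrast-induced pixel-to-token correlation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository above for that); the function name, shapes, and the specific contrast formulation below are illustrative assumptions: a co-saliency token and a background token are each matched against flattened pixel features, and each token is updated by pooling pixels where its own affinity exceeds the opposing token's.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def contrast_induced_p2t(pixels, co_token, bg_token):
    """Hypothetical sketch of contrast-induced pixel-to-token correlation.

    pixels: (N, C) flattened segmentation features
    co_token, bg_token: (C,) pre-defined learnable tokens
    Returns updated tokens built by contrast-weighted pixel pooling.
    """
    sim_co = pixels @ co_token          # (N,) affinity to co-saliency token
    sim_bg = pixels @ bg_token          # (N,) affinity to background token
    # contrast: each token attends to pixels where its own affinity
    # dominates the opposing token's affinity
    attn_co = softmax(sim_co - sim_bg)  # (N,)
    attn_bg = softmax(sim_bg - sim_co)  # (N,)
    new_co = attn_co @ pixels           # (C,) weighted pooling over pixels
    new_bg = attn_bg @ pixels           # (C,)
    return new_co, new_bg

rng = np.random.default_rng(0)
feats = rng.normal(size=(64, 8))        # 64 pixels, 8-dim features
co, bg = rng.normal(size=8), rng.normal(size=8)
co2, bg2 = contrast_induced_p2t(feats, co, bg)
print(co2.shape, bg2.shape)  # (8,) (8,)
```

In the paper this step alternates with segmentation-feature refinement, so token construction and feature extraction promote each other iteratively; the sketch shows only a single update.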
Related papers
- Learning Invariant Inter-pixel Correlations for Superpixel Generation [12.605604620139497]
Learnable features exhibit constrained discriminative capability, resulting in unsatisfactory pixel grouping performance.
We propose the Content Disentangle Superpixel algorithm to selectively separate the invariant inter-pixel correlations and statistical properties.
The experimental results on four benchmark datasets demonstrate the superiority of our approach to existing state-of-the-art methods.
arXiv Detail & Related papers (2024-02-28T09:46:56Z)
- Multi-scale Target-Aware Framework for Constrained Image Splicing Detection and Localization [11.803255600587308]
We propose a multi-scale target-aware framework to couple feature extraction and correlation matching in a unified pipeline.
Our approach can effectively promote the collaborative learning of related patches, and perform mutual promotion of feature learning and correlation matching.
Our experiments demonstrate that our model, which uses a unified pipeline, outperforms state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2023-08-18T07:38:30Z)
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
- FECANet: Boosting Few-Shot Semantic Segmentation with Feature-Enhanced Context-Aware Network [48.912196729711624]
Few-shot semantic segmentation is the task of learning to locate each pixel of a novel class in a query image with only a few annotated support images.
We propose a Feature-Enhanced Context-Aware Network (FECANet) to suppress the matching noise caused by inter-class local similarity.
In addition, we propose a novel correlation reconstruction module that encodes extra correspondence relations between foreground and background and multi-scale context semantic features.
arXiv Detail & Related papers (2023-01-19T16:31:13Z)
- Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on three fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-12-28T03:45:56Z)
- A Tri-attention Fusion Guided Multi-modal Segmentation Network [2.867517731896504]
We propose a multi-modality segmentation network guided by a novel tri-attention fusion.
Our network includes N model-independent encoding paths with N image sources, a tri-attention fusion block, a dual-attention fusion block, and a decoding path.
Our experiment results tested on BraTS 2018 dataset for brain tumor segmentation demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2021-11-02T14:36:53Z)
- 3D Medical Multi-modal Segmentation Network Guided by Multi-source Correlation Constraint [2.867517731896504]
We propose a multi-modality segmentation network with a correlation constraint.
Our experiment results tested on BraTS-2018 dataset for brain tumor segmentation demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2021-02-05T11:23:12Z)
- Gradient-Induced Co-Saliency Detection [81.54194063218216]
Co-saliency detection (Co-SOD) aims to segment the common salient foreground in a group of relevant images.
In this paper, inspired by human behavior, we propose a gradient-induced co-saliency detection method.
arXiv Detail & Related papers (2020-04-28T08:40:55Z)
- Bidirectional Graph Reasoning Network for Panoptic Segmentation [126.06251745669107]
We introduce a Bidirectional Graph Reasoning Network (BGRNet) to mine the intra-modular and inter-modular relations within and between foreground things and background stuff classes.
BGRNet first constructs image-specific graphs in both instance and semantic segmentation branches that enable flexible reasoning at the proposal level and class level.
arXiv Detail & Related papers (2020-04-14T02:32:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.