Global-and-Local Collaborative Learning for Co-Salient Object Detection
- URL: http://arxiv.org/abs/2204.08917v1
- Date: Tue, 19 Apr 2022 14:32:41 GMT
- Title: Global-and-Local Collaborative Learning for Co-Salient Object Detection
- Authors: Runmin Cong, Ning Yang, Chongyi Li, Huazhu Fu, Yao Zhao, Qingming
Huang, Sam Kwong
- Abstract summary: The goal of co-salient object detection (CoSOD) is to discover salient objects that commonly appear in a query group containing two or more relevant images.
We propose a global-and-local collaborative learning architecture, which includes a global correspondence modeling (GCM) and a local correspondence modeling (LCM).
The proposed GLNet is evaluated on three prevailing CoSOD benchmark datasets, demonstrating that our model trained on a small dataset (about 3k images) still outperforms eleven state-of-the-art competitors trained on some large datasets (about 8k-200k images).
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The goal of co-salient object detection (CoSOD) is to discover salient
objects that commonly appear in a query group containing two or more relevant
images. Therefore, how to effectively extract inter-image correspondence is
crucial for the CoSOD task. In this paper, we propose a global-and-local
collaborative learning architecture, which includes a global correspondence
modeling (GCM) and a local correspondence modeling (LCM) to capture
the comprehensive inter-image correspondence among different images
from the global and local perspectives. Firstly, we treat different images as
different time slices and use 3D convolution to intuitively integrate all intra
features, which can more fully extract the global group semantics. Secondly,
we design a pairwise correlation transformation (PCT) to explore similarity
correspondence between image pairs and combine the multiple local pairwise
correspondences to generate the local inter-image relationship. Thirdly, the
inter-image relationships of the GCM and LCM are integrated through a
global-and-local correspondence aggregation (GLA) module to explore more
comprehensive inter-image collaboration cues. Finally, the intra- and
inter-features are adaptively integrated by an intra-and-inter weighting fusion
(AEWF) module to learn co-saliency features and predict the co-saliency map.
The proposed GLNet is evaluated on three prevailing CoSOD benchmark datasets,
demonstrating that our model trained on a small dataset (about 3k images) still
outperforms eleven state-of-the-art competitors trained on some large datasets
(about 8k-200k images).
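To make the pipeline above concrete, below is a minimal PyTorch sketch of how the four components (GCM, LCM built from PCT, GLA, AEWF) could fit together. It is assembled only from the abstract and is not the authors' implementation: the channel sizes, the Conv3d-based GCM, the correlation-softmax form of PCT, and the 1x1-conv GLA/AEWF heads are all assumptions.

```python
# Hedged sketch of a GLNet-style pipeline; every design detail is a guess
# based on the abstract, not the released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GCM(nn.Module):
    """Global correspondence modeling (sketch): treat the N images in a
    group as "time slices" and fuse their intra features with a 3D conv."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv3d = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, C, H, W) -> (B, C, N, H, W), N acts as the depth axis
        x = feats.permute(0, 2, 1, 3, 4)
        x = F.relu(self.conv3d(x))
        return x.mean(dim=2)  # (B, C, H, W) global group semantics


def pct(fa: torch.Tensor, fb: torch.Tensor) -> torch.Tensor:
    """Pairwise correlation transformation (sketch): dense correlation
    between two feature maps, re-weighting fb toward regions that have a
    strong match somewhere in fa."""
    B, C, H, W = fa.shape
    a, b = fa.flatten(2), fb.flatten(2)                    # (B, C, HW)
    corr = torch.einsum("bci,bcj->bij", a, b) / C ** 0.5   # (B, HW_a, HW_b)
    attn = corr.softmax(dim=1).max(dim=1).values           # (B, HW_b)
    return fb * attn.view(B, 1, H, W)


class GLNetSketch(nn.Module):
    """Assembles GCM + LCM, a GLA-style aggregation, and an AEWF-style
    adaptive weighting; all hyperparameters here are assumptions."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.gcm = GCM(channels)
        self.gla = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.aewf = nn.Conv2d(2 * channels, 2, kernel_size=1)
        self.head = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, C, H, W) intra features from a shared backbone, N >= 2
        B, N, C, H, W = feats.shape
        g = self.gcm(feats)                                 # global branch
        maps = []
        for i in range(N):
            # LCM (sketch): average the pairwise correspondences of image i
            local = torch.stack(
                [pct(feats[:, j], feats[:, i]) for j in range(N) if j != i]
            ).mean(dim=0)
            inter = self.gla(torch.cat([g, local], dim=1))  # GLA aggregation
            # AEWF (sketch): adaptive weights over intra vs. inter features
            w = self.aewf(torch.cat([feats[:, i], inter], dim=1)).softmax(dim=1)
            fused = w[:, 0:1] * feats[:, i] + w[:, 1:2] * inter
            maps.append(torch.sigmoid(self.head(fused)))
        return torch.stack(maps, dim=1)                     # (B, N, 1, H, W)
```

As a shape check, `GLNetSketch()(torch.randn(2, 5, 64, 32, 32))` returns one co-saliency map per image in the group, i.e. a tensor of shape (2, 5, 1, 32, 32).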
Related papers
- Multi-Grained Multimodal Interaction Network for Entity Linking
The multimodal entity linking (MEL) task aims at resolving ambiguous mentions to a multimodal knowledge graph.
We propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task.
arXiv Detail & Related papers (2023-07-19T02:11:19Z)
- PointCMC: Cross-Modal Multi-Scale Correspondences Learning for Point Cloud Understanding
PointCMC is a cross-modal method that models multi-scale correspondences across modalities for self-supervised point cloud representation learning.
PointCMC is composed of: (1) a local-to-local (L2L) module that learns local correspondences through optimized cross-modal local geometric features, (2) a local-to-global (L2G) module that aims to learn the correspondences between local and global features across modalities via local-global discrimination, and (3) a global-to-global (G2G) module, which leverages an auxiliary global contrastive loss between the point cloud and image to learn high-level semantic correspondences (a toy sketch of the G2G idea follows this entry).
arXiv Detail & Related papers (2022-11-22T06:08:43Z)
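A hedged sketch of PointCMC's G2G idea: an InfoNCE-style contrastive loss that pulls together the global embeddings of a point cloud and its paired image. The exact loss used by PointCMC may differ; the function name, temperature value, and symmetric form are assumptions.

```python
import torch
import torch.nn.functional as F


def g2g_contrastive_loss(pc_global: torch.Tensor,
                         img_global: torch.Tensor,
                         temperature: float = 0.07) -> torch.Tensor:
    # pc_global, img_global: (B, D) global features from the two modalities
    pc = F.normalize(pc_global, dim=1)
    img = F.normalize(img_global, dim=1)
    logits = pc @ img.t() / temperature            # (B, B) cross-modal sims
    targets = torch.arange(pc.size(0), device=pc.device)
    # symmetric: each point cloud should match its own image, and vice versa
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```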
- Deep Relational Metric Learning
This paper presents a deep relational metric learning framework for image clustering and retrieval.
We learn an ensemble of features that characterizes an image from different aspects to model both interclass and intraclass distributions.
Experiments on the widely-used CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate that our framework improves existing deep metric learning methods and achieves very competitive results.
arXiv Detail & Related papers (2021-08-23T09:31:18Z)
- Similarity-Aware Fusion Network for 3D Semantic Segmentation
We propose a similarity-aware fusion network (SAFNet) to adaptively fuse 2D images and 3D point clouds for 3D semantic segmentation.
We employ a late fusion strategy where we first learn the geometric and contextual similarities between the input and back-projected (from 2D pixels) point clouds.
We show that SAFNet significantly outperforms existing state-of-the-art fusion-based approaches across various levels of data integrity (a minimal sketch of similarity-weighted fusion follows this entry).
arXiv Detail & Related papers (2021-07-04T09:28:18Z)
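A minimal sketch of similarity-aware late fusion in the spirit of SAFNet: per-point predictions from the 3D branch and the back-projected 2D branch are blended with a weight derived from a learned similarity score. The scoring MLP, feature shapes, and blending rule are assumptions, not SAFNet's exact design.

```python
import torch
import torch.nn as nn


class SimilarityAwareFusion(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # scores agreement between 3D features and back-projected 2D features
        self.score = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, 1), nn.Sigmoid())

    def forward(self, feat3d, feat2d, logits3d, logits2d):
        # feat*: (B, N, feat_dim) per-point features; logits*: (B, N, classes)
        w = self.score(torch.cat([feat3d, feat2d], dim=-1))  # (B, N, 1)
        # high similarity -> trust the projected 2D branch more
        return w * logits2d + (1.0 - w) * logits3d
```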
- DF^2AM: Dual-level Feature Fusion and Affinity Modeling for RGB-Infrared Cross-modality Person Re-identification
RGB-infrared person re-identification is a challenging task due to the intra-class variations and cross-modality discrepancy.
We propose a Dual-level (i.e., local and global) Feature Fusion (DF2) module that learns attention for discriminative features in a local-to-global manner.
To further mine the relationships between global features from person images, we propose an Affinities Modeling (AM) module (a toy sketch of this affinity idea follows this entry).
arXiv Detail & Related papers (2021-04-01T03:12:56Z)
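A toy sketch of affinity modeling between global features of person images: a row-normalized affinity matrix re-aggregates each image's descriptor from related ones in the batch. The normalization, temperature, and aggregation rule are all assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def affinity_refine(global_feats: torch.Tensor,
                    temperature: float = 0.1) -> torch.Tensor:
    # global_feats: (B, D), one global descriptor per person image
    f = F.normalize(global_feats, dim=1)
    affinity = (f @ f.t()) / temperature      # (B, B) pairwise affinities
    weights = affinity.softmax(dim=1)         # row-normalize
    # each descriptor becomes an affinity-weighted mixture over the batch
    return weights @ global_feats
```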
- CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection
Co-Salient Object Detection (CoSOD) aims at discovering salient objects that repeatedly appear in a given query group containing two or more relevant images.
One challenging issue is how to effectively capture co-saliency cues by modeling and exploiting inter-image relationships.
We present an end-to-end collaborative aggregation-and-distribution network (CoADNet) to capture both salient and repetitive visual patterns from multiple images.
arXiv Detail & Related papers (2020-11-10T04:28:11Z)
- Cascaded Human-Object Interaction Recognition
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.