GMC -- Geometric Multimodal Contrastive Representation Learning
- URL: http://arxiv.org/abs/2202.03390v2
- Date: Tue, 8 Feb 2022 07:44:00 GMT
- Title: GMC -- Geometric Multimodal Contrastive Representation Learning
- Authors: Petra Poklukar, Miguel Vasco, Hang Yin, Francisco S. Melo, Ana Paiva,
Danica Kragic
- Abstract summary: We present a novel representation learning method comprising two main components.
We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance.
- Score: 26.437843775786856
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning representations of multimodal data that are both informative and
robust to missing modalities at test time remains a challenging problem due to
the inherent heterogeneity of data obtained from different channels. To address
it, we present a novel Geometric Multimodal Contrastive (GMC) representation
learning method comprising two main components: i) a two-level architecture
consisting of modality-specific base encoders, which process an arbitrary
number of modalities into intermediate representations of fixed dimensionality,
and a shared projection head, which maps the intermediate representations to a
latent representation space; ii) a multimodal contrastive loss function that
encourages the geometric alignment of the learned representations. We
experimentally demonstrate that GMC representations are semantically rich and
achieve state-of-the-art performance with missing modality information on three
different learning problems including prediction and reinforcement learning
tasks.
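
To make the two-level design concrete, the sketch below shows one plausible reading of the abstract: per-modality base encoders, a shared projection head, and a temperature-scaled contrastive loss that pulls each modality's projection toward a joint projection of the same sample. All module names, layer sizes, the joint encoder, and the exact form of the loss are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal, hypothetical sketch of a GMC-style two-level architecture (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GMCSketch(nn.Module):
    """Two-level architecture: per-modality base encoders + a shared projection head."""

    def __init__(self, input_dims, intermediate_dim=64, latent_dim=32, temperature=0.1):
        super().__init__()
        # One base encoder per modality: raw input -> fixed-size intermediate representation.
        self.base_encoders = nn.ModuleList([
            nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, intermediate_dim))
            for d in input_dims
        ])
        # Joint encoder over the concatenated modalities (an assumed design choice).
        self.joint_encoder = nn.Sequential(
            nn.Linear(sum(input_dims), 128), nn.ReLU(), nn.Linear(128, intermediate_dim)
        )
        # Shared projection head: intermediate representation -> latent space.
        self.projection = nn.Sequential(
            nn.Linear(intermediate_dim, latent_dim), nn.ReLU(), nn.Linear(latent_dim, latent_dim)
        )
        self.temperature = temperature

    def forward(self, modalities):
        # L2-normalized modality-specific and joint projections.
        z_mods = [F.normalize(self.projection(enc(x)), dim=-1)
                  for enc, x in zip(self.base_encoders, modalities)]
        z_joint = F.normalize(
            self.projection(self.joint_encoder(torch.cat(modalities, dim=-1))), dim=-1)
        return z_mods, z_joint

    def contrastive_loss(self, z_mods, z_joint):
        # NT-Xent-style objective pulling each modality projection toward the joint
        # projection of the same sample (a simplified stand-in for the paper's loss).
        targets = torch.arange(z_joint.size(0), device=z_joint.device)
        loss = 0.0
        for z_m in z_mods:
            logits = z_m @ z_joint.t() / self.temperature  # (batch, batch) similarities
            loss = loss + F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)
        return loss / (2 * len(z_mods))
```

If the modality-specific and joint projections are geometrically aligned in this way, a missing modality at test time can simply be dropped: downstream models consume the projection produced by the remaining encoders alone, which is the robustness property the abstract claims.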
Related papers
- Multi-modal Relation Distillation for Unified 3D Representation Learning [30.942281325891226]
Multi-modal Relation Distillation (MRD) is a tri-modal pre-training framework designed to distill reputable large Vision-Language Models (VLMs) into 3D backbones.
MRD aims to capture both intra-relations within each modality as well as cross-relations between different modalities and produce more discriminative 3D shape representations.
arXiv Detail & Related papers (2024-07-19T03:43:48Z) - Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning [10.630297877530614]
We propose a novel Multi-Grained Contrast method (MGC) for unsupervised representation learning.
Specifically, we construct delicate multi-grained correspondences between positive views and then conduct multi-grained contrast by the correspondences to learn more general unsupervised representations.
Our method significantly outperforms the existing state-of-the-art methods on extensive downstream tasks, including object detection, instance segmentation, scene parsing, semantic segmentation and keypoint detection.
arXiv Detail & Related papers (2024-07-02T07:35:21Z) - Latent Functional Maps: a spectral framework for representation alignment [34.20582953800544]
We introduce a multi-purpose framework to the representation learning community, which makes it possible to: (i) compare different spaces in an interpretable way and measure their intrinsic similarity; (ii) find correspondences between them in both unsupervised and weakly supervised settings; and (iii) effectively transfer representations between distinct spaces.
We validate our framework on various applications, ranging from stitching to retrieval tasks, and on multiple modalities, demonstrating that Latent Functional Maps can serve as a swiss-army knife for representation alignment.
arXiv Detail & Related papers (2024-06-20T10:43:28Z) - Semantically Consistent Multi-view Representation Learning [11.145085584637744]
We propose a novel Semantically Consistent Multi-view Representation Learning (SCMRL) method.
SCMRL excavates underlying multi-view semantic consensus information and utilizes it to guide unified feature representation learning.
Extensive experiments demonstrate its superiority over several state-of-the-art algorithms.
arXiv Detail & Related papers (2023-03-08T04:27:46Z) - Multi-modal Contrastive Representation Learning for Entity Alignment [57.92705405276161]
Multi-modal entity alignment aims to identify equivalent entities between two different multi-modal knowledge graphs.
We propose MCLEA, a Multi-modal Contrastive Learning based Entity Alignment model.
In particular, MCLEA first learns multiple individual representations from multiple modalities, and then performs contrastive learning to jointly model intra-modal and inter-modal interactions.
arXiv Detail & Related papers (2022-09-02T08:59:57Z) - Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under rigorous theoretical guarantees, our approach enables the IB to capture the intrinsic correlation between observations and semantic labels.
arXiv Detail & Related papers (2022-06-20T03:09:46Z) - Deep Partial Multi-View Learning [94.39367390062831]
We propose a novel framework termed Cross Partial Multi-View Networks (CPM-Nets).
We first provide a formal definition of completeness and versatility for multi-view representation.
We then theoretically prove the versatility of the learned latent representations.
arXiv Detail & Related papers (2020-11-12T02:29:29Z) - Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification [208.1227090864602]
Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem.
Existing VI-ReID methods tend to learn global representations, which have limited discriminability and weak robustness to noisy images.
We propose a novel dynamic dual-attentive aggregation (DDAG) learning method by mining both intra-modality part-level and cross-modality graph-level contextual cues for VI-ReID.
arXiv Detail & Related papers (2020-07-18T03:08:13Z) - Generative Partial Multi-View Clustering [133.36721417531734]
We propose a generative partial multi-view clustering model, named as GP-MVC, to address the incomplete multi-view problem.
First, multi-view encoder networks are trained to learn common low-dimensional representations, followed by a clustering layer to capture the consistent cluster structure across multiple views.
Second, view-specific generative adversarial networks are developed to generate the missing data of one view conditioning on the shared representation given by other views.
arXiv Detail & Related papers (2020-03-29T17:48:27Z) - Unpaired Multi-modal Segmentation via Knowledge Distillation [77.39798870702174]
We propose a novel learning scheme for unpaired cross-modality image segmentation.
In our method, we heavily reuse network parameters by sharing all convolutional kernels across CT and MRI.
We have extensively validated our approach on two multi-class segmentation problems.
arXiv Detail & Related papers (2020-01-06T20:03:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.