Factorized Contrastive Learning: Going Beyond Multi-view Redundancy
- URL: http://arxiv.org/abs/2306.05268v2
- Date: Mon, 30 Oct 2023 05:31:05 GMT
- Title: Factorized Contrastive Learning: Going Beyond Multi-view Redundancy
- Authors: Paul Pu Liang, Zihao Deng, Martin Ma, James Zou, Louis-Philippe
Morency, Ruslan Salakhutdinov
- Abstract summary: This paper proposes FactorCL, a new multimodal representation learning method to go beyond multi-view redundancy.
On large-scale real-world datasets, FactorCL captures both shared and unique information and achieves state-of-the-art results.
- Score: 116.25342513407173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In a wide range of multimodal tasks, contrastive learning has become a
particularly appealing approach since it can successfully learn representations
from abundant unlabeled data with only pairing information (e.g., image-caption
or video-audio pairs). Underpinning these approaches is the assumption of
multi-view redundancy - that shared information between modalities is necessary
and sufficient for downstream tasks. However, in many real-world settings,
task-relevant information is also contained in modality-unique regions:
information that is only present in one modality but still relevant to the
task. How can we learn self-supervised multimodal representations to capture
both shared and unique information relevant to downstream tasks? This paper
proposes FactorCL, a new multimodal representation learning method to go beyond
multi-view redundancy. FactorCL is built from three new contributions: (1)
factorizing task-relevant information into shared and unique representations,
(2) capturing task-relevant information via maximizing MI lower bounds and
removing task-irrelevant information via minimizing MI upper bounds, and (3)
multimodal data augmentations to approximate task relevance without labels. On
large-scale real-world datasets, FactorCL captures both shared and unique
information and achieves state-of-the-art results on six benchmarks.
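
To make the second contribution more concrete, the sketch below illustrates, under stated assumptions, the two kinds of mutual-information (MI) objectives the abstract refers to: a lower bound that is maximized to capture task-relevant (shared) information, and an upper bound that is minimized to discard task-irrelevant information. This is a minimal illustration, not the authors' released implementation; the choice of InfoNCE and a CLUB-style estimator as the concrete bounds, and all names and dimensions, are assumptions made for the example.

# Minimal PyTorch sketch of MI lower/upper-bound objectives of the kind the
# abstract describes. InfoNCE and CLUB are assumed here as concrete estimators;
# the paper's actual bounds and conditioning variables may differ.
import torch
import torch.nn.functional as F


def infonce_lower_bound(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE lower bound on I(z1; z2). Maximize this to capture shared information."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature                    # (B, B) pairwise similarities
    labels = torch.arange(z1.size(0), device=z1.device)   # matched pairs sit on the diagonal
    return -F.cross_entropy(logits, labels)               # negative InfoNCE loss = MI lower bound


def club_upper_bound(z: torch.Tensor, mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """CLUB-style upper bound on I(z; c), where mu and logvar parameterize a variational
    Gaussian q(z | c) predicted from c by a separate network. Minimize this to squeeze
    out information that z should not retain about c."""
    var = logvar.exp()
    positive = -0.5 * ((z - mu) ** 2 / var).sum(-1)                                    # matched pairs
    negative = -0.5 * (((z.unsqueeze(1) - mu.unsqueeze(0)) ** 2) / var.unsqueeze(0)).sum(-1).mean(-1)  # all pairs
    return (positive - negative).mean()


# Toy usage: z1, z2 stand in for per-modality representations from hypothetical encoders.
B, D = 8, 32
z1, z2 = torch.randn(B, D), torch.randn(B, D)
mi_lb = infonce_lower_bound(z1, z2)                  # maximize: keep shared, task-relevant information
mu, logvar = torch.randn(B, D), torch.zeros(B, D)    # stand-ins for a variational network's outputs
mi_ub = club_upper_bound(z1, mu, logvar)             # minimize: remove task-irrelevant information
loss = -mi_lb + mi_ub                                # one illustrative way to combine the two terms
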
Related papers
- Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework in which multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can succeed on classification tasks even when their annotations overlap little or not at all.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z)
- MESED: A Multi-modal Entity Set Expansion Dataset with Fine-grained Semantic Classes and Hard Negative Entities [25.059177235004952]
We propose Multi-modal Entity Set Expansion (MESE), where models integrate information from multiple modalities to represent entities.
A powerful multi-modal model, MultiExpan, is proposed and pre-trained on four multimodal pre-training tasks.
The MESED dataset is the first multi-modal dataset for ESE with large-scale and elaborate manual calibration.
arXiv Detail & Related papers (2023-07-27T14:09:59Z)
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
- Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications [90.6849884683226]
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data.
Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds.
We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
arXiv Detail & Related papers (2023-06-07T15:44:53Z)
- Leveraging sparse and shared feature activations for disentangled representation learning [112.22699167017471]
We propose to leverage knowledge extracted from a diversified set of supervised tasks to learn a common disentangled representation.
We validate our approach on six real world distribution shift benchmarks, and different data modalities.
arXiv Detail & Related papers (2023-04-17T01:33:24Z)
- Learning Multimodal Data Augmentation in Feature Space [65.54623807628536]
LeMDA is an easy-to-use method that automatically learns to jointly augment multimodal data in feature space.
We show that LeMDA can profoundly improve the performance of multimodal deep learning architectures.
arXiv Detail & Related papers (2022-12-29T20:39:36Z)
- Multimodal Information Bottleneck: Learning Minimal Sufficient Unimodal and Multimodal Representations [27.855467591358018]
We introduce the multimodal information bottleneck (MIB), aiming to learn a powerful and sufficient multimodal representation.
We develop three MIB variants, namely, early-fusion MIB, late-fusion MIB, and complete MIB, to focus on different perspectives of information constraints.
Experimental results suggest that the proposed method reaches state-of-the-art performance on the tasks of multimodal sentiment analysis and multimodal emotion recognition.
arXiv Detail & Related papers (2022-10-31T16:14:18Z)
- Sequential Cross Attention Based Multi-task Learning [22.430705836627148]
We propose a novel architecture that effectively transfers informative features by applying the attention mechanism to the multi-scale features of the tasks.
Our method achieves state-of-the-art performance on the NYUD-v2 and PASCAL-Context datasets.
arXiv Detail & Related papers (2022-09-06T14:17:33Z)
- Context-Aware Multi-Task Learning for Traffic Scene Recognition in Autonomous Vehicles [10.475998113861895]
We propose an algorithm to jointly learn the task-specific and shared representations by adopting a multi-task learning network.
Experiments on the large-scale dataset HSD demonstrate the effectiveness and superiority of our network over state-of-the-art methods.
arXiv Detail & Related papers (2020-04-03T03:09:26Z)