Understanding Multimodal Contrastive Learning and Incorporating Unpaired Data
- URL: http://arxiv.org/abs/2302.06232v1
- Date: Mon, 13 Feb 2023 10:11:05 GMT
- Title: Understanding Multimodal Contrastive Learning and Incorporating Unpaired Data
- Authors: Ryumei Nakada, Halil Ibrahim Gulluk, Zhun Deng, Wenlong Ji, James Zou, Linjun Zhang
- Abstract summary: We study a general class of nonlinear loss functions for multimodal contrastive learning (MMCL), including the CLIP loss, and show its connection to singular value decomposition (SVD).
We quantitatively show that the feature learning ability of MMCL can be better than that of unimodal contrastive learning applied to each modality.
When additional unpaired data are available, we propose a new MMCL loss that incorporates them.
- Score: 19.72282903349282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language-supervised vision models have recently attracted great attention in
computer vision. A common approach to build such models is to use contrastive
learning on paired data across the two modalities, as exemplified by
Contrastive Language-Image Pre-Training (CLIP). In this paper, under linear
representation settings, (i) we initiate the investigation of a general class
of nonlinear loss functions for multimodal contrastive learning (MMCL)
including CLIP loss and show its connection to singular value decomposition
(SVD). Namely, we show that each step of loss minimization by gradient descent
can be seen as performing SVD on a contrastive cross-covariance matrix. Based
on this insight, (ii) we analyze the performance of MMCL. We quantitatively
show that the feature learning ability of MMCL can be better than that of
unimodal contrastive learning applied to each modality, even in the presence
of wrongly matched pairs. This characterizes the robustness of MMCL to noisy
data. Furthermore, when we have access to additional unpaired data, (iii) we
propose a new MMCL loss that incorporates additional unpaired datasets. We show
that the algorithm can detect the ground-truth pairs and improve performance by
fully exploiting unpaired datasets. The performance of the proposed algorithm
was verified by numerical experiments.
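As a rough illustration of the two technical points above, the sketch below builds a toy paired two-modality dataset, forms the contrastive cross-covariance matrix, and takes its SVD to obtain linear encoders; it then matches extra unpaired samples by cosine similarity in the learned feature space as a stand-in for pair detection. The toy data model, dimensions, and the matching heuristic are assumptions for illustration only, not the loss or algorithm proposed in the paper.

```python
# Minimal sketch (not the authors' code): illustrates the claimed connection
# between linear multimodal contrastive learning (MMCL) and the SVD of a
# contrastive cross-covariance matrix, plus a stand-in heuristic for pairing
# extra unpaired samples. All dimensions and the cosine-matching step are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy paired data: n samples, two modalities driven by a shared low-rank latent.
n, d1, d2, r = 500, 20, 30, 5
z = rng.normal(size=(n, r))                                        # shared latent factors
x = z @ rng.normal(size=(r, d1)) + 0.1 * rng.normal(size=(n, d1))  # modality 1
y = z @ rng.normal(size=(r, d2)) + 0.1 * rng.normal(size=(n, d2))  # modality 2

# Contrastive cross-covariance matrix between the two (centered) modalities.
xc, yc = x - x.mean(axis=0), y - y.mean(axis=0)
cross_cov = xc.T @ yc / n                                          # shape (d1, d2)

# SVD of the cross-covariance; its top-r singular subspaces play the role of
# the linear features that MMCL is driven toward in this idealized setting.
u, s, vt = np.linalg.svd(cross_cov, full_matrices=False)
g_x, g_y = u[:, :r], vt[:r, :].T                                   # linear "encoders"

# Hypothetical use of extra *unpaired* samples: embed them with the current
# encoders and match by cosine similarity to form pseudo-pairs. This is only
# a stand-in for "detecting the ground-truth pairs", not the proposed loss.
perm = rng.permutation(50)
x_un = x[:50][perm]                                                # scrambled modality-1 batch
y_un = y[:50]                                                      # modality-2 batch, true order
ex = (x_un - x.mean(axis=0)) @ g_x
ey = (y_un - y.mean(axis=0)) @ g_y
ex_n = ex / np.linalg.norm(ex, axis=1, keepdims=True)
ey_n = ey / np.linalg.norm(ey, axis=1, keepdims=True)
pseudo_pairs = (ex_n @ ey_n.T).argmax(axis=1)                      # best match per scrambled sample

print("top singular values:", np.round(s[:r], 3))
print("pair recovery rate :", np.mean(pseudo_pairs == perm))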
Related papers
- Dissecting Misalignment of Multimodal Large Language Models via Influence Function [12.832792175138241]
We introduce the Extended Influence Function for Contrastive Loss (ECIF), an influence function crafted for contrastive loss.
ECIF considers both positive and negative samples and provides a closed-form approximation of contrastive learning models.
Building upon ECIF, we develop a series of algorithms for data evaluation in MLLMs, misalignment detection, and misprediction trace-back tasks.
arXiv Detail & Related papers (2024-11-18T15:45:41Z)
- On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning [85.75164588939185]
We study the discriminative probabilistic modeling problem on a continuous domain for (multimodal) self-supervised representation learning.
We conduct generalization error analysis to reveal the limitation of current InfoNCE-based contrastive loss for self-supervised representation learning.
arXiv Detail & Related papers (2024-10-11T18:02:46Z)
- CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models [68.64605538559312]
In this paper, we analyze the MLLM instruction tuning from both theoretical and empirical perspectives.
Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance.
In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs.
arXiv Detail & Related papers (2024-07-29T23:18:55Z)
- Entropy Law: The Story Behind Data Compression and LLM Performance [115.70395740286422]
We find that model performance is negatively correlated with the compression ratio of training data, which usually yields a lower training loss.
Based on the findings of the entropy law, we propose a quite efficient and universal data selection method.
We also present an interesting application of entropy law that can detect potential performance risks at the beginning of model training.
arXiv Detail & Related papers (2024-07-09T08:14:29Z)
- Dataset Condensation with Latent Quantile Matching [5.466962214217334]
Current distribution matching (DM) based DC methods learn a synthesized dataset by matching the mean of the latent embeddings between the synthetic and the real data.
We propose Latent Quantile Matching (LQM), which matches the quantiles of the latent embeddings to minimize the goodness-of-fit test statistic between two distributions.
arXiv Detail & Related papers (2024-06-14T09:20:44Z)
- Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning [45.25602203155762]
Self-Supervised Contrastive Learning has proven effective in deriving high-quality representations from unlabeled data.
A major challenge that hinders both unimodal and multimodal contrastive learning is feature suppression.
We propose a novel model-agnostic Multistage Contrastive Learning framework.
arXiv Detail & Related papers (2024-02-19T04:13:33Z)
- Towards Better Modeling with Missing Data: A Contrastive Learning-based Visual Analytics Perspective [7.577040836988683]
Missing data can pose a challenge for machine learning (ML) modeling.
Current approaches are categorized into feature imputation and label prediction.
This study proposes a Contrastive Learning framework to model observed data with missing values.
arXiv Detail & Related papers (2023-09-18T13:16:24Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- Performance Indicator in Multilinear Compressive Learning [106.12874293597754]
The Multilinear Compressive Learning (MCL) framework was proposed to efficiently optimize the sensing and learning steps when working with multidimensional signals.
In this paper, we analyze the relationship between the input signal resolution, the number of compressed measurements and the learning performance of MCL.
arXiv Detail & Related papers (2020-09-22T11:27:50Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.