Related papers: A Mathematical Perspective On Contrastive Learning

URL: http://arxiv.org/abs/2505.24134v1
Date: Fri, 30 May 2025 02:09:37 GMT
Title: A Mathematical Perspective On Contrastive Learning
Authors: Ricardo Baptista, Andrew M. Stuart, Son Tran
Abstract summary: Multimodal contrastive learning is a methodology for linking different data modalities. We focus on the bimodal setting and interpret contrastive learning as the optimization of encoders that define conditional probability distributions.
Score: 5.66952471288857
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal contrastive learning is a methodology for linking different data modalities; the canonical example is linking image and text data. The methodology is typically framed as the identification of a set of encoders, one for each modality, that align representations within a common latent space. In this work, we focus on the bimodal setting and interpret contrastive learning as the optimization of (parameterized) encoders that define conditional probability distributions, for each modality conditioned on the other, consistent with the available data. This provides a framework for multimodal algorithms such as crossmodal retrieval, which identifies the mode of one of these conditional distributions, and crossmodal classification, which is similar to retrieval but includes a fine-tuning step to make it task specific. The framework we adopt also gives rise to crossmodal generative models. This probabilistic perspective suggests two natural generalizations of contrastive learning: the introduction of novel probabilistic loss functions, and the use of alternative metrics for measuring alignment in the common latent space. We study these generalizations of the classical approach in the multivariate Gaussian setting. In this context we view the latent space identification as a low-rank matrix approximation problem. This allows us to characterize the capabilities of loss functions and alignment metrics to approximate natural statistics, such as conditional means and covariances; doing so yields novel variants on contrastive learning algorithms for specific mode-seeking and for generative tasks. The framework we introduce is also studied through numerical experiments on multivariate Gaussians, the labeled MNIST dataset, and on a data assimilation application arising in oceanography.
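The probabilistic reading described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the random linear encoders `G_u` and `G_v`, the cosine-similarity alignment metric, the `temperature` parameter, and the softmax over a finite candidate set are all assumptions chosen for illustration. Under those assumptions, the encoders induce an empirical conditional distribution over candidates in one modality given a query in the other, and crossmodal retrieval returns the mode of that distribution.

```python
import numpy as np


def conditional_probs(query_emb, candidate_embs, temperature=1.0):
    """Softmax over cosine similarities between one query and N candidates.

    Returns a length-N probability vector: an empirical conditional
    distribution over the candidates given the query (illustrative form).
    """
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    logits = (c @ q) / temperature
    logits = logits - logits.max()  # stabilise the exponentials
    weights = np.exp(logits)
    return weights / weights.sum()


def retrieve(query_emb, candidate_embs, temperature=1.0):
    """Crossmodal retrieval as mode-seeking: index of the most probable candidate."""
    return int(np.argmax(conditional_probs(query_emb, candidate_embs, temperature)))


rng = np.random.default_rng(0)

# Hypothetical linear encoders into a shared 4-dimensional latent space.
G_u = rng.standard_normal((4, 8))  # encoder for modality u (input dimension 8)
G_v = rng.standard_normal((4, 6))  # encoder for modality v (input dimension 6)

u = rng.standard_normal(8)       # one query from modality u
V = rng.standard_normal((5, 6))  # five candidates from modality v

latent_query = G_u @ u
latent_candidates = V @ G_v.T

probs = conditional_probs(latent_query, latent_candidates)
best = retrieve(latent_query, latent_candidates)
print(probs, best)
```

In a trained model the encoders would be optimized so that this conditional distribution is consistent with the paired data. Here they are random, so the retrieved index carries no meaning beyond showing the mechanics.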

Related papers

Multimodal Representation Alignment for Cross-modal Information Retrieval [12.42313654539524]
Different machine learning models can represent the same underlying concept in different ways. This variability is particularly valuable for in-the-wild multimodal retrieval, where the objective is to identify the corresponding representation in one modality given another as input. In this work, we first investigate the geometric relationships between visual and textual embeddings derived from both vision-language models and combined unimodal models. We then align these representations using four standard similarity metrics as well as two learned ones, implemented via neural networks.
arXiv Detail & Related papers (2025-06-10T13:16:26Z)
Learning local neighborhoods of non-Gaussian graphical models: A measure transport approach [0.3749861135832072]
We propose a scalable algorithm to infer the conditional independence relationships of each variable by exploiting the local Markov property. The proposed method, named Localized Sparsity Identification for Non-Gaussian Distributions (L-SING), estimates the graph by using flexible classes of transport maps.
arXiv Detail & Related papers (2025-03-18T04:53:22Z)
Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data. Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching). To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth. We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z)
The Normalized Cross Density Functional: A Framework to Quantify Statistical Dependence for Random Processes [6.625320950808605]
We present a novel approach to measuring statistical dependence between two random processes (r.p.) using a positive-definite function called the Normalized Cross Density (NCD). NCD is derived directly from the probability density functions of two r.p. and constructs a data-dependent Hilbert space, the Normalized Cross-Density Hilbert Space (NCD-HS). We mathematically prove that FMCA learns the dominant eigenvalues and eigenfunctions of NCD directly from realizations.
arXiv Detail & Related papers (2022-12-09T02:12:41Z)
Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm. The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources. It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
Multimodal Data Fusion in High-Dimensional Heterogeneous Datasets via Generative Models [16.436293069942312]
We are interested in learning probabilistic generative models from high-dimensional heterogeneous data in an unsupervised fashion. We propose a general framework that combines disparate data types through the exponential family of distributions. The proposed algorithm is presented in detail for the commonly encountered heterogeneous datasets with real-valued (Gaussian) and categorical (multinomial) features.
arXiv Detail & Related papers (2021-08-27T18:10:31Z)
Integrating Information Theory and Adversarial Learning for Cross-modal Retrieval [19.600581093189362]
Accurately matching visual and textual data in cross-modal retrieval has been widely studied in the multimedia community. We propose integrating Shannon information theory and adversarial learning. In terms of the gap, we integrate modality classification and information entropy adversarially.
arXiv Detail & Related papers (2021-04-11T11:04:55Z)
Learning with Density Matrices and Random Features [44.98964870180375]
A density matrix describes the statistical state of a quantum system. It is a powerful formalism to represent both the quantum and classical uncertainty of quantum systems. This paper explores how density matrices can be used as a building block for machine learning models.
arXiv Detail & Related papers (2021-02-08T17:54:59Z)
Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model. The objective is to endow the trained model with robustness against adversarially manipulated input data. Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
Bayesian Sparse Factor Analysis with Kernelized Observations [67.60224656603823]
Multi-view problems can be addressed with latent variable models. High dimensionality and non-linearity are traditionally handled by kernel methods. We propose merging both approaches into a single model.
arXiv Detail & Related papers (2020-06-01T14:25:38Z)
Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets. We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator. We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.