Explicit Mutual Information Maximization for Self-Supervised Learning
- URL: http://arxiv.org/abs/2409.04747v3
- Date: Thu, 12 Sep 2024 16:00:08 GMT
- Title: Explicit Mutual Information Maximization for Self-Supervised Learning
- Authors: Lele Chang, Peilin Liu, Qinghai Guo, Fei Wen
- Abstract summary: Theoretically, mutual information maximization (MIM) is an optimal criterion for self-supervised learning (SSL).
This work shows that, based on the invariance property of MI, explicit MI maximization can be applied to SSL under a generic distribution assumption.
We derive a loss function based on the MIM criterion using only second-order statistics.
- Score: 23.41734709882332
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, self-supervised learning (SSL) has been extensively studied. Theoretically, mutual information maximization (MIM) is an optimal criterion for SSL, with a strong theoretical foundation in information theory. However, it is difficult to directly apply MIM in SSL since the data distribution is not analytically available in applications. In practice, many existing methods can be viewed as approximate implementations of the MIM criterion. This work shows that, based on the invariance property of MI, explicit MI maximization can be applied to SSL under a generic distribution assumption, i.e., a relaxed condition of the data distribution. We further illustrate this by analyzing the generalized Gaussian distribution. Based on this result, we derive a loss function based on the MIM criterion using only second-order statistics. We implement the new loss for SSL and demonstrate its effectiveness via extensive experiments.
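The reduction of the MIM criterion to second-order statistics has a familiar special case: for jointly Gaussian variables, MI is a closed-form function of covariance matrices alone. The PyTorch sketch below illustrates a loss of that kind under an explicit Gaussian assumption; it is our illustration, not the paper's exact loss, which is derived for the more general generalized Gaussian family.

```python
# Minimal sketch of an MI-maximization loss built from second-order
# statistics only. Assumption (ours, for illustration): the two view
# embeddings are jointly Gaussian, so
#   I(z1; z2) = 0.5 * (logdet C11 + logdet C22 - logdet C),
# where C is the joint covariance with diagonal blocks C11, C22.
import torch

def gaussian_mi_loss(z1: torch.Tensor, z2: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """z1, z2: (batch, dim) embeddings of two augmented views of the same inputs."""
    n, d = z1.shape
    z1 = z1 - z1.mean(dim=0)                         # center each view
    z2 = z2 - z2.mean(dim=0)
    joint = torch.cat([z1, z2], dim=1)               # (n, 2d)
    c = joint.T @ joint / (n - 1)                    # joint covariance, (2d, 2d)
    c = c + eps * torch.eye(2 * d, device=c.device)  # ridge for numerical stability
    mi = 0.5 * (torch.logdet(c[:d, :d]) + torch.logdet(c[d:, d:]) - torch.logdet(c))
    return -mi                                       # minimize -MI to maximize MI
```

The eps ridge keeps the log-determinants finite when the batch size is smaller than twice the embedding dimension.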
Related papers
- A Probabilistic Model for Self-Supervised Learning [6.178817969919849]
Self-supervised learning (SSL) aims to find meaningful representations from unlabeled data by encoding semantic similarities through data augmentations.
It is not yet known whether commonly used SSL loss functions can be related to a statistical model.
We consider a latent variable statistical model for SSL that exhibits an interesting property: depending on the informativeness of the data augmentations, the MLE of the model either reduces to PCA or approaches a simple non-contrastive loss.
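As a concrete reference point for the "simple non-contrastive loss" this entry mentions, the sketch below shows the generic alignment form such losses take; which loss the model's MLE actually approaches depends on details not in this summary.

```python
# Hypothetical example of a simple non-contrastive loss: align the
# normalized embeddings of two augmented views of the same input.
import torch
import torch.nn.functional as F

def alignment_loss(z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    """z1, z2: (batch, dim) embeddings of two views; lower = better aligned."""
    z1 = F.normalize(z1, dim=1)                  # project onto the unit sphere
    z2 = F.normalize(z2, dim=1)
    return (z1 - z2).pow(2).sum(dim=1).mean()    # equals 2 - 2 * cosine similarity
```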
arXiv Detail & Related papers (2025-01-22T17:25:47Z)
- Analysis of High-dimensional Gaussian Labeled-unlabeled Mixture Model via Message-passing Algorithm [3.192109204993465]
Semi-supervised learning (SSL) is a machine learning methodology that leverages unlabeled data in conjunction with a limited amount of labeled data.
Existing theoretical studies have modeled such classification problems using the so-called Gaussian Mixture Model (GMM).
In this paper, we conduct such a detailed analysis of the properties of the high-dimensional GMM for binary classification in the SSL setting.
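A toy instance of the labeled-unlabeled GMM setting is easy to set up for experimentation; the dimension and sample counts below are arbitrary illustrative choices, not the paper's.

```python
# Two-component high-dimensional Gaussian mixture with a small labeled set
# and a large unlabeled set, as in the SSL setting this entry analyzes.
import numpy as np

rng = np.random.default_rng(0)
d, n_labeled, n_unlabeled = 500, 20, 2000
mu = rng.standard_normal(d) / np.sqrt(d)       # class mean direction

y_lab = rng.choice([-1, 1], size=n_labeled)    # observed labels
x_lab = y_lab[:, None] * mu + rng.standard_normal((n_labeled, d))

y_unl = rng.choice([-1, 1], size=n_unlabeled)  # hidden labels
x_unl = y_unl[:, None] * mu + rng.standard_normal((n_unlabeled, d))
# An SSL method sees (x_lab, y_lab) and x_unl; y_unl is kept for evaluation.
```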
arXiv Detail & Related papers (2024-11-29T08:57:07Z)
- MaxMatch: Semi-Supervised Learning with Worst-Case Consistency [149.03760479533855]
We propose a worst-case consistency regularization technique for semi-supervised learning (SSL).
We present a generalization bound for SSL consisting of the empirical loss terms observed on labeled and unlabeled training data separately.
Motivated by this bound, we derive an SSL objective that minimizes the largest inconsistency between an original unlabeled sample and its multiple augmented variants.
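A minimal sketch of the worst-case consistency objective just described: among several augmented variants of an unlabeled sample, penalize only the one whose prediction disagrees most with the original. The KL divergence and the stop-gradient on the reference prediction are our assumptions, not necessarily MaxMatch's exact recipe.

```python
# Worst-case consistency over k augmentations of an unlabeled batch.
import torch
import torch.nn.functional as F

def worst_case_consistency(model, x: torch.Tensor, augment, k: int = 4) -> torch.Tensor:
    """x: unlabeled inputs; augment: callable returning one augmented variant of x."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)                   # reference predictions
    losses = []
    for _ in range(k):
        log_q = F.log_softmax(model(augment(x)), dim=1)  # prediction on a variant
        losses.append(F.kl_div(log_q, p, reduction="none").sum(dim=1))
    worst = torch.stack(losses).max(dim=0).values        # largest inconsistency per sample
    return worst.mean()
```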
arXiv Detail & Related papers (2022-09-26T12:04:49Z)
- OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning [110.40285771431687]
Semi-supervised learning (SSL) is one of the dominant approaches to address the annotation bottleneck of supervised learning.
Recent SSL methods can effectively leverage a large repository of unlabeled data to improve performance while relying on a small set of labeled data.
This work introduces OpenLDN, which utilizes a pairwise similarity loss to discover novel classes.
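The summary names a pairwise similarity loss but not its exact form; the sketch below is a generic version, with pseudo pairwise labels standing in for whatever pairing rule OpenLDN actually uses.

```python
# Generic pairwise similarity loss: push embeddings of pairs believed to
# share a class together and other pairs apart (hypothetical sketch).
import torch
import torch.nn.functional as F

def pairwise_similarity_loss(z: torch.Tensor, same_class: torch.Tensor) -> torch.Tensor:
    """z: (n, d) embeddings; same_class: (n, n) {0,1} pseudo pairwise labels."""
    z = F.normalize(z, dim=1)
    sim = ((z @ z.T + 1) / 2).clamp(1e-6, 1 - 1e-6)  # cosine similarity mapped into (0, 1)
    return F.binary_cross_entropy(sim, same_class.float())
```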
arXiv Detail & Related papers (2022-07-05T18:51:05Z)
- Self-supervised Learning is More Robust to Dataset Imbalance [65.84339596595383]
We investigate self-supervised learning under dataset imbalance.
Off-the-shelf self-supervised representations are already more robust to class imbalance than supervised representations.
We devise a re-weighted regularization technique that consistently improves the SSL representation quality on imbalanced datasets.
arXiv Detail & Related papers (2021-10-11T06:29:56Z)
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
- Semi-supervised learning objectives as log-likelihoods in a generative model of data curation [32.45282187405337]
We formulate SSL objectives as a log-likelihood in a generative model of data curation.
We give a proof-of-principle for Bayesian SSL on toy data.
arXiv Detail & Related papers (2020-08-13T13:50:27Z)
- CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information [105.73798100327667]
We propose a novel Contrastive Log-ratio Upper Bound (CLUB) of mutual information.
We provide a theoretical analysis of the properties of CLUB and its variational approximation.
Based on this upper bound, we introduce a MI minimization training scheme and further accelerate it with a negative sampling strategy.
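The bound itself is compact enough to sketch: with a variational network q(y|x), CLUB upper-bounds I(X;Y) by E_{p(x,y)}[log q(y|x)] - E_{p(x)p(y)}[log q(y|x)], so minimizing the estimate drives MI down. The diagonal-Gaussian q-network and in-batch shuffling below are common implementation choices on our part, not the only ones.

```python
# Sketch of a CLUB estimator with a diagonal-Gaussian variational net q(y|x).
import torch
import torch.nn as nn

class CLUB(nn.Module):
    def __init__(self, x_dim: int, y_dim: int, hidden: int = 128):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, y_dim))
        self.logvar = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, y_dim))

    def log_lik(self, x, y):
        # log q(y|x) up to an additive constant that cancels in the bound
        mu, logvar = self.mu(x), self.logvar(x)
        return (-((y - mu) ** 2) / logvar.exp() - logvar).sum(dim=1) / 2

    def forward(self, x, y):
        positive = self.log_lik(x, y)                      # pairs from p(x, y)
        perm = torch.randperm(y.size(0), device=y.device)
        negative = self.log_lik(x, y[perm])                # shuffled pairs ~ p(x)p(y)
        return (positive - negative).mean()                # estimated upper bound on I(X;Y)
```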
arXiv Detail & Related papers (2020-06-22T05:36:16Z)
- A Probabilistic Model for Discriminative and Neuro-Symbolic Semi-Supervised Learning [6.789370732159177]
We present a probabilistic model for discriminative SSL that mirrors its classical generative counterpart.
We show that several well-known SSL methods can be interpreted as approximating this prior and can be improved upon.
We extend the discriminative model to neuro-symbolic SSL, where label features satisfy logical rules, by showing such rules relate directly to the above prior.
arXiv Detail & Related papers (2020-06-10T15:30:54Z)
- Modal Regression based Structured Low-rank Matrix Recovery for Multi-view Learning [70.57193072829288]
Low-rank Multi-view Subspace Learning has shown great potential in cross-view classification in recent years.
Existing LMvSL-based methods cannot handle view discrepancy and discriminancy well simultaneously.
We propose Structured Low-rank Matrix Recovery (SLMR), a method that effectively removes view discrepancy while improving discriminancy.
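The summary does not spell out SLMR's algorithm; as a generic illustration of the low-rank matrix recovery machinery such methods build on, here is singular-value thresholding, the proximal operator of the nuclear norm.

```python
# One singular-value thresholding (SVT) step: shrink singular values by tau.
import numpy as np

def svt(m: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * nuclear norm, the core step of many
    low-rank matrix recovery solvers."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt
```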
arXiv Detail & Related papers (2020-03-22T03:57:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.