Contrastive and Non-Contrastive Self-Supervised Learning Recover Global
and Local Spectral Embedding Methods
- URL: http://arxiv.org/abs/2205.11508v2
- Date: Thu, 26 May 2022 17:54:18 GMT
- Title: Contrastive and Non-Contrastive Self-Supervised Learning Recover Global
and Local Spectral Embedding Methods
- Authors: Randall Balestriero, Yann LeCun
- Abstract summary: Self-Supervised Learning (SSL) surmises that inputs and pairwise positive relationships are enough to learn meaningful representations.
This paper proposes a unifying framework under the helm of spectral manifold learning to address those limitations.
- Score: 19.587273175563745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-Supervised Learning (SSL) surmises that inputs and pairwise positive
relationships are enough to learn meaningful representations. Although SSL has
recently reached a milestone: outperforming supervised methods in many
modalities... the theoretical foundations are limited, method-specific, and
fail to provide principled design guidelines to practitioners. In this paper,
we propose a unifying framework under the helm of spectral manifold learning to
address those limitations. Through the course of this study, we will rigorously
demonstrate that VICReg, SimCLR, BarlowTwins et al. correspond to eponymous
spectral methods such as Laplacian Eigenmaps, Multidimensional Scaling et al.
This unification will then allow us to obtain (i) the closed-form optimal
representation for each method, (ii) the closed-form optimal network parameters
in the linear regime for each method, (iii) the impact of the pairwise
relations used during training on each of those quantities and on downstream
task performances, and most importantly, (iv) the first theoretical bridge
between contrastive and non-contrastive methods towards global and local
spectral embedding methods respectively, hinting at the benefits and
limitations of each. For example, (i) if the pairwise relation is aligned with
the downstream task, any SSL method can be employed successfully and will
recover the supervised method, but in the low data regime, VICReg's invariance
hyper-parameter should be high; (ii) if the pairwise relation is misaligned
with the downstream task, VICReg with small invariance hyper-parameter should
be preferred over SimCLR or BarlowTwins.
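As a concrete anchor for the paper's central claim, here is a minimal numpy sketch of Laplacian Eigenmaps, the local spectral method the paper associates with non-contrastive SSL; the k-NN graph below stands in for SSL's positive-pair relation, and all data and parameter choices are illustrative rather than taken from the paper.

```python
# A minimal numpy sketch of Laplacian Eigenmaps; the k-NN adjacency plays the
# role of SSL's pairwise positive relations. Illustrative only.
import numpy as np

def laplacian_eigenmaps(X, k=5, d=2):
    # symmetrized k-nearest-neighbour adjacency from Euclidean distances
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    knn = np.argsort(D2, axis=1)[:, 1:k + 1]          # skip self at column 0
    W = np.zeros_like(D2)
    W[np.repeat(np.arange(len(X)), k), knn.ravel()] = 1.0
    W = np.maximum(W, W.T)                            # pairwise relation matrix

    deg = W.sum(1)
    L = np.diag(deg) - W                              # unnormalized graph Laplacian
    Dm = np.diag(deg ** -0.5)
    vals, vecs = np.linalg.eigh(Dm @ L @ Dm)          # solves L f = lam D f
    return Dm @ vecs[:, 1:d + 1]                      # drop the constant eigenvector

Z = laplacian_eigenmaps(np.random.default_rng(0).normal(size=(100, 10)))
print(Z.shape)                                        # (100, 2) local embedding
```

Per the abstract's bridge, non-contrastive methods such as VICReg recover local embeddings of this kind, while contrastive methods such as SimCLR recover global spectral methods like Multidimensional Scaling.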
Related papers
- On The Global Convergence Of Online RLHF With Neural Parametrization [36.239015146313136]
Reinforcement Learning from Human Feedback (RLHF) aims to align large language models with human values.
RLHF is a three-stage process that includes supervised fine-tuning, reward learning, and policy learning.
We propose a bi-level formulation for AI alignment in parameterized settings and introduce a first-order approach to solve this problem.
arXiv Detail & Related papers (2024-10-21T03:13:35Z)
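As a hedged illustration of the reward-learning stage named in the summary above, here is a toy numpy sketch of the standard Bradley-Terry pairwise objective; it is not the paper's bi-level formulation, and the linear reward over random feature vectors is a placeholder for a neural reward model.

```python
# A toy Bradley-Terry reward-learning sketch (not the paper's method): fit a
# linear reward so that "chosen" responses score above "rejected" ones.
import numpy as np

rng = np.random.default_rng(0)
chosen, rejected = rng.normal(size=(64, 16)), rng.normal(size=(64, 16))
w = np.zeros(16)

for _ in range(200):                      # plain gradient descent
    margin = (chosen - rejected) @ w      # r(chosen) - r(rejected)
    p = 1.0 / (1.0 + np.exp(-margin))     # P(chosen preferred)
    grad = -((1 - p)[:, None] * (chosen - rejected)).mean(0)
    w -= 0.5 * grad

loss = -np.log(1.0 / (1.0 + np.exp(-(chosen - rejected) @ w))).mean()
print(f"final -log sigmoid loss: {loss:.3f}")
```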
- Synergistic eigenanalysis of covariance and Hessian matrices for enhanced binary classification [72.77513633290056]
We present a novel approach that combines the eigenanalysis of a covariance matrix evaluated on a training set with that of a Hessian matrix evaluated on a deep learning model.
Our method captures intricate patterns and relationships, enhancing classification performance.
arXiv Detail & Related papers (2024-02-14T16:10:42Z)
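A minimal numpy sketch of the general idea, under the assumption (not stated in the summary) that the two leading eigen-directions are combined as features; a logistic model stands in for the deep network, and the data are random placeholders.

```python
# Pair the top eigenvector of the data covariance with the top eigenvector of
# a logistic-loss Hessian, and use both directions as a 2-D feature map.
# Illustrative sketch, not the authors' exact method.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# (i) leading eigenvector of the training-set covariance
_, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
v_cov = vecs[:, -1]                          # direction of maximal variance

# (ii) logistic-loss Hessian at w = 0: H = X^T diag(p(1-p)) X / n, with p = 0.5
p = np.full(len(X), 0.5)
H = (X * (p * (1 - p))[:, None]).T @ X / len(X)
_, vecs_h = np.linalg.eigh(H)
v_hess = vecs_h[:, -1]                       # direction of maximal curvature

Z = np.stack([X @ v_cov, X @ v_hess], axis=1)
print(Z.shape)                               # (200, 2)
```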
- SOFARI: High-Dimensional Manifold-Based Inference [8.860162863559163]
We introduce two SOFARI variants to handle strongly and weakly latent factors, where the latter covers a broader range of applications.
We show that SOFARI provides bias-corrected estimators for both latent left factor vectors and singular values, which enjoy mean-zero normal distributions with sparse, estimable variances.
We illustrate the effectiveness of SOFARI and justify our theoretical results through simulation examples and a real data application in economic forecasting.
arXiv Detail & Related papers (2023-09-26T16:01:54Z)
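For orientation, a toy numpy sketch of the quantities involved: naive SVD estimates of the singular values of a noisy low-rank matrix are biased, which is what SOFARI's bias-corrected, asymptotically normal estimators address. This sketch is not SOFARI itself.

```python
# Naive singular-value estimation from a noisy low-rank matrix; the estimates
# are inflated by the noise, motivating bias correction. Toy example only.
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 200, 50, 2
U = np.linalg.qr(rng.normal(size=(n, r)))[0]         # orthonormal left factors
V = np.linalg.qr(rng.normal(size=(p, r)))[0]         # orthonormal right factors
s_true = np.array([30.0, 15.0])
Y = U @ np.diag(s_true) @ V.T + 0.5 * rng.normal(size=(n, p))

s_hat = np.linalg.svd(Y, compute_uv=False)[:r]
print(s_true, s_hat)    # naive estimates exceed the truth; debiasing corrects this
```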
- An Information-Theoretic Perspective on Variance-Invariance-Covariance Regularization [52.44068740462729]
We present an information-theoretic perspective on the VICReg objective.
We derive a generalization bound for VICReg, revealing its inherent advantages for downstream tasks.
We introduce a family of SSL methods derived from information-theoretic principles that outperform existing SSL techniques.
arXiv Detail & Related papers (2023-03-01T16:36:25Z)
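For reference, a minimal numpy sketch of the VICReg objective that the entry above analyzes, with its variance, invariance, and covariance terms; the coefficients follow commonly cited defaults but should be treated as illustrative.

```python
# VICReg's three terms on two embedding views of the same batch.
import numpy as np

def vicreg_loss(Za, Zb, sim_coef=25.0, var_coef=25.0, cov_coef=1.0):
    n, d = Za.shape
    invariance = ((Za - Zb) ** 2).mean()              # pull positive pairs together
    def var_term(Z):                                  # keep each dimension spread out
        std = np.sqrt(Z.var(axis=0) + 1e-4)
        return np.maximum(0.0, 1.0 - std).mean()
    def cov_term(Z):                                  # decorrelate dimensions
        Zc = Z - Z.mean(0)
        C = Zc.T @ Zc / (n - 1)
        off = C - np.diag(np.diag(C))
        return (off ** 2).sum() / d
    return (sim_coef * invariance
            + var_coef * (var_term(Za) + var_term(Zb))
            + cov_coef * (cov_term(Za) + cov_term(Zb)))

rng = np.random.default_rng(0)
Z = rng.normal(size=(128, 32))
print(vicreg_loss(Z, Z + 0.1 * rng.normal(size=Z.shape)))
```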
- Supervised Contrastive Learning as Multi-Objective Optimization for Fine-Tuning Large Pre-trained Language Models [3.759936323189417]
Supervised Contrastive Learning (SCL) has been shown to achieve excellent performance in most classification tasks.
In this work, we formulate the SCL problem as a Multi-Objective Optimization problem for the fine-tuning phase of the RoBERTa language model.
arXiv Detail & Related papers (2022-09-28T15:13:58Z)
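A minimal numpy sketch of the supervised contrastive loss of Khosla et al., one of the objectives in the multi-objective formulation above; the batch and labels are random placeholders rather than RoBERTa features.

```python
# Supervised contrastive loss: for each anchor, average log-probability of its
# same-label positives against all other samples in the batch.
import numpy as np

def sup_con_loss(Z, y, tau=0.1):
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)    # unit-norm features
    logits = Z @ Z.T / tau
    np.fill_diagonal(logits, -np.inf)                   # exclude self-pairs
    log_prob = logits - np.log(np.exp(logits).sum(1, keepdims=True))
    pos = (y[:, None] == y[None, :]) & ~np.eye(len(y), dtype=bool)
    has_pos = pos.any(1)                                # anchors with positives
    per_anchor = np.where(pos, log_prob, 0.0).sum(1)[has_pos] / pos.sum(1)[has_pos]
    return -per_anchor.mean()

rng = np.random.default_rng(0)
print(sup_con_loss(rng.normal(size=(32, 8)), rng.integers(0, 4, 32)))
```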
- Low-rank Optimal Transport: Approximation, Statistics and Debiasing [51.50788603386766]
The low-rank optimal transport (LOT) approach advocated in Scetbon et al. (2021) is seen as a legitimate contender to entropic regularization when compared on properties of interest.
We target each of these areas in this paper in order to cement the impact of low-rank approaches in computational OT.
arXiv Detail & Related papers (2022-05-24T20:51:37Z)
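For context, a minimal numpy sketch of the entropic-regularization baseline (Sinkhorn iterations) that LOT is positioned against; LOT itself instead constrains the transport plan to be low-rank, which this sketch does not implement.

```python
# Entropic OT via Sinkhorn: alternate marginal scalings of a Gibbs kernel.
import numpy as np

def sinkhorn(C, a, b, eps=0.1, iters=500):
    K = np.exp(-C / eps)                   # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):                 # alternating marginal scalings
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]     # entropic transport plan

rng = np.random.default_rng(0)
x, y = rng.normal(size=(50, 2)), rng.normal(size=(60, 2)) + 1.0
C = ((x[:, None] - y[None, :]) ** 2).sum(-1)
C /= C.max()                               # normalize costs for stability
P = sinkhorn(C, np.full(50, 1 / 50), np.full(60, 1 / 60))
print(P.sum(), (P * C).sum())              # marginals sum to 1; transport cost
```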
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of semi-supervised learning (SSL) and domain adaptation (DA).
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
- Deep learning: a statistical viewpoint [120.94133818355645]
Deep learning has revealed some major surprises from a theoretical perspective.
In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems.
We conjecture that specific principles underlie these phenomena.
arXiv Detail & Related papers (2021-03-16T16:26:36Z)
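A small numpy sketch of one such surprise in its simplest linear form: in an overparameterized model, plain gradient descent from zero both interpolates the training data and selects the minimum-norm interpolator. The setup is illustrative, not from the paper.

```python
# Overparameterized least squares: gradient descent from zero converges to the
# minimum-norm solution that fits the training data exactly.
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 100                              # fewer samples than parameters
X, y = rng.normal(size=(n, p)), rng.normal(size=n)

w = np.zeros(p)
for _ in range(5000):                       # gradient descent on squared loss
    w -= 0.01 * X.T @ (X @ w - y) / n

w_min_norm = X.T @ np.linalg.solve(X @ X.T, y)   # pseudo-inverse solution
print(np.abs(X @ w - y).max())              # ~0: perfect fit to training data
print(np.linalg.norm(w - w_min_norm))       # ~0: GD picks the min-norm solution
```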
- On Data-Augmentation and Consistency-Based Semi-Supervised Learning [77.57285768500225]
Recently proposed consistency-based Semi-Supervised Learning (SSL) methods have advanced the state of the art in several SSL tasks.
Despite these advances, the understanding of these methods is still relatively limited.
arXiv Detail & Related papers (2021-01-18T10:12:31Z)
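A minimal numpy sketch of the consistency principle these methods share: penalize the model for predicting differently on two augmentations of the same unlabeled input. The linear softmax model and Gaussian-noise augmentations are stand-ins for a network and real augmentations.

```python
# Consistency regularization on unlabeled data: two augmentations, one penalty.
import numpy as np

rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(256, 10))
W = rng.normal(size=(10, 3))                   # stand-in model parameters

def predict(X, W):                             # softmax over 3 classes
    s = X @ W
    e = np.exp(s - s.max(1, keepdims=True))
    return e / e.sum(1, keepdims=True)

aug1 = X_unlabeled + 0.1 * rng.normal(size=X_unlabeled.shape)
aug2 = X_unlabeled + 0.1 * rng.normal(size=X_unlabeled.shape)
consistency_loss = ((predict(aug1, W) - predict(aug2, W)) ** 2).sum(1).mean()
print(consistency_loss)   # added to the supervised loss on labeled data
```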
- Sparse Methods for Automatic Relevance Determination [0.0]
We first review automatic relevance determination (ARD) and analytically demonstrate the need for additional regularization or thresholding to achieve sparse models.
We then discuss two classes of methods, regularization-based and thresholding-based, which build on ARD to learn parsimonious solutions to linear problems.
arXiv Detail & Related papers (2020-05-18T14:08:49Z)
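As a hedged illustration of the thresholding-based route mentioned above, a minimal numpy sketch of iterative hard thresholding for sparse linear regression; it is not ARD itself, and the problem sizes are arbitrary.

```python
# Iterative hard thresholding: keep only the s largest coefficients after each
# gradient step on the least-squares objective.
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 100, 50, 3
A = rng.normal(size=(n, p)) / np.sqrt(n)     # sensing matrix, ~unit columns
x_true = np.zeros(p); x_true[:3] = [3.0, -2.0, 1.5]
y = A @ x_true + 0.01 * rng.normal(size=n)

x = np.zeros(p)
for _ in range(200):
    x = x + 0.5 * A.T @ (y - A @ x)          # gradient step on 0.5*||y - Ax||^2
    small = np.argsort(np.abs(x))[:-s]       # all but the s largest entries
    x[small] = 0.0                           # hard threshold to s-sparse

print(np.nonzero(x)[0], np.round(x[x != 0], 2))   # expect support {0, 1, 2}
```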
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.