What Do We Maximize in Self-Supervised Learning?
- URL: http://arxiv.org/abs/2207.10081v1
- Date: Wed, 20 Jul 2022 04:44:26 GMT
- Title: What Do We Maximize in Self-Supervised Learning?
- Authors: Ravid Shwartz-Ziv, Randall Balestriero, Yann LeCun
- Abstract summary: We show how information-theoretic quantities can be obtained for a deterministic network.
We empirically demonstrate the validity of our assumptions, confirming our novel understanding of VICReg.
We believe that the derivation and insights we obtain can be generalized to many other SSL methods.
- Score: 17.94932034403123
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we examine self-supervised learning methods, particularly
VICReg, to provide an information-theoretical understanding of their
construction. As a first step, we demonstrate how information-theoretic
quantities can be obtained for a deterministic network, offering a possible
alternative to prior work that relies on stochastic models. This enables us to
show how VICReg can be (re)discovered from first principles, together with the
assumptions it makes about the data distribution. Furthermore, we empirically demonstrate
the validity of our assumptions, confirming our novel understanding of VICReg.
Finally, we believe that the derivation and insights we obtain can be
generalized to many other SSL methods, opening new avenues for theoretical and
practical understanding of SSL and transfer learning.
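For context on the objective the paper analyzes, below is a minimal PyTorch sketch of the VICReg loss with its invariance, variance, and covariance terms. The function names are illustrative (not taken from the authors' code), and the coefficient defaults (25, 25, 1) follow the values commonly reported for VICReg; treat the whole block as a sketch under those assumptions rather than the paper's implementation.
```python
# Minimal sketch of the VICReg objective (invariance + variance + covariance),
# assuming two batches of embeddings z_a, z_b of shape (N, D) from two augmented views.
import torch
import torch.nn.functional as F


def off_diagonal(m: torch.Tensor) -> torch.Tensor:
    # Return all off-diagonal entries of a square matrix as a flat vector.
    d = m.shape[0]
    return m.flatten()[:-1].view(d - 1, d + 1)[:, 1:].flatten()


def vicreg_loss(z_a, z_b, lam=25.0, mu=25.0, nu=1.0, gamma=1.0, eps=1e-4):
    n, d = z_a.shape

    # Invariance term: mean-squared error between the two views' embeddings.
    inv = F.mse_loss(z_a, z_b)

    # Variance term: hinge loss keeping the std of each embedding dimension above gamma.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var = F.relu(gamma - std_a).mean() + F.relu(gamma - std_b).mean()

    # Covariance term: penalize off-diagonal entries of each view's covariance matrix.
    z_a_c = z_a - z_a.mean(dim=0)
    z_b_c = z_b - z_b.mean(dim=0)
    cov_a = (z_a_c.T @ z_a_c) / (n - 1)
    cov_b = (z_b_c.T @ z_b_c) / (n - 1)
    cov = off_diagonal(cov_a).pow(2).sum() / d + off_diagonal(cov_b).pow(2).sum() / d

    return lam * inv + mu * var + nu * cov


# Example usage with random embeddings (hypothetical shapes).
z_a, z_b = torch.randn(256, 128), torch.randn(256, 128)
loss = vicreg_loss(z_a, z_b)
```
In this reading, the variance and covariance terms can be viewed as regularizers that keep the representation from collapsing to an uninformative solution, which is the behavior the paper revisits from an information-theoretic standpoint.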
Related papers
- Dynamics of Supervised and Reinforcement Learning in the Non-Linear Perceptron [3.069335774032178]
We use a dataset-process approach to derive flow equations describing learning.
We characterize the effects of the learning rule (supervised or reinforcement learning, SL/RL) and input-data distribution on the perceptron's learning curve.
This approach points a way toward analyzing learning dynamics for more-complex circuit architectures.
arXiv Detail & Related papers (2024-09-05T17:58:28Z)
- More Flexible PAC-Bayesian Meta-Learning by Learning Learning Algorithms [15.621144215664769]
We introduce a new framework for studying meta-learning methods using PAC-Bayesian theory.
Its main advantage is that it allows more flexibility in how the transfer of knowledge between tasks is realized.
arXiv Detail & Related papers (2024-02-06T15:00:08Z)
- A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z)
- Continual Zero-Shot Learning through Semantically Guided Generative Random Walks [56.65465792750822]
We address the challenge of continual zero-shot learning (CZSL), where unseen information is not provided during training, by leveraging generative modeling.
We propose our learning algorithm that employs a novel semantically guided Generative Random Walk (GRW) loss.
Our algorithm achieves state-of-the-art performance on AWA1, AWA2, CUB, and SUN datasets, surpassing existing CZSL methods by 3-7%.
arXiv Detail & Related papers (2023-08-23T18:10:12Z)
- On the Stepwise Nature of Self-Supervised Learning [0.0]
We present a simple picture of the training process of joint embedding self-supervised learning methods.
We find that these methods learn their high-dimensional embeddings one dimension at a time in a sequence of discrete, well-separated steps.
Our theory suggests that, just as kernel regression can be thought of as a model of supervised learning, kernel PCA may serve as a useful model of self-supervised learning.
arXiv Detail & Related papers (2023-03-27T17:59:20Z)
- An Information-Theoretic Perspective on Variance-Invariance-Covariance Regularization [52.44068740462729]
We present an information-theoretic perspective on the VICReg objective.
We derive a generalization bound for VICReg, revealing its inherent advantages for downstream tasks.
We introduce a family of SSL methods derived from information-theoretic principles that outperform existing SSL techniques.
arXiv Detail & Related papers (2023-03-01T16:36:25Z)
- Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control? [75.14973944905216]
We study the task of learning state representations from potentially high-dimensional observations.
We pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning.
arXiv Detail & Related papers (2022-12-30T01:42:04Z)
- Mixture-of-Variational-Experts for Continual Learning [0.0]
We propose an optimality principle that facilitates a trade-off between learning and forgetting.
We propose a neural network layer for continual learning, called Mixture-of-Variational-Experts (MoVE).
Our experiments on variants of the MNIST and CIFAR10 datasets demonstrate the competitive performance of MoVE layers.
arXiv Detail & Related papers (2021-10-25T06:32:06Z)
- InteL-VAEs: Adding Inductive Biases to Variational Auto-Encoders via Intermediary Latents [60.785317191131284]
We introduce a simple and effective method for learning VAEs with controllable biases by using an intermediary set of latent variables.
In particular, it allows us to impose desired properties like sparsity or clustering on learned representations.
We show that this, in turn, allows InteL-VAEs to learn both better generative models and representations.
arXiv Detail & Related papers (2021-06-25T16:34:05Z)
- Prototypical Contrastive Learning of Unsupervised Representations [171.3046900127166]
Prototypical Contrastive Learning (PCL) is an unsupervised representation learning method.
PCL implicitly encodes semantic structures of the data into the learned embedding space.
PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks.
arXiv Detail & Related papers (2020-05-11T09:53:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.