What Do We Maximize in Self-Supervised Learning?
- URL: http://arxiv.org/abs/2207.10081v1
- Date: Wed, 20 Jul 2022 04:44:26 GMT
- Title: What Do We Maximize in Self-Supervised Learning?
- Authors: Ravid Shwartz-Ziv, Randall Balestriero, Yann LeCun
- Abstract summary: We show how information-theoretic quantities can be obtained for a deterministic network.
We empirically demonstrate the validity of our assumptions, confirming our novel understanding of VICReg.
We believe that the derivation and insights we obtain can be generalized to many other SSL methods.
- Score: 17.94932034403123
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we examine self-supervised learning methods, particularly
VICReg, to provide an information-theoretical understanding of their
construction. As a first step, we demonstrate how information-theoretic
quantities can be obtained for a deterministic network, offering a possible
alternative to prior work that relies on stochastic models. This enables us to
show how VICReg can be (re)discovered from first principles, together with the
assumptions it makes about the data distribution. Furthermore, we empirically demonstrate
the validity of our assumptions, confirming our novel understanding of VICReg.
Finally, we believe that the derivation and insights we obtain can be
generalized to many other SSL methods, opening new avenues for theoretical and
practical understanding of SSL and transfer learning.
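For context on the objective the paper analyzes, below is a minimal PyTorch sketch of the VICReg loss with its invariance, variance, and covariance terms. The function names are illustrative (not taken from the authors' code), and the coefficient defaults (25, 25, 1) follow the values commonly reported for VICReg; treat the whole block as a sketch under those assumptions rather than the paper's implementation.
```python
# Minimal sketch of the VICReg objective (invariance + variance + covariance),
# assuming two batches of embeddings z_a, z_b of shape (N, D) from two augmented views.
import torch
import torch.nn.functional as F


def off_diagonal(m: torch.Tensor) -> torch.Tensor:
    # Return all off-diagonal entries of a square matrix as a flat vector.
    d = m.shape[0]
    return m.flatten()[:-1].view(d - 1, d + 1)[:, 1:].flatten()


def vicreg_loss(z_a, z_b, lam=25.0, mu=25.0, nu=1.0, gamma=1.0, eps=1e-4):
    n, d = z_a.shape

    # Invariance term: mean-squared error between the two views' embeddings.
    inv = F.mse_loss(z_a, z_b)

    # Variance term: hinge loss keeping the std of each embedding dimension above gamma.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var = F.relu(gamma - std_a).mean() + F.relu(gamma - std_b).mean()

    # Covariance term: penalize off-diagonal entries of each view's covariance matrix.
    z_a_c = z_a - z_a.mean(dim=0)
    z_b_c = z_b - z_b.mean(dim=0)
    cov_a = (z_a_c.T @ z_a_c) / (n - 1)
    cov_b = (z_b_c.T @ z_b_c) / (n - 1)
    cov = off_diagonal(cov_a).pow(2).sum() / d + off_diagonal(cov_b).pow(2).sum() / d

    return lam * inv + mu * var + nu * cov


# Example usage with random embeddings (hypothetical shapes).
z_a, z_b = torch.randn(256, 128), torch.randn(256, 128)
loss = vicreg_loss(z_a, z_b)
```
In this reading, the variance and covariance terms can be viewed as regularizers that keep the representation from collapsing to an uninformative solution, which is the behavior the paper revisits from an information-theoretic standpoint.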
Related papers
- Dynamics of Supervised and Reinforcement Learning in the Non-Linear Perceptron [3.069335774032178]
We use a dataset-process approach to derive flow equations describing learning.
We characterize the effects of the learning rule (supervised or reinforcement learning, SL/RL) and input-data distribution on the perceptron's learning curve.
This approach points a way toward analyzing learning dynamics for more-complex circuit architectures.
arXiv Detail & Related papers (2024-09-05T17:58:28Z)
- More Flexible PAC-Bayesian Meta-Learning by Learning Learning Algorithms [15.621144215664769]
We introduce a new framework for studying meta-learning methods using PAC-Bayesian theory.
Its main advantage is that it allows more flexibility in how the transfer of knowledge between tasks is realized.
arXiv Detail & Related papers (2024-02-06T15:00:08Z)
- A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z)
- Continual Zero-Shot Learning through Semantically Guided Generative Random Walks [56.65465792750822]
We address the challenge of continual zero-shot learning (CZSL), where unseen information is not provided during training, by leveraging generative modeling.
We propose our learning algorithm that employs a novel semantically guided Generative Random Walk (GRW) loss.
Our algorithm achieves state-of-the-art performance on AWA1, AWA2, CUB, and SUN datasets, surpassing existing CZSL methods by 3-7%.
arXiv Detail & Related papers (2023-08-23T18:10:12Z)
- On the Stepwise Nature of Self-Supervised Learning [0.0]
We present a simple picture of the training process of joint embedding self-supervised learning methods.
We find that these methods learn their high-dimensional embeddings one dimension at a time in a sequence of discrete, well-separated steps.
Our theory suggests that, just as kernel regression can be thought of as a model of supervised learning, kernel PCA may serve as a useful model of self-supervised learning.
arXiv Detail & Related papers (2023-03-27T17:59:20Z)
- An Information-Theoretic Perspective on Variance-Invariance-Covariance Regularization [52.44068740462729]
We present an information-theoretic perspective on the VICReg objective.
We derive a generalization bound for VICReg, revealing its inherent advantages for downstream tasks.
We introduce a family of SSL methods derived from information-theoretic principles that outperform existing SSL techniques.
arXiv Detail & Related papers (2023-03-01T16:36:25Z)
- Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control? [75.14973944905216]
We study the task of learning state representations from potentially high-dimensional observations.
We pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning.
arXiv Detail & Related papers (2022-12-30T01:42:04Z)
- Mixture-of-Variational-Experts for Continual Learning [0.0]
We propose an optimality principle that facilitates a trade-off between learning and forgetting.
We propose a neural network layer for continual learning, called Mixture-of-Variational-Experts (MoVE).
Our experiments on variants of the MNIST and CIFAR10 datasets demonstrate the competitive performance of MoVE layers.
arXiv Detail & Related papers (2021-10-25T06:32:06Z)
- InteL-VAEs: Adding Inductive Biases to Variational Auto-Encoders via Intermediary Latents [60.785317191131284]
We introduce a simple and effective method for learning VAEs with controllable biases by using an intermediary set of latent variables.
In particular, it allows us to impose desired properties like sparsity or clustering on learned representations.
We show that this, in turn, allows InteL-VAEs to learn both better generative models and representations.
arXiv Detail & Related papers (2021-06-25T16:34:05Z)
- Prototypical Contrastive Learning of Unsupervised Representations [171.3046900127166]
Prototypical Contrastive Learning (PCL) is an unsupervised representation learning method.
PCL implicitly encodes semantic structures of the data into the learned embedding space.
PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks.
arXiv Detail & Related papers (2020-05-11T09:53:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.