Representation Learning Dynamics of Self-Supervised Models
- URL: http://arxiv.org/abs/2309.02011v1
- Date: Tue, 5 Sep 2023 07:48:45 GMT
- Title: Representation Learning Dynamics of Self-Supervised Models
- Authors: Pascal Esser, Satyaki Mukherjee, Debarghya Ghoshdastidar
- Abstract summary: Self-Supervised Learning (SSL) is an important paradigm for learning representations from unlabelled data.
We study the learning dynamics of SSL models, specifically representations obtained by minimising contrastive and non-contrastive losses.
We derive the exact learning dynamics of the SSL models trained using gradient descent on the Grassmannian manifold.
- Score: 7.289672463326423
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-Supervised Learning (SSL) is an important paradigm for learning
representations from unlabelled data, and SSL with neural networks has been
highly successful in practice. However, current theoretical analysis of SSL is
mostly restricted to generalisation error bounds. In contrast, learning
dynamics often provide a precise characterisation of the behaviour of neural
network-based models but, so far, are mainly known in supervised settings. In
this paper, we study the learning dynamics of SSL models, specifically
representations obtained by minimising contrastive and non-contrastive losses.
We show that a naive extension of the dynamics of multivariate regression to
SSL leads to learning trivial scalar representations, demonstrating dimension
collapse in SSL. Consequently, we formulate SSL objectives with
orthogonality constraints on the weights, and derive the exact (network width
independent) learning dynamics of the SSL models trained using gradient descent
on the Grassmannian manifold. We also argue that the infinite width
approximation of SSL models significantly deviates from the neural tangent
kernel approximations of supervised models. We numerically illustrate the
validity of our theoretical findings, and discuss how the presented results
provide a framework for further theoretical analysis of contrastive and
non-contrastive SSL.
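The dimension collapse discussed above can be made concrete with a simple diagnostic. The following NumPy sketch (not from the paper; the helper name and toy data are illustrative assumptions) computes the effective rank of a representation matrix: a value near 1 signals that the representations have collapsed to an essentially one-dimensional, i.e. trivially scalar, subspace.

```python
import numpy as np

def effective_rank(Z, eps=1e-12):
    """Effective rank: exp of the entropy of the normalised singular values.

    Z is an (n_samples, n_dims) matrix of representations. A result close
    to 1 indicates dimension collapse; a result near n_dims indicates
    well-spread representations.
    """
    s = np.linalg.svd(Z - Z.mean(axis=0), compute_uv=False)
    p = s / (s.sum() + eps)
    entropy = -(p * np.log(p + eps)).sum()
    return float(np.exp(entropy))

# Toy check: a rank-1 (collapsed) matrix vs. a well-spread random matrix.
rng = np.random.default_rng(0)
collapsed = rng.standard_normal((1000, 1)) @ rng.standard_normal((1, 64))
spread = rng.standard_normal((1000, 64))
print(effective_rank(collapsed))  # ~1: dimension collapse
print(effective_rank(spread))     # close to 64
```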
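The orthogonality-constrained formulation amounts to optimising the encoder weights over the Grassmannian manifold. Below is a minimal, hypothetical NumPy sketch of one such training loop for a linear encoder f(x) = W^T x: the Euclidean gradient is projected onto the tangent space via (I - W W^T) G, and a QR retraction keeps the columns of W orthonormal. The alignment-style loss, augmentation, and learning rate are placeholder assumptions, not the exact objective analysed in the paper.

```python
import numpy as np

def grassmann_step(W, euclid_grad, lr=0.1):
    """One Riemannian gradient step on the Grassmannian Gr(d, k).

    W: (d, k) matrix with orthonormal columns (W.T @ W = I_k).
    euclid_grad: Euclidean gradient of the loss w.r.t. W, same shape.
    """
    riem_grad = euclid_grad - W @ (W.T @ euclid_grad)  # project onto tangent space
    Q, _ = np.linalg.qr(W - lr * riem_grad)            # QR retraction back to the manifold
    return Q

# Placeholder non-contrastive-style objective: align a linear encoder's
# outputs on two "augmented" views (alignment term only, for illustration).
rng = np.random.default_rng(1)
d, k, n = 32, 8, 256
X = rng.standard_normal((n, d))
X_aug = X + 0.1 * rng.standard_normal((n, d))       # crude stand-in for augmentation
W = np.linalg.qr(rng.standard_normal((d, k)))[0]    # orthonormal initialisation

for _ in range(100):
    diff = (X - X_aug) @ W                           # view mismatch in representation space
    G = (X - X_aug).T @ diff / n                     # gradient of 0.5/n * ||(X - X_aug) W||_F^2
    W = grassmann_step(W, G, lr=0.5)

print(np.allclose(W.T @ W, np.eye(k)))               # orthogonality constraint is preserved
```

The retraction enforces the constraint exactly at every step; since only the subspace spanned by W matters on the Grassmannian, the rotation ambiguity in the QR factor is harmless.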
Related papers
- On the Discriminability of Self-Supervised Representation Learning [38.598160031349686]
Self-supervised learning (SSL) has recently achieved significant success in downstream visual tasks.
A notable gap still exists between SSL and supervised learning (SL), especially in complex downstream tasks.
arXiv Detail & Related papers (2024-07-18T14:18:03Z) - Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z) - Erasing the Bias: Fine-Tuning Foundation Models for Semi-Supervised Learning [4.137391543972184]
Semi-supervised learning (SSL) has witnessed remarkable progress, resulting in numerous method variations.
In this paper, we present a novel SSL approach named FineSSL that significantly addresses this limitation by adapting pre-trained foundation models.
We demonstrate that FineSSL sets a new state of the art for SSL on multiple benchmark datasets, reduces the training cost by over six times, and can seamlessly integrate various fine-tuning and modern SSL algorithms.
arXiv Detail & Related papers (2024-05-20T03:33:12Z) - Understanding Representation Learnability of Nonlinear Self-Supervised
Learning [13.965135660149212]
Self-supervised learning (SSL) has empirically shown its data representation learnability in many downstream tasks.
Our paper is the first to analyze the learning results of the nonlinear SSL model accurately.
arXiv Detail & Related papers (2024-01-06T13:23:26Z) - Explaining, Analyzing, and Probing Representations of Self-Supervised
Learning Models for Sensor-based Human Activity Recognition [2.2082422928825136]
Self-supervised learning (SSL) frameworks have been extensively applied to sensor-based Human Activity Recognition (HAR).
In this paper, we aim to analyze deep representations of two recent SSL frameworks, namely SimCLR and VICReg.
arXiv Detail & Related papers (2023-04-14T07:53:59Z) - LSFSL: Leveraging Shape Information in Few-shot Learning [11.145085584637746]
Few-shot learning techniques seek to learn the underlying patterns in data using fewer samples, analogous to how humans learn from limited experience.
In this limited-data scenario, the challenges associated with deep neural networks, such as shortcut learning and texture bias behaviors, are further exacerbated.
We propose LSFSL, which enforces the model to learn more generalizable features utilizing the implicit prior information present in the data.
arXiv Detail & Related papers (2023-04-13T16:59:22Z) - The Geometry of Self-supervised Learning Models and its Impact on
Transfer Learning [62.601681746034956]
Self-supervised learning (SSL) has emerged as a desirable paradigm in computer vision.
We propose a data-driven geometric strategy to analyze different SSL models using local neighborhoods in the feature space induced by each.
arXiv Detail & Related papers (2022-09-18T18:15:38Z) - Self-supervised Learning is More Robust to Dataset Imbalance [65.84339596595383]
We investigate self-supervised learning under dataset imbalance.
Off-the-shelf self-supervised representations are already more robust to class imbalance than supervised representations.
We devise a re-weighted regularization technique that consistently improves the SSL representation quality on imbalanced datasets.
arXiv Detail & Related papers (2021-10-11T06:29:56Z) - On Data-Augmentation and Consistency-Based Semi-Supervised Learning [77.57285768500225]
Recently proposed consistency-based Semi-Supervised Learning (SSL) methods have advanced the state of the art in several SSL tasks.
Despite these advances, the understanding of these methods is still relatively limited.
arXiv Detail & Related papers (2021-01-18T10:12:31Z) - Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z) - Neural networks with late-phase weights [66.72777753269658]
We show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning.
At the end of learning, we obtain back a single model by taking a spatial average in weight space.
arXiv Detail & Related papers (2020-07-25T13:23:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.