Representation Learning Dynamics of Self-Supervised Models
- URL: http://arxiv.org/abs/2309.02011v1
- Date: Tue, 5 Sep 2023 07:48:45 GMT
- Title: Representation Learning Dynamics of Self-Supervised Models
- Authors: Pascal Esser, Satyaki Mukherjee, Debarghya Ghoshdastidar
- Abstract summary: Self-Supervised Learning (SSL) is an important paradigm for learning representations from unlabelled data.
We study the learning dynamics of SSL models, specifically representations obtained by minimising contrastive and non-contrastive losses.
We derive the exact learning dynamics of the SSL models trained using gradient descent on the Grassmannian manifold.
- Score: 7.289672463326423
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-Supervised Learning (SSL) is an important paradigm for learning
representations from unlabelled data, and SSL with neural networks has been
highly successful in practice. However, current theoretical analysis of SSL is
mostly restricted to generalisation error bounds. In contrast, learning
dynamics often provide a precise characterisation of the behaviour of neural
network-based models but, so far, are mainly known in supervised settings. In
this paper, we study the learning dynamics of SSL models, specifically
representations obtained by minimising contrastive and non-contrastive losses.
We show that a naive extension of the dynamics of multivariate regression to
SSL leads to learning trivial scalar representations, demonstrating dimension
collapse in SSL. Consequently, we formulate SSL objectives with
orthogonality constraints on the weights, and derive the exact (network width
independent) learning dynamics of the SSL models trained using gradient descent
on the Grassmannian manifold. We also argue that the infinite width
approximation of SSL models significantly deviates from the neural tangent
kernel approximations of supervised models. We numerically illustrate the
validity of our theoretical findings, and discuss how the presented results
provide a framework for further theoretical analysis of contrastive and
non-contrastive SSL.
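The dimension collapse discussed above can be made concrete with a simple diagnostic. The following NumPy sketch (not from the paper; the helper name and toy data are illustrative assumptions) computes the effective rank of a representation matrix: a value near 1 signals that the representations have collapsed to an essentially one-dimensional, i.e. trivially scalar, subspace.

```python
import numpy as np

def effective_rank(Z, eps=1e-12):
    """Effective rank: exp of the entropy of the normalised singular values.

    Z is an (n_samples, n_dims) matrix of representations. A result close
    to 1 indicates dimension collapse; a result near n_dims indicates
    well-spread representations.
    """
    s = np.linalg.svd(Z - Z.mean(axis=0), compute_uv=False)
    p = s / (s.sum() + eps)
    entropy = -(p * np.log(p + eps)).sum()
    return float(np.exp(entropy))

# Toy check: a rank-1 (collapsed) matrix vs. a well-spread random matrix.
rng = np.random.default_rng(0)
collapsed = rng.standard_normal((1000, 1)) @ rng.standard_normal((1, 64))
spread = rng.standard_normal((1000, 64))
print(effective_rank(collapsed))  # ~1: dimension collapse
print(effective_rank(spread))     # close to 64
```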
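The orthogonality-constrained formulation amounts to optimising the encoder weights over the Grassmannian manifold. Below is a minimal, hypothetical NumPy sketch of one such training loop for a linear encoder f(x) = W^T x: the Euclidean gradient is projected onto the tangent space via (I - W W^T) G, and a QR retraction keeps the columns of W orthonormal. The alignment-style loss, augmentation, and learning rate are placeholder assumptions, not the exact objective analysed in the paper.

```python
import numpy as np

def grassmann_step(W, euclid_grad, lr=0.1):
    """One Riemannian gradient step on the Grassmannian Gr(d, k).

    W: (d, k) matrix with orthonormal columns (W.T @ W = I_k).
    euclid_grad: Euclidean gradient of the loss w.r.t. W, same shape.
    """
    riem_grad = euclid_grad - W @ (W.T @ euclid_grad)  # project onto tangent space
    Q, _ = np.linalg.qr(W - lr * riem_grad)            # QR retraction back to the manifold
    return Q

# Placeholder non-contrastive-style objective: align a linear encoder's
# outputs on two "augmented" views (alignment term only, for illustration).
rng = np.random.default_rng(1)
d, k, n = 32, 8, 256
X = rng.standard_normal((n, d))
X_aug = X + 0.1 * rng.standard_normal((n, d))       # crude stand-in for augmentation
W = np.linalg.qr(rng.standard_normal((d, k)))[0]    # orthonormal initialisation

for _ in range(100):
    diff = (X - X_aug) @ W                           # view mismatch in representation space
    G = (X - X_aug).T @ diff / n                     # gradient of 0.5/n * ||(X - X_aug) W||_F^2
    W = grassmann_step(W, G, lr=0.5)

print(np.allclose(W.T @ W, np.eye(k)))               # orthogonality constraint is preserved
```

The retraction enforces the constraint exactly at every step; since only the subspace spanned by W matters on the Grassmannian, the rotation ambiguity in the QR factor is harmless.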
Related papers
- On the Discriminability of Self-Supervised Representation Learning [38.598160031349686]
Self-supervised learning (SSL) has recently achieved significant success in downstream visual tasks.
A notable gap still exists between SSL and supervised learning (SL), especially in complex downstream tasks.
arXiv Detail & Related papers (2024-07-18T14:18:03Z) - Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z) - Erasing the Bias: Fine-Tuning Foundation Models for Semi-Supervised Learning [4.137391543972184]
Semi-supervised learning (SSL) has witnessed remarkable progress, resulting in numerous method variations.
In this paper, we present a novel SSL approach named FineSSL that significantly addresses this limitation by adapting pre-trained foundation models.
We demonstrate that FineSSL sets a new state of the art for SSL on multiple benchmark datasets, reduces the training cost by over six times, and can seamlessly integrate various fine-tuning and modern SSL algorithms.
arXiv Detail & Related papers (2024-05-20T03:33:12Z) - Understanding Representation Learnability of Nonlinear Self-Supervised
Learning [13.965135660149212]
Self-supervised learning (SSL) has empirically shown its data representation learnability in many downstream tasks.
Our paper is the first to analyze the learning results of the nonlinear SSL model accurately.
arXiv Detail & Related papers (2024-01-06T13:23:26Z) - Explaining, Analyzing, and Probing Representations of Self-Supervised
Learning Models for Sensor-based Human Activity Recognition [2.2082422928825136]
Self-supervised learning (SSL) frameworks have been extensively applied to sensor-based Human Activity Recognition (HAR).
In this paper, we aim to analyze deep representations of two recent SSL frameworks, namely SimCLR and VICReg.
arXiv Detail & Related papers (2023-04-14T07:53:59Z) - LSFSL: Leveraging Shape Information in Few-shot Learning [11.145085584637746]
Few-shot learning techniques seek to learn the underlying patterns in data using fewer samples, analogous to how humans learn from limited experience.
In this limited-data scenario, the challenges associated with deep neural networks, such as shortcut learning and texture bias behaviors, are further exacerbated.
We propose LSFSL, which enforces the model to learn more generalizable features utilizing the implicit prior information present in the data.
arXiv Detail & Related papers (2023-04-13T16:59:22Z) - The Geometry of Self-supervised Learning Models and its Impact on
Transfer Learning [62.601681746034956]
Self-supervised learning (SSL) has emerged as a desirable paradigm in computer vision.
We propose a data-driven geometric strategy to analyze different SSL models using local neighborhoods in the feature space induced by each.
arXiv Detail & Related papers (2022-09-18T18:15:38Z) - Self-supervised Learning is More Robust to Dataset Imbalance [65.84339596595383]
We investigate self-supervised learning under dataset imbalance.
Off-the-shelf self-supervised representations are already more robust to class imbalance than supervised representations.
We devise a re-weighted regularization technique that consistently improves the SSL representation quality on imbalanced datasets.
arXiv Detail & Related papers (2021-10-11T06:29:56Z) - On Data-Augmentation and Consistency-Based Semi-Supervised Learning [77.57285768500225]
Recently proposed consistency-based Semi-Supervised Learning (SSL) methods have advanced the state of the art in several SSL tasks.
Despite these advances, the understanding of these methods is still relatively limited.
arXiv Detail & Related papers (2021-01-18T10:12:31Z) - Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z) - Neural networks with late-phase weights [66.72777753269658]
We show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning.
At the end of learning, we obtain back a single model by taking a spatial average in weight space.
arXiv Detail & Related papers (2020-07-25T13:23:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.