What shapes the loss landscape of self-supervised learning?
- URL: http://arxiv.org/abs/2210.00638v1
- Date: Sun, 2 Oct 2022 21:46:16 GMT
- Title: What shapes the loss landscape of self-supervised learning?
- Authors: Liu Ziyin, Ekdeep Singh Lubana, Masahito Ueda, Hidenori Tanaka
- Abstract summary: Prevention of complete and dimensional collapse of representations has recently become a design principle for self-supervised learning (SSL).
We provide answers by thoroughly analyzing SSL loss landscapes for a linear model.
We derive an analytically tractable theory of SSL landscape and show that it accurately captures an array of collapse phenomena and identifies their causes.
- Score: 10.896567381206715
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prevention of complete and dimensional collapse of representations has
recently become a design principle for self-supervised learning (SSL). However,
questions remain in our theoretical understanding: When do those collapses
occur? What are the mechanisms and causes? We provide answers to these
questions by thoroughly analyzing SSL loss landscapes for a linear model. We
derive an analytically tractable theory of SSL landscape and show that it
accurately captures an array of collapse phenomena and identifies their causes.
Finally, we leverage the interpretability afforded by the analytical theory to
understand how dimensional collapse can be beneficial and what affects the
robustness of SSL against data imbalance.
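The two collapse modes the abstract distinguishes can be illustrated numerically. The sketch below is a hypothetical toy setup, not the paper's actual analysis: a linear encoder maps data to embeddings, and the effective rank of the embedding covariance diagnoses collapse. A zero encoder gives rank 0 (complete collapse: all embeddings identical), while a rank-deficient encoder gives embeddings confined to a low-dimensional subspace (dimensional collapse).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 500 points in 8 dimensions (hypothetical, for illustration only).
X = rng.normal(size=(500, 8))

# Three linear "encoders":
W_healthy = rng.normal(size=(8, 8))        # full rank almost surely: no collapse
U = rng.normal(size=(8, 2))
W_dimensional = U @ U.T                    # rank 2: dimensional collapse
W_complete = np.zeros((8, 8))              # rank 0: complete collapse

def embedding_rank(W, X, tol=1e-8):
    """Effective dimensionality of the embeddings Z = X @ W, counted as
    the number of covariance eigenvalues above a relative tolerance."""
    Z = X @ W
    cov = np.cov(Z, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)
    return int(np.sum(eigvals > tol * eigvals.max())) if eigvals.max() > 0 else 0

print(embedding_rank(W_healthy, X))      # full embedding dimension: no collapse
print(embedding_rank(W_dimensional, X))  # 2: embeddings live in a 2-D subspace
print(embedding_rank(W_complete, X))     # 0: all embeddings are identical
```

Measuring the spectrum of the embedding covariance in this way is a common diagnostic for dimensional collapse in the SSL literature; the paper's contribution is an analytical theory of when and why a linear model's training landscape drives it into these regimes.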
Related papers
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks [49.60689355674541]
A rich literature in cognitive science has studied people's causal and moral intuitions.
This work has revealed a number of factors that systematically influence people's judgments.
We test whether large language models (LLMs) make causal and moral judgments about text-based scenarios that align with those of human participants.
arXiv Detail & Related papers (2023-10-30T15:57:32Z)
- An Embarrassingly Simple Backdoor Attack on Self-supervised Learning [52.28670953101126]
Self-supervised learning (SSL) is capable of learning high-quality representations of complex data without relying on labels.
We study the inherent vulnerability of SSL to backdoor attacks.
arXiv Detail & Related papers (2022-10-13T20:39:21Z)
- Imbalance Trouble: Revisiting Neural-Collapse Geometry [27.21274327569783]
We introduce Simplex-Encoded-Labels Interpolation (SELI) as an invariant characterization of the neural collapse phenomenon.
We prove this characterization for the UFM with cross-entropy loss and vanishing regularization.
We present experiments on synthetic and real datasets that confirm convergence to the SELI geometry.
arXiv Detail & Related papers (2022-08-10T18:10:59Z)
- How Does SimSiam Avoid Collapse Without Negative Samples? A Unified Understanding with Self-supervised Contrastive Learning [79.94590011183446]
To avoid collapse in self-supervised learning, a contrastive loss is widely used but often requires a large number of negative samples.
A recent work has attracted significant attention for providing a minimalist simple Siamese (SimSiam) method that avoids collapse.
arXiv Detail & Related papers (2022-03-30T12:46:31Z)
- Limitations of Neural Collapse for Understanding Generalization in Deep Learning [25.48346719747956]
Recent work of Papyan, Han, & Donoho presented an intriguing "Neural Collapse" phenomenon.
Our motivation is to study the upper limits of this research program.
arXiv Detail & Related papers (2022-02-17T00:20:12Z)
- Understanding self-supervised Learning Dynamics without Contrastive Pairs [72.1743263777693]
Contrastive approaches to self-supervised learning (SSL) learn representations by minimizing the distance between two augmented views of the same data point.
Non-contrastive methods such as BYOL and SimSiam show remarkable performance without negative pairs.
We study the nonlinear learning dynamics of non-contrastive SSL in simple linear networks.
arXiv Detail & Related papers (2021-02-12T22:57:28Z)
- Dependency Decomposition and a Reject Option for Explainable Models [4.94950858749529]
Recent deep learning models perform extremely well in various inference tasks.
Recent advances offer methods to visualize features and describe attribution of the input.
We present the first analysis of dependencies regarding the probability distribution over the desired image classification outputs.
arXiv Detail & Related papers (2020-12-11T17:39:33Z)
- Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect [95.37587481952487]
Long-tailed classification is key to deep learning at scale.
Existing methods are mainly based on re-weighting/re-sampling heuristics that lack a fundamental theory.
In this paper, we establish a causal inference framework that not only unravels the whys of previous methods but also derives a new principled solution.
arXiv Detail & Related papers (2020-09-28T00:32:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.