Implicit variance regularization in non-contrastive SSL
- URL: http://arxiv.org/abs/2212.04858v2
- Date: Fri, 27 Oct 2023 12:58:06 GMT
- Title: Implicit variance regularization in non-contrastive SSL
- Authors: Manu Srinath Halvagal, Axel Laborieux, Friedemann Zenke
- Abstract summary: We analytically study learning dynamics in conjunction with Euclidean and cosine similarity in the eigenspace of closed-form linear predictor networks.
We propose a family of isotropic loss functions (IsoLoss) that equalize convergence rates across eigenmodes.
- Score: 7.573586022424398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Non-contrastive SSL methods like BYOL and SimSiam rely on asymmetric
predictor networks to avoid representational collapse without negative samples.
Yet, how predictor networks facilitate stable learning is not fully understood.
While previous theoretical analyses assumed Euclidean losses, most practical
implementations rely on cosine similarity. To gain further theoretical insight
into non-contrastive SSL, we analytically study learning dynamics in
conjunction with Euclidean and cosine similarity in the eigenspace of
closed-form linear predictor networks. We show that both avoid collapse through
implicit variance regularization, albeit through different dynamical mechanisms.
Moreover, we find that the eigenvalues act as effective learning rate
multipliers and propose a family of isotropic loss functions (IsoLoss) that
equalize convergence rates across eigenmodes. Empirically, IsoLoss speeds up
the initial learning dynamics and increases robustness, thereby allowing us to
dispense with the EMA target network typically used with non-contrastive
methods. Our analysis sheds light on the variance regularization mechanisms of
non-contrastive SSL and lays the theoretical grounds for crafting novel loss
functions that shape the learning dynamics of the predictor's spectrum.
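To make the claim that eigenvalues act as effective learning rate multipliers concrete, the sketch below reweights a non-contrastive alignment loss by the inverse eigenvalues of the target covariance. This is an illustrative reading of the abstract, not the paper's IsoLoss: the eigenbasis of the stop-gradient target covariance, the inverse-eigenvalue weights, and the weighted-cosine form are all assumptions made for this example.

```python
import torch

def isotropic_alignment_loss(p, z_target, eps=1e-6):
    """Illustrative sketch of an isotropically reweighted non-contrastive loss
    (NOT the paper's exact IsoLoss).

    p:        predictor outputs for view 1, shape (batch, dim)
    z_target: embeddings of view 2 used as targets, shape (batch, dim)
    """
    z = z_target.detach()                         # stop-gradient target (BYOL/SimSiam-style)
    z_c = z - z.mean(dim=0, keepdim=True)
    cov = z_c.T @ z_c / max(z.shape[0] - 1, 1)    # empirical target covariance
    evals, evecs = torch.linalg.eigh(cov)         # eigenmodes of the target spectrum

    p_e = p @ evecs                               # express both views in the eigenbasis
    z_e = z @ evecs

    w = 1.0 / evals.clamp_min(eps)                # inverse-eigenvalue weights
    w = w / w.sum()                               # keep the overall loss scale bounded

    # reweighted negative-cosine alignment, accumulated mode by mode
    num = (w * p_e * z_e).sum(dim=1)
    den = (w * p_e**2).sum(dim=1).sqrt() * (w * z_e**2).sum(dim=1).sqrt() + eps
    return -(num / den).mean()
```

With uniform weights this reduces to an ordinary cosine-similarity objective; the inverse-eigenvalue reweighting is the part intended to mimic equalizing convergence rates across eigenmodes.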
Related papers
- On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling [11.168336416219857]
Existing infinite-width theory would predict instability under large learning rates and vanishing feature learning under stable learning rates.
We show that this discrepancy is not fully explained by finite-width phenomena such as catapult effects.
We validate that neural networks operate in this controlled divergence regime under CE loss but not under MSE loss.
arXiv Detail & Related papers (2025-05-28T15:40:48Z) - Learning Broken Symmetries with Approximate Invariance [1.0485739694839669]
In many cases, the exact underlying symmetry is present only in an idealized dataset, and is broken in actual data.
Standard approaches, such as data augmentation or equivariant networks, fail to represent the nature of the full, broken symmetry.
We propose a learning model which balances the generality and performance of unconstrained networks with the rapid learning of constrained networks.
arXiv Detail & Related papers (2024-12-25T04:29:04Z) - Preventing Collapse in Contrastive Learning with Orthonormal Prototypes (CLOP) [0.0]
CLOP is a novel semi-supervised loss function designed to prevent neural collapse by promoting the formation of linear subspaces among class embeddings.
We show that CLOP enhances performance, providing greater stability across different learning rates and batch sizes (a generic orthonormality-penalty sketch appears after this list).
arXiv Detail & Related papers (2024-03-27T15:48:16Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure by testing a hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed procedure accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z) - Regularizing with Pseudo-Negatives for Continual Self-Supervised Learning [62.40718385934608]
We introduce a novel Pseudo-Negative Regularization (PNR) framework for effective continual self-supervised learning (CSSL).
Our PNR leverages pseudo-negatives obtained through model-based augmentation so that newly learned representations do not contradict what has been learned in the past.
arXiv Detail & Related papers (2023-06-08T10:59:35Z) - Stochastic Modified Equations and Dynamics of Dropout Algorithm [4.811269936680572]
Dropout is a widely utilized regularization technique in the training of neural networks.
Its underlying mechanism and its impact on achieving good generalization abilities remain poorly understood.
arXiv Detail & Related papers (2023-05-25T08:42:25Z) - Non-Parametric Learning of Stochastic Differential Equations with Non-asymptotic Fast Rates of Convergence [65.63201894457404]
We propose a novel non-parametric learning paradigm for the identification of drift and diffusion coefficients of non-linear stochastic differential equations.
The key idea essentially consists of fitting an RKHS-based approximation of the corresponding Fokker-Planck equation to such observations.
arXiv Detail & Related papers (2023-05-24T20:43:47Z) - Leveraging Heteroscedastic Uncertainty in Learning Complex Spectral
Mapping for Single-channel Speech Enhancement [20.823177372464414]
Most speech enhancement (SE) models learn a point estimate, and do not make use of uncertainty estimation in the learning process.
We show that modeling heteroscedastic uncertainty by minimizing a multivariate Gaussian negative log-likelihood (NLL) improves SE performance at no extra cost (a diagonal-covariance NLL sketch appears after this list).
arXiv Detail & Related papers (2022-11-16T02:29:05Z) - On the generalization of learning algorithms that do not converge [54.122745736433856]
Generalization analyses of deep learning typically assume that the training converges to a fixed point.
Recent results indicate that in practice, the weights of deep neural networks optimized with gradient descent often oscillate indefinitely.
arXiv Detail & Related papers (2022-08-16T21:22:34Z) - Error Bounds of the Invariant Statistics in Machine Learning of Ergodic
Itô Diffusions [8.627408356707525]
We study the theoretical underpinnings of machine learning of ergodic Itô diffusions.
We deduce a linear dependence of the errors of one-point and two-point invariant statistics on the error in the learning of the drift and diffusion coefficients.
arXiv Detail & Related papers (2021-05-21T02:55:59Z) - Understanding self-supervised Learning Dynamics without Contrastive
Pairs [72.1743263777693]
Contrastive approaches to self-supervised learning (SSL) learn representations by minimizing the distance between two augmented views of the same data point while pushing apart views of different data points (negative pairs).
Non-contrastive methods such as BYOL and SimSiam show remarkable performance without negative pairs.
We study the nonlinear learning dynamics of non-contrastive SSL in simple linear networks.
arXiv Detail & Related papers (2021-02-12T22:57:28Z) - Semi-Supervised Empirical Risk Minimization: Using unlabeled data to
improve prediction [4.860671253873579]
We present a general methodology for using unlabeled data to design semi-supervised learning (SSL) variants of the Empirical Risk Minimization (ERM) learning process.
We analyze the effectiveness of our SSL approach in improving prediction performance.
arXiv Detail & Related papers (2020-09-01T17:55:51Z)
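For the CLOP entry above (orthonormal prototypes), the sketch below shows one generic way to encode an orthonormality constraint on class prototypes together with a pull-to-prototype term. It is a hypothetical illustration, not the paper's CLOP loss; the names `prototypes`, `embeddings`, and `labels` are assumptions of this example.

```python
import torch

def orthonormal_prototype_penalty(prototypes):
    """Penalize deviation of class prototypes from an orthonormal set:
    ||P P^T - I||_F^2, where rows of P are the (num_classes, dim) prototypes."""
    gram = prototypes @ prototypes.T
    eye = torch.eye(gram.shape[0], device=prototypes.device)
    return ((gram - eye) ** 2).sum()

def pull_to_prototype(embeddings, labels, prototypes):
    """Pull each labeled embedding toward its class prototype (mean squared distance)."""
    return ((embeddings - prototypes[labels]) ** 2).sum(dim=1).mean()
```

A combined objective would add the two terms with a trade-off weight; keeping the prototypes near-orthonormal is what prevents the class subspaces from collapsing onto each other.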
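For the heteroscedastic speech-enhancement entry above, the sketch below is a diagonal-covariance simplification of a Gaussian negative log-likelihood with a predicted mean and log-variance; the paper's multivariate formulation is not reproduced here.

```python
import torch

def heteroscedastic_gaussian_nll(mu, log_var, target):
    """Per-element Gaussian NLL with predicted mean and log-variance.
    High-variance (uncertain) elements are automatically downweighted, while the
    log-variance term discourages inflating the variance everywhere."""
    inv_var = torch.exp(-log_var)
    return 0.5 * (log_var + (target - mu) ** 2 * inv_var).mean()
```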