Variance-Covariance Regularization Improves Representation Learning
- URL: http://arxiv.org/abs/2306.13292v2
- Date: Thu, 22 Feb 2024 21:07:10 GMT
- Title: Variance-Covariance Regularization Improves Representation Learning
- Authors: Jiachen Zhu, Katrina Evtimova, Yubei Chen, Ravid Shwartz-Ziv, Yann LeCun
- Abstract summary: We adapt a self-supervised learning regularization technique to supervised learning contexts, introducing Variance-Covariance Regularization (VCReg).
We demonstrate that VCReg significantly enhances transfer learning for images and videos, achieving state-of-the-art performance across numerous tasks and datasets.
In summary, VCReg offers a universally applicable regularization framework that significantly advances transfer learning and highlights the connection between gradient starvation, neural collapse, and feature transferability.
- Score: 28.341622247252705
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Transfer learning plays a key role in advancing machine learning models, yet
conventional supervised pretraining often undermines feature transferability by
prioritizing features that minimize the pretraining loss. In this work, we
adapt a self-supervised learning regularization technique from the VICReg
method to supervised learning contexts, introducing Variance-Covariance
Regularization (VCReg). This adaptation encourages the network to learn
high-variance, low-covariance representations, promoting learning more diverse
features. We outline best practices for an efficient implementation of our
framework, including applying it to the intermediate representations. Through
extensive empirical evaluation, we demonstrate that our method significantly
enhances transfer learning for images and videos, achieving state-of-the-art
performance across numerous tasks and datasets. VCReg also improves performance
in scenarios like long-tail learning and hierarchical classification.
Additionally, we show its effectiveness may stem from its success in addressing
challenges like gradient starvation and neural collapse. In summary, VCReg
offers a universally applicable regularization framework that significantly
advances transfer learning and highlights the connection between gradient
starvation, neural collapse, and feature transferability.
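As a rough illustration of the idea described in the abstract, the following is a minimal sketch of a variance-covariance penalty in the VICReg style that VCReg adapts, assuming a PyTorch setting; the function name, the variance target, the weights alpha and beta, and the placement on an intermediate representation are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def variance_covariance_penalty(z, var_target=1.0, eps=1e-4):
    """VICReg-style variance and covariance terms on a batch of representations z of shape (N, D)."""
    z = z - z.mean(dim=0)                         # center each dimension over the batch
    std = torch.sqrt(z.var(dim=0) + eps)          # per-dimension standard deviation
    var_loss = F.relu(var_target - std).mean()    # hinge pushes each std up toward var_target

    n, d = z.shape
    cov = (z.T @ z) / (n - 1)                     # D x D covariance matrix of the batch
    off_diag = cov - torch.diag(torch.diag(cov))  # keep only off-diagonal entries
    cov_loss = off_diag.pow(2).sum() / d          # penalize correlations between dimensions
    return var_loss, cov_loss

# Illustrative use on an intermediate representation h flattened to (N, D):
# var_loss, cov_loss = variance_covariance_penalty(h)
# loss = task_loss + alpha * var_loss + beta * cov_loss   # alpha, beta are hypothetical weights
```

Since the abstract says the penalty is applied to intermediate representations, in practice this would mean flattening, for example, convolutional feature maps to shape (N, D) before computing the two terms, and treating alpha, beta, and var_target as hyperparameters to tune.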
Related papers
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights while enhancing robustness and generalization.
A self-regularization strategy is further exploited to maintain stability in terms of the zero-shot generalization of VLMs; the resulting method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in the few-shot image classification scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - Normalization and effective learning rates in reinforcement learning [52.59508428613934]
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature.
We show that normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate.
We propose to make the learning rate schedule explicit with a simple re-parameterization which we call Normalize-and-Project.
arXiv Detail & Related papers (2024-07-01T20:58:01Z) - Contrastive-Adversarial and Diffusion: Exploring pre-training and fine-tuning strategies for sulcal identification [3.0398616939692777]
Techniques like adversarial learning, contrastive learning, diffusion denoising learning, and ordinary reconstruction learning have become standard.
The study aims to elucidate the advantages of pre-training techniques and fine-tuning strategies to enhance the learning process of neural networks.
arXiv Detail & Related papers (2024-05-29T15:44:51Z) - Continual Learners are Incremental Model Generalizers [70.34479702177988]
This paper extensively studies the impact of Continual Learning (CL) models as pre-trainers.
We find that the transfer quality of the representation often increases gradually without noticeable degradation in fine-tuning performance.
We propose a new fine-tuning scheme, GLobal Attention Discretization (GLAD), that preserves rich task-generic representations while solving downstream tasks.
arXiv Detail & Related papers (2023-06-21T05:26:28Z) - ArCL: Enhancing Contrastive Learning with Augmentation-Robust Representations [30.745749133759304]
We develop a theoretical framework to analyze the transferability of self-supervised contrastive learning.
We show that contrastive learning fails to learn domain-invariant features, which limits its transferability.
Based on these theoretical insights, we propose a novel method called Augmentation-robust Contrastive Learning (ArCL)
arXiv Detail & Related papers (2023-03-02T09:26:20Z) - Learning State Representations via Retracing in Reinforcement Learning [25.755855290244103]
Learning via retracing is a self-supervised approach for learning the state representation for reinforcement learning tasks.
We introduce Cycle-Consistency World Model (CCWM), a concrete instantiation of learning via retracing.
We show that CCWM achieves state-of-the-art sample efficiency and performance.
arXiv Detail & Related papers (2021-11-24T16:19:59Z) - Unsupervised Transfer Learning for Spatiotemporal Predictive Networks [90.67309545798224]
We study how to transfer knowledge from a zoo of unsupervisedly learned models towards another network.
Our motivation is that models are expected to understand complex dynamics from different sources.
Our approach yields significant improvements on three benchmarks for spatiotemporal prediction, and benefits the target network even from less relevant source models.
arXiv Detail & Related papers (2020-09-24T15:40:55Z) - Uniform Priors for Data-Efficient Transfer [65.086680950871]
We show that features that are most transferable have high uniformity in the embedding space.
We evaluate the regularization on its ability to facilitate adaptation to unseen tasks and data.
arXiv Detail & Related papers (2020-06-30T04:39:36Z) - Guided Variational Autoencoder for Disentanglement Learning [79.02010588207416]
We propose an algorithm, guided variational autoencoder (Guided-VAE), that is able to learn a controllable generative model by performing latent representation disentanglement learning.
We design an unsupervised strategy and a supervised strategy in Guided-VAE and observe enhanced modeling and controlling capability over the vanilla VAE.
arXiv Detail & Related papers (2020-04-02T20:49:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.