WERank: Towards Rank Degradation Prevention for Self-Supervised Learning Using Weight Regularization
- URL: http://arxiv.org/abs/2402.09586v1
- Date: Wed, 14 Feb 2024 21:29:28 GMT
- Title: WERank: Towards Rank Degradation Prevention for Self-Supervised Learning Using Weight Regularization
- Authors: Ali Saheb Pasand, Reza Moravej, Mahdi Biparva, Ali Ghodsi
- Abstract summary: We propose WERank, a new regularizer on the weight parameters of the network to prevent rank degeneration at different layers of the network.
We empirically demonstrate that WERank is effective in helping BYOL achieve higher rank during SSL pre-training and, consequently, higher downstream accuracy during evaluation probing.
- Score: 5.484161990886851
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A common phenomenon confining the representation quality in Self-Supervised
Learning (SSL) is dimensional collapse (also known as rank degeneration), where
the learned representations are mapped to a low-dimensional subspace of the
representation space. State-of-the-art SSL methods have been shown to suffer
from dimensional collapse and fail to maintain full rank. Recent
approaches to prevent this problem have proposed using contrastive losses,
regularization techniques, or architectural tricks. We propose WERank, a new
regularizer on the weight parameters of the network to prevent rank
degeneration at different layers of the network. We provide empirical evidence
and mathematical justification to demonstrate the effectiveness of the proposed
regularization method in preventing dimensional collapse. We verify the impact
of WERank on graph SSL where dimensional collapse is more pronounced due to the
lack of proper data augmentation. We empirically demonstrate that WERank is
effective in helping BYOL achieve higher rank during SSL pre-training and,
consequently, higher downstream accuracy during evaluation probing. Ablation
studies and experimental analysis shed light on the underlying factors behind
the performance gains of the proposed approach.
Related papers
- Preventing Dimensional Collapse in Self-Supervised Learning via Orthogonality Regularization [9.823816643319448]
Self-supervised learning (SSL) has rapidly advanced in recent years, approaching the performance of its supervised counterparts.
Dimensional collapse, where a few large eigenvalues dominate the eigenspace, poses a significant obstacle for SSL.
arXiv Detail & Related papers (2024-11-01T06:39:18Z)
- Anti-Collapse Loss for Deep Metric Learning Based on Coding Rate Metric [99.19559537966538]
DML aims to learn a discriminative high-dimensional embedding space for downstream tasks like classification, clustering, and retrieval.
To maintain the structure of embedding space and avoid feature collapse, we propose a novel loss function called Anti-Collapse Loss.
Comprehensive experiments on benchmark datasets demonstrate that our proposed method outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2024-07-03T13:44:20Z)
- Gradient-based Class Weighting for Unsupervised Domain Adaptation in Dense Prediction Visual Tasks [3.776249047528669]
This paper proposes a class-imbalance mitigation strategy that incorporates class-weights into the UDA learning losses.
Estimating these weights dynamically from the loss gradient defines Gradient-based Class Weighting (GBW) learning.
GBW naturally increases the contribution of classes whose learning is hindered by large-represented classes.
arXiv Detail & Related papers (2024-07-01T14:34:25Z)
- Adaptive Adversarial Cross-Entropy Loss for Sharpness-Aware Minimization [2.8775022881551666]
Sharpness-Aware Minimization (SAM) was proposed to enhance model generalization.
SAM consists of two main steps, the weight perturbation step and the weight updating step.
We propose the Adaptive Adversarial Cross-Entropy (AACE) loss function to replace standard cross-entropy loss for SAM's perturbation.
arXiv Detail & Related papers (2024-06-20T14:00:01Z)
- Visual Prompt Tuning in Null Space for Continual Learning [51.96411454304625]
Existing prompt-tuning methods have demonstrated impressive performance in continual learning (CL).
This paper aims to learn each task by tuning the prompts in the direction orthogonal to the subspace spanned by previous tasks' features.
In practice, an effective null-space-based approximation solution has been proposed to implement the prompt gradient projection.
arXiv Detail & Related papers (2024-06-09T05:57:40Z)
- Combating Representation Learning Disparity with Geometric Harmonization [50.29859682439571]
We propose a novel Geometric Harmonization (GH) method to encourage category-level uniformity in representation learning.
Our proposal does not alter the setting of SSL and can be easily integrated into existing methods in a low-cost manner.
arXiv Detail & Related papers (2023-10-26T17:41:11Z)
- Deep Metric Learning with Soft Orthogonal Proxies [1.823505080809275]
We propose a novel approach that introduces a Soft Orthogonality (SO) constraint on proxies.
Our approach leverages Data-Efficient Image Transformer (DeiT) as an encoder to extract contextual features from images along with a DML objective.
Our evaluations demonstrate the superiority of our proposed approach over state-of-the-art methods by a significant margin.
arXiv Detail & Related papers (2023-06-22T17:22:15Z)
- On the Importance of Feature Decorrelation for Unsupervised Representation Learning in Reinforcement Learning [23.876039876806182]
Unsupervised representation learning (URL) has improved the sample efficiency of Reinforcement Learning (RL).
We propose a novel URL framework that causally predicts future states while increasing the dimension of the latent manifold.
Our framework effectively learns predictive representations without collapse, which significantly improves the sample efficiency of state-of-the-art URL methods on the Atari 100k benchmark.
arXiv Detail & Related papers (2023-06-09T02:47:21Z)
- Magnitude Matters: Fixing SIGNSGD Through Magnitude-Aware Sparsification in the Presence of Data Heterogeneity [60.791736094073]
Communication overhead has become one of the major bottlenecks in the distributed training of deep neural networks.
We propose a magnitude-driven sparsification scheme, which addresses the non-convergence issue of SIGNSGD.
The proposed scheme is validated through experiments on Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets.
arXiv Detail & Related papers (2023-02-19T17:42:35Z)
- A Low Rank Promoting Prior for Unsupervised Contrastive Learning [108.91406719395417]
We construct a novel probabilistic graphical model that effectively incorporates the low rank promoting prior into the framework of contrastive learning.
Our hypothesis explicitly requires that all the samples belonging to the same instance class lie on the same subspace with small dimension.
Empirical evidence shows that the proposed algorithm clearly surpasses state-of-the-art approaches on multiple benchmarks.
arXiv Detail & Related papers (2021-08-05T15:58:25Z)
- Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)