Preventing Dimensional Collapse in Self-Supervised Learning via Orthogonality Regularization
- URL: http://arxiv.org/abs/2411.00392v1
- Date: Fri, 01 Nov 2024 06:39:18 GMT
- Title: Preventing Dimensional Collapse in Self-Supervised Learning via Orthogonality Regularization
- Authors: Junlin He, Jinxiao Du, Wei Ma
- Abstract summary: Self-supervised learning (SSL) has rapidly advanced in recent years, approaching the performance of its supervised counterparts.
However, dimensional collapse, where a few large eigenvalues dominate the eigenspace, poses a significant obstacle for SSL.
- Score: 9.823816643319448
- License:
- Abstract: Self-supervised learning (SSL) has rapidly advanced in recent years, approaching the performance of its supervised counterparts through the extraction of representations from unlabeled data. However, dimensional collapse, where a few large eigenvalues dominate the eigenspace, poses a significant obstacle for SSL. When dimensional collapse occurs on features (e.g., hidden features and representations), it prevents the features from representing the full information of the data; when it occurs on weight matrices, their filters are self-related and redundant, limiting their expressive power. Existing studies have predominantly concentrated on the dimensional collapse of representations, neglecting whether this is sufficient to prevent the dimensional collapse of the weight matrices and hidden features. To this end, we propose, for the first time, a mitigation approach employing orthogonal regularization (OR) across the encoder, targeting both convolutional and linear layers during pretraining. OR promotes orthogonality within weight matrices, thus safeguarding against the dimensional collapse of weight matrices, hidden features, and representations. Our empirical investigations demonstrate that OR significantly enhances the performance of SSL methods across diverse benchmarks, yielding consistent gains with both CNNs and Transformer-based architectures.
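As a rough illustration of the idea (not the authors' released code), the sketch below applies a soft orthogonality penalty of the Frobenius form ||W W^T - I||_F^2 to every convolutional and linear weight matrix of an encoder and adds it to a placeholder SSL loss; the specific OR variant, the layer selection, and the coefficient (1e-4 here) are assumptions. It also shows one common way to diagnose dimensional collapse: inspecting the singular-value spectrum of a batch of representations.

```python
import torch
import torch.nn as nn

def orthogonality_penalty(encoder: nn.Module) -> torch.Tensor:
    """Soft orthogonality penalty summed over Conv2d/Linear layers.

    Each weight tensor is flattened to a matrix W of shape (out, fan_in)
    and penalized with ||W W^T - I||_F^2, which pushes the filters of
    every layer towards mutual orthogonality.
    """
    total = torch.zeros(())
    for m in encoder.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            W = m.weight.reshape(m.weight.shape[0], -1)
            gram = W @ W.t()
            eye = torch.eye(gram.shape[0], device=gram.device, dtype=gram.dtype)
            total = total + ((gram - eye) ** 2).sum()
    return total

# A tiny stand-in encoder; in practice this would be a ResNet or ViT backbone.
encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 128),
)

x = torch.randn(8, 3, 32, 32)
z = encoder(x)                                    # (batch, dim) representations

ssl_loss = z.pow(2).mean()                        # placeholder for the real SSL objective
loss = ssl_loss + 1e-4 * orthogonality_penalty(encoder)  # assumed coefficient
loss.backward()

# Diagnosing dimensional collapse: a singular-value spectrum of the (centered)
# representations that is dominated by a few large values signals collapse.
with torch.no_grad():
    zc = z - z.mean(dim=0, keepdim=True)
    spectrum = torch.linalg.svdvals(zc)
    print(spectrum / spectrum.sum())
```

Because the penalty acts directly on the weight matrices, it constrains the hidden features as well as the final representations, which is the distinction the abstract draws from representation-only approaches.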
Related papers
- Implicit Regularization of Gradient Flow on One-Layer Softmax Attention [10.060496091806694]
We study gradient flow on the exponential loss for a classification problem with a one-layer softmax attention model.
Under a separability assumption on the data, we show that when gradient flow achieves the minimal loss value, it further implicitly minimizes the nuclear norm of the product of the key and query weight matrices.
arXiv Detail & Related papers (2024-03-13T17:02:27Z) - Data-free Weight Compress and Denoise for Large Language Models [101.53420111286952]
We propose a novel approach termed Data-free Joint Rank-k Approximation for compressing the parameter matrices.
We achieve a model pruning of 80% parameters while retaining 93.43% of the original performance without any calibration data.
arXiv Detail & Related papers (2024-02-26T05:51:47Z) - WERank: Towards Rank Degradation Prevention for Self-Supervised Learning Using Weight Regularization [5.484161990886851]
We propose WERank, a new regularizer on the weight parameters of the network to prevent rank degeneration at different layers of the network.
We empirically demonstrate that WERank is effective in helping BYOL to achieve higher rank during SSL pre-training and consequently downstream accuracy during evaluation probing.
arXiv Detail & Related papers (2024-02-14T21:29:28Z) - Scalable manifold learning by uniform landmark sampling and constrained locally linear embedding [0.6144680854063939]
We propose a scalable manifold learning (scML) method that can manipulate large-scale and high-dimensional data in an efficient manner.
We empirically validated the effectiveness of scML on synthetic datasets and real-world benchmarks of different types.
scML scales well with increasing data sizes and embedding dimensions, and exhibits promising performance in preserving the global structure.
arXiv Detail & Related papers (2024-01-02T08:43:06Z) - Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z) - Preventing Dimensional Collapse of Incomplete Multi-View Clustering via Direct Contrastive Learning [0.14999444543328289]
We propose a novel incomplete multi-view contrastive clustering framework.
It effectively avoids dimensional collapse without relying on projection heads.
It achieves state-of-the-art clustering results on 5 public datasets.
arXiv Detail & Related papers (2023-03-22T00:21:50Z) - Sufficient dimension reduction for feature matrices [3.04585143845864]
We propose a method called principal support matrix machine (PSMM) for the matrix sufficient dimension reduction.
Our numerical analysis demonstrates that the PSMM outperforms existing methods and has strong interpretability in real data applications.
arXiv Detail & Related papers (2023-03-07T23:16:46Z) - DBA: Efficient Transformer with Dynamic Bilinear Low-Rank Attention [53.02648818164273]
We present an efficient yet effective attention mechanism, namely the Dynamic Bilinear Low-Rank Attention (DBA)
DBA compresses the sequence length by input-sensitive dynamic projection matrices and achieves linear time and space complexity.
Experiments over tasks with diverse sequence length conditions show that DBA achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-11-24T03:06:36Z) - High-dimensional separability for one- and few-shot learning [58.8599521537]
This work is driven by a practical question: the correction of Artificial Intelligence (AI) errors.
Special external devices, correctors, are developed; they should provide a quick, non-iterative system fix without modifying the legacy AI system.
New multi-correctors of AI systems are presented and illustrated with examples of predicting errors and learning new classes of objects by a deep convolutional neural network.
arXiv Detail & Related papers (2021-06-28T14:58:14Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Eigendecomposition-Free Training of Deep Networks for Linear Least-Square Problems [107.3868459697569]
We introduce an eigendecomposition-free approach to training a deep network.
We show that our approach is much more robust than explicit differentiation of the eigendecomposition.
Our method has better convergence properties and yields state-of-the-art results.
arXiv Detail & Related papers (2020-04-15T04:29:34Z)