Rethinking The Uniformity Metric in Self-Supervised Learning
- URL: http://arxiv.org/abs/2403.00642v2
- Date: Fri, 26 Apr 2024 08:24:11 GMT
- Title: Rethinking The Uniformity Metric in Self-Supervised Learning
- Authors: Xianghong Fang, Jian Li, Qiang Sun, Benyou Wang
- Abstract summary: Uniformity plays an important role in evaluating learned representations, providing insights into self-supervised learning.
We find that the uniformity metric proposed by \citet{Wang2020UnderstandingCR} fails to satisfy the majority of these properties.
To overcome these limitations, we introduce a new uniformity metric based on the Wasserstein distance.
- Score: 20.040558579232105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Uniformity plays an important role in evaluating learned representations, providing insights into self-supervised learning. In our quest for effective uniformity metrics, we pinpoint four principled properties that such metrics should possess. Namely, an effective uniformity metric should remain invariant to instance permutations and sample replications while accurately capturing feature redundancy and dimensional collapse. Surprisingly, we find that the uniformity metric proposed by \citet{Wang2020UnderstandingCR} fails to satisfy the majority of these properties. Specifically, their metric is sensitive to sample replications and cannot correctly account for feature redundancy and dimensional collapse. To overcome these limitations, we introduce a new uniformity metric based on the Wasserstein distance, which satisfies all the aforementioned properties. Integrating this new metric into existing self-supervised learning methods effectively mitigates dimensional collapse and consistently improves their performance on downstream tasks involving the CIFAR-10 and CIFAR-100 datasets. Code is available at https://github.com/statsle/WassersteinSSL.
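The properties named in the abstract can be probed numerically. The following is a minimal sketch under stated assumptions, not the authors' implementation: `pairwise_uniformity_loss` assumes the commonly used log-mean Gaussian-potential form of the existing metric with `t = 2`, and `gaussian_w2_to_isotropic` assumes one natural Wasserstein-based instantiation, the closed-form quadratic Wasserstein distance between a Gaussian fitted to the representations and N(0, I/m). All function names, the choice of target distribution, and the toy data are hypothetical; the repository linked in the abstract is the reference for the actual method.

```python
import numpy as np


def pairwise_uniformity_loss(z, t=2.0):
    """Assumed form of the existing metric: log of the mean Gaussian
    potential exp(-t * ||z_i - z_j||^2) over distinct pairs i < j.
    Lower values indicate more uniform representations."""
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    upper = np.triu_indices(len(z), k=1)
    return float(np.log(np.mean(np.exp(-t * sq_dists[upper]))))


def gaussian_w2_to_isotropic(z):
    """Assumed Wasserstein-based metric: W2 distance between N(mu, Sigma)
    fitted to z and the isotropic target N(0, I/m). Because I/m commutes
    with Sigma, the closed form reduces to
    W2^2 = ||mu||^2 + 1 + tr(Sigma) - (2 / sqrt(m)) * tr(Sigma^{1/2})."""
    m = z.shape[1]
    mu = z.mean(axis=0)
    # bias=True normalises by n, so replicating every sample leaves Sigma unchanged.
    cov = np.cov(z, rowvar=False, bias=True)
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    w2_sq = mu @ mu + 1.0 + eigvals.sum() - 2.0 / np.sqrt(m) * np.sqrt(eigvals).sum()
    return float(np.sqrt(max(w2_sq, 0.0)))


def random_unit_vectors(n, m, seed=0):
    """Toy representations: n points drawn uniformly on the unit sphere in R^m."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, m))
    return z / np.linalg.norm(z, axis=1, keepdims=True)


z = random_unit_vectors(256, 8)

# Sample replication: every representation appears twice.
z_rep = np.concatenate([z, z], axis=0)

# Dimensional collapse: keep only the first two coordinates, then renormalise.
z_collapsed = z.copy()
z_collapsed[:, 2:] = 0.0
z_collapsed /= np.linalg.norm(z_collapsed, axis=1, keepdims=True)

print("pairwise loss, original vs replicated:",
      pairwise_uniformity_loss(z), pairwise_uniformity_loss(z_rep))
print("Gaussian W2, original vs replicated:",
      gaussian_w2_to_isotropic(z), gaussian_w2_to_isotropic(z_rep))
print("Gaussian W2, collapsed:", gaussian_w2_to_isotropic(z_collapsed))
```

In this sketch the pairwise loss changes under replication because the duplicated points introduce zero-distance pairs, whereas the Gaussian distance depends only on the empirical mean and covariance, which replication leaves unchanged. The collapsed representations sit farther from the isotropic target than the original ones.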
Related papers
- Is All Learning (Natural) Gradient Descent? [1.3654846342364308]
We show that a class of effective learning rules can be expressed as natural gradient descent with respect to a suitably defined loss function and metric.
We also demonstrate that these metrics have a canonical form and identify several optimal ones, including the metric that achieves the minimum possible condition number.
arXiv Detail & Related papers (2024-09-24T19:41:08Z)
- Gradient Boosting Mapping for Dimensionality Reduction and Feature Extraction [2.778647101651566]
A fundamental problem in supervised learning is to find a good set of features or distance measures.
We propose a supervised dimensionality reduction method, where the outputs of weak learners define the embedding.
We show that the embedding coordinates provide better features for the supervised learning task.
arXiv Detail & Related papers (2024-05-14T10:23:57Z)
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- MetricOpt: Learning to Optimize Black-Box Evaluation Metrics [21.608384691401238]
We study the problem of optimizing arbitrary non-differentiable task evaluation metrics such as misclassification rate and recall.
Our method, named MetricOpt, operates in a black-box setting where the computational details of the target metric are unknown.
We achieve this by learning a differentiable value function, which maps compact task-specific model parameters to metric observations.
arXiv Detail & Related papers (2021-04-21T16:50:01Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- ReMP: Rectified Metric Propagation for Few-Shot Learning [67.96021109377809]
A rectified metric space is learned to maintain the metric consistency from training to testing.
Numerous analyses indicate that a simple modification of the objective can yield substantial performance gains.
The proposed ReMP is effective and efficient, and outperforms the state of the art on various standard few-shot learning datasets.
arXiv Detail & Related papers (2020-12-02T00:07:53Z)
- ECML: An Ensemble Cascade Metric Learning Mechanism towards Face Verification [50.137924223702264]
In particular, hierarchical metric learning is executed in a cascaded manner to alleviate underfitting.
Considering the feature distribution characteristics of faces, a robust Mahalanobis metric learning method (RMML) with closed-form solution is additionally proposed.
EC-RMML is superior to state-of-the-art metric learning methods for face verification.
arXiv Detail & Related papers (2020-07-11T08:47:07Z)
- Deep Dimension Reduction for Supervised Representation Learning [51.10448064423656]
We propose a deep dimension reduction approach to learning representations with essential characteristics.
The proposed approach is a nonparametric generalization of the sufficient dimension reduction method.
We show that the estimated deep nonparametric representation is consistent in the sense that its excess risk converges to zero.
arXiv Detail & Related papers (2020-06-10T14:47:43Z)
- Learning Flat Latent Manifolds with VAEs [16.725880610265378]
We propose an extension to the framework of variational auto-encoders, where the Euclidean metric is a proxy for the similarity between data points.
We replace the compact prior typically used in variational auto-encoders with a recently presented, more expressive hierarchical one.
We evaluate our method on a range of data-sets, including a video-tracking benchmark.
arXiv Detail & Related papers (2020-02-12T09:54:52Z)