Unsupervised Embedding Quality Evaluation
- URL: http://arxiv.org/abs/2305.16562v2
- Date: Mon, 17 Jul 2023 22:41:18 GMT
- Title: Unsupervised Embedding Quality Evaluation
- Authors: Anton Tsitsulin, Marina Munkhoeva, Bryan Perozzi
- Abstract summary: For SSL models, it is often unclear whether they will perform well when transferred to another domain.
Can we quantify how easy it is to linearly separate the data in a stable way?
We introduce a novel method based on recent advances in understanding the high-dimensional geometric structure of self-supervised learning.
- Score: 6.72542623686684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised learning has recently gained significantly in popularity,
especially with deep learning-based approaches. Despite numerous successes and
approaching supervised-level performance on a variety of academic benchmarks,
it is still hard to train and evaluate SSL models in practice due to the
unsupervised nature of the problem. Even with networks trained in a supervised
fashion, it is often unclear whether they will perform well when transferred to
another domain.
Past works are generally limited to assessing the amount of information
contained in embeddings, which is most relevant for self-supervised learning of
deep neural networks. This work takes a different approach: can we
quantify how easy it is to linearly separate the data in a stable way? We
survey the literature and uncover three methods that could potentially be used
for evaluating the quality of representations. We also introduce a novel method
based on recent advances in understanding the high-dimensional geometric
structure of self-supervised learning.
We conduct extensive experiments and study the properties of these metrics
and those introduced in previous work. Our results suggest that while there
is no free lunch, there are metrics that can robustly estimate embedding
quality in an unsupervised way.
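The abstract does not spell out the individual metrics. As a minimal illustration of what a label-free embedding-quality score can look like, the sketch below computes two standard spectral statistics of an embedding matrix: the entropy-based effective rank (Roy & Vetterli, 2007) and the stable rank. Both are stand-ins chosen for illustration; they are not claimed to be the three surveyed methods or the novel method introduced in this paper, and the function names `effective_rank` and `stable_rank` are hypothetical. The only assumption about the data is that embeddings arrive as an (n_samples, dim) NumPy array.

```python
import numpy as np


def effective_rank(embeddings: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank of an (n_samples, dim) embedding matrix.

    Singular values are normalized to a probability vector p, and the score is
    exp(H(p)). It is close to 1 when one direction dominates and approaches
    min(n_samples, dim) when the spectrum is flat. No labels are needed.
    """
    s = np.linalg.svd(embeddings, compute_uv=False)
    p = s / (s.sum() + eps) + eps  # eps guards against log(0) for zero singular values
    return float(np.exp(-np.sum(p * np.log(p))))


def stable_rank(embeddings: np.ndarray) -> float:
    """Stable rank ||X||_F^2 / ||X||_2^2: total squared singular-value mass over the largest one."""
    s = np.linalg.svd(embeddings, compute_uv=False)  # returned in descending order
    return float(np.sum(s**2) / s[0] ** 2)


# Usage: compare well-spread random embeddings with fully collapsed (rank-1) ones.
rng = np.random.default_rng(0)
spread = rng.standard_normal((1000, 64))
collapsed = np.outer(rng.standard_normal(1000), rng.standard_normal(64))
print("effective rank:", effective_rank(spread), effective_rank(collapsed))
print("stable rank:   ", stable_rank(spread), stable_rank(collapsed))
```

Both scores are invariant to rescaling the embeddings (up to the small `eps`) and are computed on the raw, uncentered matrix; centering or normalizing first is a design choice this sketch leaves open. They describe how variance is spread across dimensions, not whether that variance makes the classes easy to separate linearly, so a high score alone does not guarantee good downstream performance.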