Let Go of Your Labels with Unsupervised Transfer
- URL: http://arxiv.org/abs/2406.07236v1
- Date: Tue, 11 Jun 2024 13:14:04 GMT
- Title: Let Go of Your Labels with Unsupervised Transfer
- Authors: Artyom Gadetsky, Yulun Jiang, Maria Brbic
- Abstract summary: We show that fully unsupervised transfer emerges when searching for the labeling of a dataset that induces maximal margin classifiers in the representation spaces of different foundation models.
We present TURTLE, a fully unsupervised method that effectively employs this guiding principle to uncover the underlying labeling.
We evaluate TURTLE on a diverse benchmark suite of 26 datasets and show that it achieves new state-of-the-art unsupervised performance.
- Score: 5.262577780347204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Foundation vision-language models have enabled remarkable zero-shot transferability of pre-trained representations to a wide range of downstream tasks. However, to solve a new task, zero-shot transfer still requires human guidance to define the visual categories that appear in the data. Here, we show that fully unsupervised transfer emerges when searching for the labeling of a dataset that induces maximal margin classifiers in the representation spaces of different foundation models. We present TURTLE, a fully unsupervised method that effectively employs this guiding principle to uncover the underlying labeling of a downstream dataset without any supervision or task-specific representation learning. We evaluate TURTLE on a diverse benchmark suite of 26 datasets and show that it achieves new state-of-the-art unsupervised performance. Furthermore, despite being fully unsupervised, TURTLE outperforms zero-shot transfer baselines on a wide range of datasets. In particular, TURTLE matches the average performance of CLIP zero-shot on 26 datasets by employing the same representation space, spanning a wide range of architectures and model sizes. By guiding the search for the underlying labeling using the representation spaces of two foundation models, TURTLE surpasses zero-shot transfer and unsupervised prompt tuning baselines, demonstrating the surprising power and effectiveness of unsupervised transfer.
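To make the guiding principle concrete, here is a loose, self-contained sketch, not the authors' TURTLE implementation: a labeling head is trained so that linear classifiers fitted in two frozen representation spaces can both fit the candidate labeling, with a marginal-entropy regularizer to avoid collapse. The function name, the exact regularizer, and all hyperparameters are assumptions of this summary.

```python
import torch
import torch.nn.functional as F

def unsupervised_transfer_sketch(Z1, Z2, k, steps=500, lr=1e-2, gamma=10.0):
    """Search for a k-class labeling that linear classifiers in two frozen
    representation spaces Z1 (n x d1) and Z2 (n x d2) can both fit well.
    Illustrative sketch only."""
    head = torch.nn.Linear(Z1.shape[1], k)   # produces the candidate labeling
    clf1 = torch.nn.Linear(Z1.shape[1], k)   # linear classifier in space 1
    clf2 = torch.nn.Linear(Z2.shape[1], k)   # linear classifier in space 2
    params = list(head.parameters()) + list(clf1.parameters()) + list(clf2.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(steps):
        q = F.softmax(head(Z1), dim=1)       # soft labels for every sample
        # Fit term: cross-entropy of each space's linear classifier
        # against the current soft labeling.
        fit = -(q * F.log_softmax(clf1(Z1), dim=1)).sum(1).mean() \
              - (q * F.log_softmax(clf2(Z2), dim=1)).sum(1).mean()
        # Regularizer: high entropy of the class marginal keeps the
        # labeling from collapsing onto a single class.
        p = q.mean(0)
        collapse = (p * (p + 1e-8).log()).sum()
        loss = fit + gamma * collapse
        opt.zero_grad()
        loss.backward()
        opt.step()
    return head(Z1).argmax(dim=1)            # hard labels

# e.g. with frozen CLIP / DINO features of the same 1000 images:
# labels = unsupervised_transfer_sketch(torch.randn(1000, 512),
#                                       torch.randn(1000, 384), k=10)
```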
Related papers
- ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, made the first attempt to tackle semi-supervised visual grounding by adopting a teacher-student framework to provide pseudo confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow a Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z)
- DatUS^2: Data-driven Unsupervised Semantic Segmentation with Pre-trained Self-supervised Vision Transformer [6.898332152137321]
Unsupervised dense semantic segmentation has not been explored as a downstream task.
This paper proposes a novel data-driven approach for unsupervised semantic segmentation as a downstream task.
The best version of DatUS^2 outperforms the existing state-of-the-art method on the unsupervised dense semantic segmentation task.
arXiv Detail & Related papers (2024-01-23T14:53:32Z)
- Weakly Supervised Video Individual Counting [126.75545291243142]
Video Individual Counting aims to predict the number of unique individuals in a single video.
We introduce a weakly supervised VIC task, wherein trajectory labels are not provided.
In doing so, we devise an end-to-end trainable soft contrastive loss that drives the network to distinguish inflow, outflow, and the remaining individuals.
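The paper's exact formulation is not reproduced here; the following is a generic soft contrastive loss in the same spirit, where each pair carries a soft target weight instead of a hard positive/negative label. The `soft_match` matrix is a hypothetical input, e.g. estimated identity-matching probabilities between frames.

```python
import torch
import torch.nn.functional as F

def soft_contrastive_loss(emb_a, emb_b, soft_match, tau=0.1):
    # emb_a: (n, d) embeddings of individuals in frame t
    # emb_b: (m, d) embeddings of individuals in frame t+1
    # soft_match: (n, m) soft targets in [0, 1], e.g. the estimated
    # probability that row i and column j are the same person.
    sim = F.normalize(emb_a, dim=1) @ F.normalize(emb_b, dim=1).t() / tau
    log_p = F.log_softmax(sim, dim=1)                    # match distribution per row
    target = soft_match / soft_match.sum(1, keepdim=True).clamp_min(1e-8)
    return -(target * log_p).sum(1).mean()               # soft cross-entropy
```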
arXiv Detail & Related papers (2023-12-10T16:12:13Z)
- The Pursuit of Human Labeling: A New Perspective on Unsupervised Learning [6.17147517649596]
We present HUME, a model-agnostic framework for inferring the human labeling of a given dataset without any external supervision.
HUME exploits the insight that classes defined by human labelings tend to be linearly separable regardless of the representation space, using it to guide the search over all possible labelings of a dataset to discover an underlying human labeling.
We show that the proposed optimization objective is strikingly well-correlated with the ground truth labeling of the dataset.
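As a rough illustration of this kind of objective (an assumption of this summary, not HUME's exact procedure), a candidate labeling can be scored by how well a linear probe trained on one split of frozen features generalizes to a held-out split:

```python
import torch
import torch.nn.functional as F

def labeling_score(Z, y, n_fit=500, steps=200, lr=1e-2):
    # Z: (n, d) frozen features; y: (n,) candidate integer labels.
    k = int(y.max().item()) + 1
    clf = torch.nn.Linear(Z.shape[1], k)
    opt = torch.optim.Adam(clf.parameters(), lr=lr)
    for _ in range(steps):                   # fit a linear probe on one split
        loss = F.cross_entropy(clf(Z[:n_fit]), y[:n_fit])
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():                    # accuracy on the held-out split
        return (clf(Z[n_fit:]).argmax(1) == y[n_fit:]).float().mean().item()
```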
arXiv Detail & Related papers (2023-11-06T08:16:41Z)
- Leveraging sparse and shared feature activations for disentangled representation learning [112.22699167017471]
We propose to leverage knowledge extracted from a diversified set of supervised tasks to learn a common disentangled representation.
We validate our approach on six real-world distribution shift benchmarks and on different data modalities.
arXiv Detail & Related papers (2023-04-17T01:33:24Z)
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves strong performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, a larger pre-training dataset is usually needed to boost performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
- Unsupervised Finetuning [80.58625921631506]
We propose two strategies for combining source and target data in unsupervised finetuning.
The former strategy adds a small portion of source data back to occupy the pretrained representation space.
The latter strategy increases the data density and helps learn more compact representations.
arXiv Detail & Related papers (2021-10-18T17:57:05Z)
- Learning Invariant Representations across Domains and Tasks [81.30046935430791]
We propose a novel Task Adaptation Network (TAN) to solve the unsupervised task transfer problem.
In addition to learning transferable features via domain-adversarial training, we propose a novel task semantic adaptor that uses the learning-to-learn strategy to adapt the task semantics.
TAN significantly increases recall and F1 score by 5.0% and 7.8%, respectively, compared to strong recent baselines.
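The domain-adversarial component of such methods typically relies on a gradient reversal layer (DANN-style); a minimal sketch follows. TAN's task semantic adaptor is not reproduced here, and this is the standard trick rather than the paper's full method.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; gradient scaled by -lam on the way back."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Usage: domain_logits = domain_head(grad_reverse(features))
# Minimizing the domain classification loss then trains the feature
# extractor to make source and target domains indistinguishable.
```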
arXiv Detail & Related papers (2021-03-03T11:18:43Z)
- Self-supervised driven consistency training for annotation efficient histopathology image analysis [13.005873872821066]
Training a neural network with a large labeled dataset is still a dominant paradigm in computational histopathology.
We propose a self-supervised pretext task that harnesses the underlying multi-resolution contextual cues in histology whole-slide images to learn a powerful supervisory signal for unsupervised representation learning.
We also propose a new teacher-student semi-supervised consistency paradigm that learns to effectively transfer the pretrained representations to downstream tasks based on prediction consistency with task-specific unlabeled data.
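A generic sketch of the teacher-student consistency paradigm, assuming an EMA teacher and simple additive-noise perturbation; this illustrates the paradigm, not the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, m=0.99):
    # Teacher weights track an exponential moving average of the student.
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(m).add_(ps, alpha=1.0 - m)

def consistency_loss(student, teacher, x_unlabeled, noise_std=0.1):
    # The student, fed a perturbed input, should match the teacher's prediction.
    with torch.no_grad():
        target = F.softmax(teacher(x_unlabeled), dim=1)
    noisy = x_unlabeled + noise_std * torch.randn_like(x_unlabeled)
    return F.kl_div(F.log_softmax(student(noisy), dim=1), target,
                    reduction="batchmean")
```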
arXiv Detail & Related papers (2021-02-07T19:46:21Z)
- Robust Disentanglement of a Few Factors at a Time [5.156484100374058]
We introduce population-based training (PBT) for improving consistency in training variational autoencoders (VAEs).
We then use Unsupervised Disentanglement Ranking (UDR) as an unsupervised heuristic to score models in our PBT-VAE training and show how models trained this way tend to consistently disentangle only a subset of the generative factors.
We show striking improvement in state-of-the-art unsupervised disentanglement performance and robustness across multiple datasets and metrics.
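For orientation, one exploit/explore step of population-based training looks roughly like the following; the population layout and hyperparameter perturbation rule are assumptions, and `score_fn` stands in for an unsupervised score such as UDR:

```python
import copy
import random

def pbt_step(population, score_fn, explore_std=0.1):
    # population: list of dicts with "weights" and "hparams" entries.
    # score_fn: unsupervised model score, e.g. a UDR-style ranking.
    ranked = sorted(population, key=score_fn, reverse=True)
    cut = max(1, len(ranked) // 4)
    for loser in ranked[-cut:]:
        winner = random.choice(ranked[:cut])
        loser["weights"] = copy.deepcopy(winner["weights"])         # exploit
        loser["hparams"] = {k: v * (1.0 + random.gauss(0.0, explore_std))
                            for k, v in winner["hparams"].items()}  # explore
    return ranked
```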
arXiv Detail & Related papers (2020-10-26T12:34:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.