The Pursuit of Human Labeling: A New Perspective on Unsupervised Learning
- URL: http://arxiv.org/abs/2311.02940v1
- Date: Mon, 6 Nov 2023 08:16:41 GMT
- Title: The Pursuit of Human Labeling: A New Perspective on Unsupervised Learning
- Authors: Artyom Gadetsky and Maria Brbic
- Abstract summary: We present HUME, a model-agnostic framework for inferring human labeling of a given dataset without any external supervision.
HUME utilizes this insight to guide the search over all possible labelings of a dataset to discover an underlying human labeling.
We show that the proposed optimization objective is strikingly well-correlated with the ground truth labeling of the dataset.
- Score: 6.17147517649596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present HUME, a simple model-agnostic framework for inferring human
labeling of a given dataset without any external supervision. The key insight
behind our approach is that classes defined by many human labelings are
linearly separable regardless of the representation space used to represent a
dataset. HUME utilizes this insight to guide the search over all possible
labelings of a dataset to discover an underlying human labeling. We show that
the proposed optimization objective is strikingly well-correlated with the
ground truth labeling of the dataset. In effect, we only train linear
classifiers on top of pretrained representations that remain fixed during
training, making our framework compatible with any large pretrained and
self-supervised model. Despite its simplicity, HUME outperforms a supervised
linear classifier on top of self-supervised representations on the STL-10
dataset by a large margin and achieves comparable performance on the CIFAR-10
dataset. Compared to the existing unsupervised baselines, HUME achieves
state-of-the-art performance on four benchmark image classification datasets
including the large-scale ImageNet-1000 dataset. Altogether, our work provides
a fundamentally new view to tackle unsupervised learning by searching for
consistent labelings between different representation spaces.
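The core idea in the abstract, that a human-aligned labeling tends to be linearly separable in any good representation space, suggests a simple scoring rule: a candidate labeling is better when linear classifiers trained on it generalize well in multiple fixed representation spaces. The sketch below is a hypothetical illustration of that principle, not the authors' implementation: the function name `labeling_score`, the 50/50 split, and the use of scikit-learn's `LogisticRegression` with random linear projections as stand-ins for pretrained encoders are all assumptions made for this toy example.

```python
# Hypothetical sketch of the separability idea behind HUME (not the paper's code).
# Score a candidate labeling by the held-out accuracy of linear classifiers
# trained on it in two FIXED representation spaces; a human-aligned labeling
# should score high in both, while an arbitrary labeling should not.
import numpy as np
from sklearn.linear_model import LogisticRegression

def labeling_score(labels, reps_a, reps_b, train_frac=0.5, seed=0):
    """Mean held-out linear accuracy of `labels` across two representation spaces."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    cut = int(train_frac * len(labels))
    tr, te = idx[:cut], idx[cut:]
    accs = []
    for reps in (reps_a, reps_b):
        # Only a linear classifier is trained; the representations stay fixed.
        clf = LogisticRegression(max_iter=1000).fit(reps[tr], labels[tr])
        accs.append(clf.score(reps[te], labels[te]))
    return float(np.mean(accs))

# Toy data: two well-separated Gaussian blobs, viewed through two random
# linear "encoders" standing in for different pretrained models.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(-2, 1, (100, 8)), rng.normal(2, 1, (100, 8))])
true_labels = np.array([0] * 100 + [1] * 100)
reps_a = x @ rng.normal(size=(8, 16))  # fixed representation space 1 (stand-in)
reps_b = x @ rng.normal(size=(8, 16))  # fixed representation space 2 (stand-in)

good = labeling_score(true_labels, reps_a, reps_b)
bad = labeling_score(rng.permutation(true_labels), reps_a, reps_b)
```

On this toy problem, the ground-truth labeling scores near-perfectly in both spaces while a shuffled labeling scores near chance, mirroring the abstract's claim that the objective correlates with the true labeling.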
Related papers
- Let Go of Your Labels with Unsupervised Transfer [5.262577780347204]
We show that fully unsupervised transfer emerges when searching for the labeling of a dataset.
We present TURTLE, a fully unsupervised method that effectively employs this guiding principle to uncover the underlying labeling.
We evaluate TURTLE on a diverse benchmark suite of 26 datasets and show that it achieves new state-of-the-art unsupervised performance.
arXiv Detail & Related papers (2024-06-11T13:14:04Z)
- Labeling Indoor Scenes with Fusion of Out-of-the-Box Perception Models [4.157013247909771]
We propose to leverage the recent advancements in state-of-the-art models for bottom-up segmentation (SAM), object detection (Detic), and semantic segmentation (MaskFormer).
We aim to develop a cost-effective labeling approach to obtain pseudo-labels for semantic segmentation and object instance detection in indoor environments.
We demonstrate the effectiveness of the proposed approach on the Active Vision dataset and the ADE20K dataset.
arXiv Detail & Related papers (2023-11-17T21:58:26Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- In-Domain Self-Supervised Learning Improves Remote Sensing Image Scene Classification [5.323049242720532]
Self-supervised learning has emerged as a promising approach for remote sensing image classification.
We present a study of different self-supervised pre-training strategies and evaluate their effect across 14 downstream datasets.
arXiv Detail & Related papers (2023-07-04T10:57:52Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- UniVIP: A Unified Framework for Self-Supervised Visual Pre-training [50.87603616476038]
We propose a novel self-supervised framework to learn versatile visual representations on either single-centric-object or non-iconic dataset.
Massive experiments show that UniVIP pre-trained on non-iconic COCO achieves state-of-the-art transfer performance.
Our method can also exploit single-centric-object dataset such as ImageNet and outperforms BYOL by 2.5% with the same pre-training epochs in linear probing.
arXiv Detail & Related papers (2022-03-14T10:04:04Z) - MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the datasets' annotation schemes and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z) - SelfHAR: Improving Human Activity Recognition through Self-training with
Unlabeled Data [9.270269467155547]
SelfHAR is a semi-supervised model that learns to leverage unlabeled datasets to complement small labeled datasets.
Our approach combines teacher-student self-training, which distills the knowledge of unlabeled and labeled datasets.
SelfHAR is data-efficient, reaching similar performance using up to 10 times less labeled data compared to supervised approaches.
arXiv Detail & Related papers (2021-02-11T15:40:35Z) - Semi-Automatic Data Annotation guided by Feature Space Projection [117.9296191012968]
We present a semi-automatic data annotation approach based on suitable feature space projection and semi-supervised label estimation.
We validate our method on the popular MNIST dataset and on images of human intestinal parasites with and without fecal impurities.
Our results demonstrate the added-value of visual analytics tools that combine complementary abilities of humans and machines for more effective machine learning.
arXiv Detail & Related papers (2020-07-27T17:03:50Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To reduce the size of the created dataset, we further apply a dataset distillation strategy to compress it into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.