Extending Contrastive Learning to Unsupervised Coreset Selection
- URL: http://arxiv.org/abs/2103.03574v1
- Date: Fri, 5 Mar 2021 10:21:51 GMT
- Title: Extending Contrastive Learning to Unsupervised Coreset Selection
- Authors: Jeongwoo Ju, Heechul Jung, Yoonju Oh, Junmo Kim
- Abstract summary: We propose an unsupervised way of selecting a core-set that is entirely unlabeled.
We use two leading methods for contrastive learning.
Compared with existing coreset selection methods with labels, our approach reduced the cost associated with human annotation.
- Score: 26.966136750754732
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised contrastive learning offers a means of learning informative
features from a pool of unlabeled data. In this paper, we delve into another
useful approach -- providing a way of selecting a core-set that is entirely
unlabeled. In this regard, contrastive learning, one of a large number of
self-supervised methods, was recently proposed and has consistently delivered
the highest performance. This prompted us to choose two leading methods for
contrastive learning: the simple framework for contrastive learning of visual
representations (SimCLR) and the momentum contrastive (MoCo) learning
framework. We calculated the cosine similarities for each example of an epoch
for the entire duration of the contrastive learning process and subsequently
accumulated the cosine-similarity values to obtain the coreset score. Our
assumption was that a sample with low similarity would likely behave as a
coreset. Compared with existing coreset selection methods with labels, our
approach reduced the cost associated with human annotation. The unsupervised
method implemented in this study for coreset selection obtained improved
results over a randomly chosen subset and was comparable to existing
supervised coreset selection methods on various classification datasets (e.g., CIFAR,
SVHN, and QMNIST).
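
The scoring procedure described above can be illustrated with a minimal sketch. The snippet below accumulates per-sample cosine similarities between the embeddings of two augmented views across training epochs and then keeps the lowest-scoring samples as the coreset; the function names (`update_scores`, `select_coreset`), the 30% selection fraction, and the exact way similarities are collected are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of an accumulated-cosine-similarity coreset score.
# Assumption: each sample's per-epoch similarity is the cosine between the
# embeddings of its two augmented views (SimCLR/MoCo-style positives).
import numpy as np


def update_scores(scores, view1, view2):
    """Add this epoch's per-sample cosine similarities to the running scores.

    view1, view2: (N, D) embeddings of two augmentations of the same N samples.
    """
    v1 = view1 / np.linalg.norm(view1, axis=1, keepdims=True)
    v2 = view2 / np.linalg.norm(view2, axis=1, keepdims=True)
    scores += np.sum(v1 * v2, axis=1)  # per-sample cosine similarity
    return scores


def select_coreset(scores, fraction=0.3):
    """Keep the `fraction` of samples with the lowest accumulated similarity,
    following the assumption that low-similarity samples act as the coreset."""
    k = int(len(scores) * fraction)
    return np.argsort(scores)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_samples, dim, n_epochs = 1000, 128, 5
    scores = np.zeros(n_samples)
    for _ in range(n_epochs):  # stand-in for the contrastive training loop
        view1 = rng.normal(size=(n_samples, dim))
        view2 = rng.normal(size=(n_samples, dim))
        scores = update_scores(scores, view1, view2)
    coreset_idx = select_coreset(scores, fraction=0.3)
    print(f"Selected {len(coreset_idx)} coreset samples")
```

In an actual run, the embeddings would come from the SimCLR or MoCo encoder at each epoch rather than from random vectors, and the selected indices would define the subset used for downstream classification.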
Related papers
- Simple-Sampling and Hard-Mixup with Prototypes to Rebalance Contrastive Learning for Text Classification [11.072083437769093]
We propose a novel model named SharpReCL for imbalanced text classification tasks.
Our model even outperforms popular large language models across several datasets.
arXiv Detail & Related papers (2024-05-19T11:33:49Z) - CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective [48.99488315273868]
We present a contrastive knowledge distillation approach, which can be formulated as a sample-wise alignment problem with intra- and inter-sample constraints.
Our method minimizes logit differences within the same sample by considering their numerical values.
We conduct comprehensive experiments on three datasets including CIFAR-100, ImageNet-1K, and MS COCO.
arXiv Detail & Related papers (2024-04-22T11:52:40Z) - One-bit Supervision for Image Classification: Problem, Solution, and
Beyond [114.95815360508395]
This paper presents one-bit supervision, a novel setting of learning with fewer labels, for image classification.
We propose a multi-stage training paradigm and incorporate negative label suppression into an off-the-shelf semi-supervised learning algorithm.
In multiple benchmarks, the learning efficiency of the proposed approach surpasses that using full-bit, semi-supervised supervision.
arXiv Detail & Related papers (2023-11-26T07:39:00Z) - Probabilistic Bilevel Coreset Selection [24.874967723659022]
We propose a continuous probabilistic bilevel formulation of coreset selection by learning a probabilistic weight for each training sample.
We develop an efficient solver for the bilevel optimization problem via unbiased policy gradient, without the trouble of implicit differentiation.
arXiv Detail & Related papers (2023-01-24T09:37:00Z) - Learning by Sorting: Self-supervised Learning with Group Ordering
Constraints [75.89238437237445]
This paper proposes a new variation of the contrastive learning objective, Group Ordering Constraints (GroCo).
It exploits the idea of sorting the distances of positive and negative pairs and computing the loss based on how many positive pairs have a larger distance than the negative pairs and are thus not ordered correctly (a rough sketch of this counting idea appears after this list).
We evaluate the proposed formulation on various self-supervised learning benchmarks and show that it not only leads to improved results compared to vanilla contrastive learning but also shows competitive performance to comparable methods in linear probing and outperforms current methods in k-NN performance.
arXiv Detail & Related papers (2023-01-05T11:17:55Z) - Rethinking Clustering-Based Pseudo-Labeling for Unsupervised
Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z) - Exploiting Diversity of Unlabeled Data for Label-Efficient
Semi-Supervised Active Learning [57.436224561482966]
Active learning is a research area that addresses the issues of expensive labeling by selecting the most important samples for labeling.
We introduce a new diversity-based initial dataset selection algorithm to select the most informative set of samples for initial labeling in the active learning setting.
Also, we propose a novel active learning query strategy, which uses diversity-based sampling on consistency-based embeddings.
arXiv Detail & Related papers (2022-07-25T16:11:55Z) - DeepCore: A Comprehensive Library for Coreset Selection in Deep Learning [3.897574108827803]
We provide an empirical study on popular coreset selection methods on CIFAR10 and ImageNet datasets.
Although some methods perform better in certain experiment settings, random selection is still a strong baseline.
arXiv Detail & Related papers (2022-04-18T18:14:30Z) - Self-Training: A Survey [5.772546394254112]
Semi-supervised algorithms aim to learn prediction functions from a small set of labeled observations and a large set of unlabeled observations.
Among the existing techniques, self-training methods have undoubtedly attracted greater attention in recent years.
We present self-training methods for binary and multi-class classification, as well as their variants and two related approaches.
arXiv Detail & Related papers (2022-02-24T11:40:44Z) - Neighborhood Contrastive Learning for Novel Class Discovery [79.14767688903028]
We build a new framework, named Neighborhood Contrastive Learning, to learn discriminative representations that are important to clustering performance.
We experimentally demonstrate that these two ingredients significantly contribute to clustering performance and lead our model to outperform state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2021-06-20T17:34:55Z) - Uncovering Coresets for Classification With Multi-Objective Evolutionary
Algorithms [0.8057006406834467]
A coreset is a subset of the training set with which a machine learning algorithm obtains performance similar to what it would deliver if trained on the whole original data.
A novel approach is presented: candidate coresets are iteratively optimized by adding and removing samples.
A multi-objective evolutionary algorithm is used to minimize simultaneously the number of points in the set and the classification error.
arXiv Detail & Related papers (2020-02-20T09:59:56Z)
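
As referenced in the GroCo entry above, the ordering idea can be illustrated with a rough, non-differentiable count of misordered pairs. This is only a sketch of the counting intuition, not the actual GroCo loss; the function name and the use of precomputed distances are assumptions.

```python
# Illustrative count of ordering violations: how many (positive, negative)
# pairs have the positive farther from the anchor than the negative.
# This is NOT the differentiable GroCo objective, only the counting idea.
import numpy as np


def ordering_violations(pos_dist, neg_dist):
    """Count pairs where a positive distance exceeds a negative distance.

    pos_dist: (P,) distances from an anchor to its positives.
    neg_dist: (N,) distances from the same anchor to its negatives.
    """
    # Broadcast to a (P, N) comparison matrix and count the violations.
    return int(np.sum(pos_dist[:, None] > neg_dist[None, :]))


if __name__ == "__main__":
    pos = np.array([0.2, 0.9, 0.4])
    neg = np.array([0.5, 0.8, 1.1])
    print(ordering_violations(pos, neg))  # 2: 0.9 > 0.5 and 0.9 > 0.8
```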