Exploiting Diversity of Unlabeled Data for Label-Efficient
Semi-Supervised Active Learning
- URL: http://arxiv.org/abs/2207.12302v1
- Date: Mon, 25 Jul 2022 16:11:55 GMT
- Title: Exploiting Diversity of Unlabeled Data for Label-Efficient
Semi-Supervised Active Learning
- Authors: Felix Buchert, Nassir Navab, Seong Tae Kim
- Abstract summary: Active learning is a research area that addresses the issues of expensive labeling by selecting the most important samples for labeling.
We introduce a new diversity-based initial dataset selection algorithm to select the most informative set of samples for initial labeling in the active learning setting.
Also, we propose a novel active learning query strategy, which uses diversity-based sampling on consistency-based embeddings.
- Score: 57.436224561482966
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The availability of large labeled datasets is the key component for the
success of deep learning. However, annotating labels on large datasets is
generally time-consuming and expensive. Active learning is a research area that
addresses the issues of expensive labeling by selecting the most important
samples for labeling. Diversity-based sampling algorithms are known to be integral
components of representation-based approaches for active learning. In this
paper, we introduce a new diversity-based initial dataset selection algorithm
to select the most informative set of samples for initial labeling in the
active learning setting. Self-supervised representation learning is used to
consider the diversity of samples in the initial dataset selection algorithm.
Also, we propose a novel active learning query strategy, which uses
diversity-based sampling on consistency-based embeddings. By combining
consistency information with diversity in the consistency-based embedding
scheme, the proposed method can select more informative samples for labeling
in the semi-supervised learning setting. Comparative experiments show that the
proposed method achieves compelling results on the CIFAR-10 and Caltech-101
datasets compared with previous active learning approaches by utilizing the
diversity of unlabeled data.
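To make the initial selection step concrete, the sketch below shows a farthest-first (k-center greedy) pass over self-supervised embeddings, which is one common way to realize diversity-based selection. It is a minimal illustration under that assumption: the stand-in `features` array, the function names, and the choice of L2-normalized Euclidean distance are hypothetical and not taken from the paper.

```python
import numpy as np

def k_center_greedy(embeddings: np.ndarray, budget: int, seed: int = 0) -> list[int]:
    """Greedy k-center selection: repeatedly pick the sample farthest from the
    already-selected set, so the chosen samples cover the embedding space."""
    rng = np.random.default_rng(seed)
    # L2-normalize so Euclidean distance behaves like a cosine-style distance.
    feats = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

    first = int(rng.integers(feats.shape[0]))      # arbitrary starting sample
    selected = [first]
    # Distance from every sample to its nearest already-selected sample.
    min_dist = np.linalg.norm(feats - feats[first], axis=1)

    while len(selected) < budget:
        next_idx = int(np.argmax(min_dist))        # farthest-first traversal
        selected.append(next_idx)
        new_dist = np.linalg.norm(feats - feats[next_idx], axis=1)
        min_dist = np.minimum(min_dist, new_dist)  # refresh nearest-selected distances
    return selected

# Usage sketch: `features` stands in for outputs of a self-supervised encoder.
features = np.random.randn(10000, 128).astype(np.float32)
initial_labeled = k_center_greedy(features, budget=100)
```

The same greedy pass could in principle be reused in later query rounds by swapping in consistency-based embeddings, but the exact embedding construction and selection rule are the paper's contribution and are not reproduced here.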
Related papers
- BAL: Balancing Diversity and Novelty for Active Learning [53.289700543331925]
We introduce a novel framework, Balancing Active Learning (BAL), which constructs adaptive sub-pools to balance diverse and uncertain data.
Our approach outperforms all established active learning methods on widely recognized benchmarks by 1.20%.
arXiv Detail & Related papers (2023-12-26T08:14:46Z)
- Deep Active Learning with Contrastive Learning Under Realistic Data Pool Assumptions [2.578242050187029]
Active learning aims to identify the most informative data from an unlabeled data pool that enables a model to reach the desired accuracy rapidly.
Most existing active learning methods have been evaluated in an ideal setting where only samples relevant to the target task exist in an unlabeled data pool.
We introduce new active learning benchmarks that include ambiguous, task-irrelevant out-of-distribution samples as well as in-distribution samples.
arXiv Detail & Related papers (2023-03-25T10:46:10Z)
- Pseudo-Labeled Auto-Curriculum Learning for Semi-Supervised Keypoint Localization [88.74813798138466]
Localizing keypoints of an object is a basic visual problem.
Supervised learning of a keypoint localization network often requires a large amount of data.
We propose to automatically select reliable pseudo-labeled samples with a series of dynamic thresholds (a thresholding sketch appears after this list).
arXiv Detail & Related papers (2022-01-21T09:51:58Z)
- Towards General and Efficient Active Learning [20.888364610175987]
Active learning aims to select the most informative samples to exploit limited annotation budgets.
We propose a novel general and efficient active learning (GEAL) method in this paper.
Our method can conduct data selection processes on different datasets with a single-pass inference of the same model.
arXiv Detail & Related papers (2021-12-15T08:35:28Z)
- Multiple-criteria Based Active Learning with Fixed-size Determinantal Point Processes [43.71112693633952]
We introduce a multiple-criteria based active learning algorithm, which incorporates three complementary criteria, i.e., informativeness, representativeness and diversity.
We show that our method performs significantly better and is more stable than other multiple-criteria based AL algorithms.
arXiv Detail & Related papers (2021-07-04T13:22:54Z)
- Data Shapley Valuation for Efficient Batch Active Learning [21.76249748709411]
Active Data Shapley (ADS) is a filtering layer for batch active learning.
We show that ADS is particularly effective when the pool of unlabeled data exhibits real-world caveats.
arXiv Detail & Related papers (2021-04-16T18:53:42Z)
- Minimax Active Learning [61.729667575374606]
Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator.
Current active learning techniques either rely on model uncertainty to select the most uncertain samples or use clustering or reconstruction to choose the most diverse set of unlabeled examples.
We develop a semi-supervised minimax entropy-based active learning algorithm that leverages both uncertainty and diversity in an adversarial manner.
arXiv Detail & Related papers (2020-12-18T19:03:40Z)
- Deep Active Learning for Sequence Labeling Based on Diversity and Uncertainty in Gradient [5.33024001730262]
We show that the amount of labeled training data can be reduced using active learning when it incorporates both uncertainty and diversity in the sequence labeling task.
We examine the effects of our sequence-based approach, which selects weighted diverse samples in the gradient embedding space, across multiple tasks, datasets, and models, and show that it consistently outperforms classic uncertainty-based sampling and diversity-based sampling.
arXiv Detail & Related papers (2020-11-27T06:03:27Z)
- Adversarial Knowledge Transfer from Unlabeled Data [62.97253639100014]
We present a novel Adversarial Knowledge Transfer framework for transferring knowledge from internet-scale unlabeled data to improve the performance of a classifier.
An important novel aspect of our method is that the unlabeled source data can be of different classes from those of the labeled target data, and there is no need to define a separate pretext task.
arXiv Detail & Related papers (2020-08-13T08:04:27Z)
- Similarity Search for Efficient Active Learning and Search of Rare Concepts [78.5475382904847]
We improve the computational efficiency of active learning and search methods by restricting the candidate pool for labeling to the nearest neighbors of the currently labeled set (a sketch of this restriction appears after this list).
Our approach achieved similar mean average precision and recall as the traditional global approach while reducing the computational cost of selection by up to three orders of magnitude, thus enabling web-scale active learning.
arXiv Detail & Related papers (2020-06-30T19:46:10Z)
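The last entry above (Similarity Search for Efficient Active Learning and Search of Rare Concepts) restricts the candidate pool to the nearest neighbors of the currently labeled set before applying the usual acquisition score. Below is a minimal sketch of that restriction, assuming precomputed embeddings and per-sample uncertainty scores; the brute-force k-NN and all names are illustrative stand-ins for the paper's approximate, web-scale index.

```python
import numpy as np

def restricted_candidate_pool(embeddings: np.ndarray,
                              labeled_idx: list[int],
                              k: int = 100) -> np.ndarray:
    """Indices of unlabeled samples that are among the k nearest neighbors
    of at least one labeled sample (brute-force k-NN for clarity)."""
    labeled = np.asarray(labeled_idx)
    candidates: set[int] = set()
    for i in labeled:
        # Squared Euclidean distance from labeled sample i to all samples.
        d = np.sum((embeddings - embeddings[i]) ** 2, axis=1)
        d[labeled] = np.inf                       # never re-propose labeled points
        neighbors = np.argpartition(d, k)[:k]     # k closest unlabeled samples
        candidates.update(int(j) for j in neighbors)
    return np.fromiter(candidates, dtype=np.int64)

def select_batch(scores: np.ndarray, pool: np.ndarray, batch_size: int) -> np.ndarray:
    """Pick the highest-scoring samples, but only from the restricted pool."""
    return pool[np.argsort(-scores[pool])][:batch_size]

# Usage sketch with stand-in data: 5000 samples, 50 already labeled.
emb = np.random.randn(5000, 64).astype(np.float32)
uncertainty = np.random.rand(5000)                # e.g., predictive entropy
pool = restricted_candidate_pool(emb, list(range(50)), k=100)
query = select_batch(uncertainty, pool, batch_size=10)
```

Scoring only the restricted pool is what enables the reported reduction in selection cost, at the possible price of overlooking informative samples that lie far from the labeled set.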
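For the Pseudo-Labeled Auto-Curriculum Learning entry above, the thresholding sketch referenced there can be pictured as confidence filtering with a threshold that changes over rounds. The linear schedule, classification-style softmax confidences (rather than keypoint heatmaps), and all names below are assumptions for illustration; the paper's dynamic thresholds may be derived differently.

```python
import numpy as np

def select_pseudo_labels(probs: np.ndarray, round_idx: int, num_rounds: int,
                         t_start: float = 0.95, t_end: float = 0.80) -> np.ndarray:
    """Keep unlabeled samples whose top confidence exceeds a threshold that is
    gradually relaxed over rounds (an easy-to-hard curriculum)."""
    frac = round_idx / max(num_rounds - 1, 1)          # progress through training
    threshold = t_start + (t_end - t_start) * frac     # linearly annealed threshold
    confidence = probs.max(axis=1)                     # per-sample top-class probability
    return np.flatnonzero(confidence >= threshold)

# Usage sketch: stand-in softmax outputs for 1000 unlabeled samples, 10 classes.
logits = np.random.randn(1000, 10)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
reliable = select_pseudo_labels(probs, round_idx=0, num_rounds=5)
```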
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.