Pool-Based Unsupervised Active Learning for Regression Using Iterative
Representativeness-Diversity Maximization (iRDM)
- URL: http://arxiv.org/abs/2003.07658v2
- Date: Tue, 31 Mar 2020 22:07:38 GMT
- Title: Pool-Based Unsupervised Active Learning for Regression Using Iterative
Representativeness-Diversity Maximization (iRDM)
- Authors: Ziang Liu, Xue Jiang, Hanbin Luo, Weili Fang, Jiajing Liu, and Dongrui
Wu
- Abstract summary: Active learning (AL) selects the most beneficial unlabeled samples to label, and hence a better machine learning model can be trained from the same number of labeled samples.
Most existing active learning for regression (ALR) approaches are supervised, which means the sampling process must use some label information.
We propose a novel unsupervised ALR approach, iterative representativeness-diversity maximization (iRDM), to balance the representativeness and the diversity of the selected samples.
- Score: 22.037639625586667
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Active learning (AL) selects the most beneficial unlabeled samples to label,
and hence a better machine learning model can be trained from the same number
of labeled samples. Most existing active learning for regression (ALR)
approaches are supervised, which means the sampling process must use some label
information, or an existing regression model. This paper considers completely
unsupervised ALR, i.e., how to select the samples to label without knowing any
true label information. We propose a novel unsupervised ALR approach, iterative
representativeness-diversity maximization (iRDM), to optimally balance the
representativeness and the diversity of the selected samples. Experiments on 12
datasets from various domains demonstrated its effectiveness. Our iRDM can be
applied to both linear regression and kernel regression, and it even
significantly outperforms supervised ALR when the number of labeled samples is
small.
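The abstract names the two criteria but not the procedure, so the following is a minimal Python sketch of one plausible reading of iRDM: k-means initialization (each cluster contributes its most central sample, for representativeness), then iterative re-selection that rewards distance to the other picks (diversity). The score rep + lam * div, the trade-off weight lam, and the convergence test are illustrative assumptions, not the paper's exact formulation.

    # Hedged sketch of iRDM-style unsupervised selection; Euclidean distances,
    # k-means initialization, and the combined score are assumptions.
    import numpy as np
    from scipy.spatial.distance import cdist
    from sklearn.cluster import KMeans

    def irdm_select(X, M, lam=1.0, n_iter=20, random_state=0):
        """Pick M sample indices to label from the pool X (n_samples x n_features)."""
        km = KMeans(n_clusters=M, n_init=10, random_state=random_state).fit(X)
        dist = cdist(X, X)  # pairwise distances within the pool
        clusters = [np.where(km.labels_ == m)[0] for m in range(M)]
        # Initialization: the sample nearest to each cluster centroid.
        selected = np.array([
            idx[np.argmin(np.linalg.norm(X[idx] - km.cluster_centers_[m], axis=1))]
            for m, idx in enumerate(clusters)
        ])
        for _ in range(n_iter):
            changed = False
            for m, idx in enumerate(clusters):
                others = np.delete(selected, m)  # picks in the other clusters
                # Representativeness: negative mean distance to cluster members.
                rep = -dist[np.ix_(idx, idx)].mean(axis=1)
                # Diversity: distance to the nearest sample picked elsewhere.
                div = dist[np.ix_(idx, others)].min(axis=1) if others.size else 0.0
                best = idx[np.argmax(rep + lam * div)]
                changed = changed or best != selected[m]
                selected[m] = best
            if not changed:  # no cluster changed its pick: converged
                break
        return selected

    # Toy usage: choose 10 of 200 unlabeled samples to send for labeling.
    pool = np.random.default_rng(0).normal(size=(200, 5))
    print(sorted(irdm_select(pool, M=10)))

Because nothing in the loop touches labels, selection completes before any annotation, which is exactly the unsupervised setting the abstract describes.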
Related papers
- Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification [49.09505771145326]
We propose a Hierarchical Dynamic Labeling (HDL) algorithm that does not depend on model predictions and utilizes image embeddings to generate sample labels.
Our approach has the potential to change the paradigm of pseudo-label generation in semi-supervised learning.
arXiv Detail & Related papers (2024-04-26T06:00:27Z)
- MyriadAL: Active Few Shot Learning for Histopathology [10.652626309100889]
We introduce an active few-shot learning framework, Myriad Active Learning (MAL).
MAL includes a contrastive-learning encoder, pseudo-label generation, and novel query sample selection in the loop.
Experiments on two public histopathology datasets show that MAL has superior test accuracy, macro F1-score, and label efficiency compared to prior works.
arXiv Detail & Related papers (2023-10-24T20:08:15Z)
- Pareto Optimization for Active Learning under Out-of-Distribution Data Scenarios [79.02009938011447]
We propose a sampling scheme that selects optimal fixed-size subsets of samples from the unlabeled data pool.
Experimental results show its effectiveness on both classical Machine Learning (ML) and Deep Learning (DL) tasks.
arXiv Detail & Related papers (2022-07-04T04:11:44Z)
- One Positive Label is Sufficient: Single-Positive Multi-Label Learning with Label Enhancement [71.9401831465908]
We investigate single-positive multi-label learning (SPMLL) where each example is annotated with only one relevant label.
A novel method, Single-positive MultI-label learning with Label Enhancement (SMILE), is proposed.
Experiments on benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-06-01T14:26:30Z)
- A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
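Read literally, that summary suggests a primal problem with one constraint per labeled sample and a Lagrangian whose multipliers score how binding each constraint is. The notation below (training objective L_0, loss \ell, slack \epsilon) is a hedged reconstruction from this one-line summary, not the paper's own:

    \min_{\theta} \; L_0(\theta)
    \quad \text{s.t.} \quad \ell\big(f_\theta(x_i), y_i\big) \le \epsilon,
    \qquad i = 1, \dots, n

    \mathcal{L}(\theta, \lambda) \;=\; L_0(\theta)
    \;+\; \sum_{i=1}^{n} \lambda_i \Big( \ell\big(f_\theta(x_i), y_i\big) - \epsilon \Big),
    \qquad \lambda_i \ge 0

Under this reading, a large optimal multiplier \lambda_i marks a constraint the model struggles to satisfy, so the dual variables give a natural per-sample informativeness score for choosing the next batch.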
arXiv Detail & Related papers (2022-02-08T19:18:49Z)
- Improving Sample and Feature Selection with Principal Covariates Regression [0.0]
We focus on two popular sub-selection schemes that have been applied to sample and feature selection.
We show that incorporating target information provides selections that perform better in supervised tasks.
We also show that incorporating aspects of simple supervised learning models can improve the accuracy of more complex models.
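For context on how target information can enter an otherwise unsupervised selection, principal covariates regression typically blends the sample similarity matrix with (approximated) regression targets through a single mixing parameter; the sketch below uses assumed notation rather than this paper's:

    \tilde{\mathbf{K}} \;=\; \alpha\,\mathbf{K} \;+\; (1-\alpha)\,\hat{\mathbf{Y}}\hat{\mathbf{Y}}^{\top},
    \qquad \alpha \in [0, 1]

Here \alpha = 1 recovers the purely unsupervised scheme and smaller \alpha weights the targets more heavily; sample or feature sub-selection then runs on the modified kernel \tilde{\mathbf{K}}.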
arXiv Detail & Related papers (2020-12-22T18:52:06Z)
- Minimax Active Learning [61.729667575374606]
Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator.
Current active learning techniques either rely on model uncertainty to select the most uncertain samples or use clustering or reconstruction to choose the most diverse set of unlabeled examples.
We develop a semi-supervised minimax entropy-based active learning algorithm that leverages both uncertainty and diversity in an adversarial manner.
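The summary does not spell out the losses. Assuming the usual minimax-entropy pattern (both players minimize the labeled cross-entropy, while the classifier C pushes the entropy of unlabeled predictions up and the feature extractor F pushes it down), the adversarial part would look roughly like:

    H_u \;=\; \mathbb{E}_{x \sim \mathcal{D}_u}\Big[ H\big( C(F(x)) \big) \Big],
    \qquad
    \min_{F} \; \mathcal{L}_{\mathrm{ce}} + \lambda H_u,
    \qquad
    \min_{C} \; \mathcal{L}_{\mathrm{ce}} - \lambda H_u

The prediction entropy of a pool sample then doubles as its uncertainty score, and the adversarially trained feature space supplies the diversity signal; this is inferred from the summary, not taken from the paper.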
arXiv Detail & Related papers (2020-12-18T19:03:40Z)
- Active Learning under Label Shift [80.65643075952639]
We introduce a "medial distribution" to incorporate a tradeoff between importance and class-balanced sampling.
We prove sample complexity and generalization guarantees for Mediated Active Learning under Label Shift (MALLS).
We empirically demonstrate MALLS scales to high-dimensional datasets and can reduce the sample complexity of active learning by 60% in deep active learning tasks.
arXiv Detail & Related papers (2020-07-16T17:30:02Z)
- Deep Active Learning via Open Set Recognition [0.0]
In many applications, data is easy to acquire but expensive and time-consuming to label; medical imaging and NLP are prominent examples.
We formulate active learning as an open-set recognition problem.
Unlike current active learning methods, our algorithm can learn tasks without the need for task labels.
arXiv Detail & Related papers (2020-07-04T22:09:17Z)
- Integrating Informativeness, Representativeness and Diversity in Pool-Based Sequential Active Learning for Regression [29.321275647107928]
Active learning for regression (ALR) selects the best few samples to label, so that a better machine learning model can be trained from the same number of labeled samples.
Three essential criteria -- informativeness, representativeness, and diversity -- have been proposed for regression problems.
We propose three new ALR approaches, with different strategies for integrating the three criteria.
arXiv Detail & Related papers (2020-03-26T08:10:58Z)
- Unsupervised Pool-Based Active Learning for Linear Regression [29.321275647107928]
This paper studies unsupervised pool-based AL for linear regression problems.
We propose a novel AL approach that considers simultaneously the informativeness, representativeness, and diversity, three essential criteria in AL.
arXiv Detail & Related papers (2020-01-14T20:00:10Z)