Integrating Informativeness, Representativeness and Diversity in
Pool-Based Sequential Active Learning for Regression
- URL: http://arxiv.org/abs/2003.11786v1
- Date: Thu, 26 Mar 2020 08:10:58 GMT
- Title: Integrating Informativeness, Representativeness and Diversity in
Pool-Based Sequential Active Learning for Regression
- Authors: Ziang Liu and Dongrui Wu
- Abstract summary: Active learning optimally selects the best few samples to label, so that a better machine learning model can be trained from the same number of labeled samples.
Three essential criteria -- informativeness, representativeness, and diversity -- have been proposed for regression problems.
We propose three new ALR approaches, with different strategies for integrating the three criteria.
- Score: 29.321275647107928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many real-world machine learning applications, unlabeled samples are easy
to obtain, but it is expensive and/or time-consuming to label them. Active
learning is a common approach for reducing this data labeling effort. It
optimally selects the best few samples to label, so that a better machine
learning model can be trained from the same number of labeled samples. This
paper considers active learning for regression (ALR) problems. Three essential
criteria -- informativeness, representativeness, and diversity -- have been
proposed for ALR. However, very few approaches in the literature have
considered all three of them simultaneously. We propose three new ALR
approaches, with different strategies for integrating the three criteria.
Extensive experiments on 12 datasets in various domains demonstrated their
effectiveness.
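The three criteria from the abstract can be illustrated with a minimal pool-based selection sketch. This is not the authors' exact method: the particular scoring functions (ensemble-variance informativeness, mean-distance representativeness, min-distance diversity) and the weighted-sum combination are illustrative assumptions.

```python
import numpy as np

def select_next_sample(pool, labeled_idx, preds_ensemble, w=(1.0, 1.0, 1.0)):
    """Pick the next pool index to label by combining three criteria.

    Illustrative scores (assumptions, not the paper's definitions):
      - informativeness: variance of an ensemble's predictions,
      - representativeness: closeness to the other unlabeled samples,
      - diversity: distance to the already-labeled samples.

    pool: (n, d) feature matrix; preds_ensemble: (m, n) predictions
    from m regressors; labeled_idx: set of already-labeled indices.
    """
    unlabeled_idx = [i for i in range(len(pool)) if i not in labeled_idx]
    dists = np.linalg.norm(pool[:, None, :] - pool[None, :, :], axis=-1)

    scores = []
    for i in unlabeled_idx:
        info = preds_ensemble[:, i].var()                    # model disagreement
        rep = 1.0 / (1e-9 + dists[i, unlabeled_idx].mean())  # density proxy
        div = dists[i, list(labeled_idx)].min() if labeled_idx else 1.0
        scores.append(w[0] * info + w[1] * rep + w[2] * div)
    return unlabeled_idx[int(np.argmax(scores))]
```

Repeating the call after each labeling step yields the sequential pool-based loop the title refers to.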
Related papers
- Is margin all you need? An extensive empirical study of active learning on tabular data [66.18464006872345]
We analyze the performance of a variety of active learning algorithms on 69 real-world datasets from the OpenML-CC18 benchmark.
Surprisingly, we find that the classical margin sampling technique matches or outperforms all others, including current state-of-art.
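Classical margin sampling, which this study found so competitive, simply queries the samples whose top two class probabilities are closest. A minimal sketch (the function name and interface are assumptions):

```python
import numpy as np

def margin_sampling(probs, k=1):
    """Select the k unlabeled samples with the smallest margin between the
    two most probable classes (smaller margin = more uncertain).

    probs: (n_samples, n_classes) predicted class probabilities.
    Returns the indices of the k most uncertain samples.
    """
    sorted_probs = np.sort(probs, axis=1)              # ascending per row
    margins = sorted_probs[:, -1] - sorted_probs[:, -2]  # top-1 minus top-2
    return np.argsort(margins)[:k]
```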
arXiv Detail & Related papers (2022-10-07T21:18:24Z)
- Exploiting Diversity of Unlabeled Data for Label-Efficient Semi-Supervised Active Learning [57.436224561482966]
Active learning is a research area that addresses the issues of expensive labeling by selecting the most important samples for labeling.
We introduce a new diversity-based initial dataset selection algorithm to select the most informative set of samples for initial labeling in the active learning setting.
Also, we propose a novel active learning query strategy, which uses diversity-based sampling on consistency-based embeddings.
arXiv Detail & Related papers (2022-07-25T16:11:55Z)
- One Positive Label is Sufficient: Single-Positive Multi-Label Learning with Label Enhancement [71.9401831465908]
We investigate single-positive multi-label learning (SPMLL) where each example is annotated with only one relevant label.
A novel method, Single-positive MultI-label learning with Label Enhancement, is proposed.
Experiments on benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-06-01T14:26:30Z)
- A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
arXiv Detail & Related papers (2022-02-08T19:18:49Z)
- Improving Contrastive Learning on Imbalanced Seed Data via Open-World Sampling [96.8742582581744]
We present an open-world unlabeled data sampling framework called Model-Aware K-center (MAK)
MAK follows three simple principles: tailness, proximity, and diversity.
We demonstrate that MAK can consistently improve both the overall representation quality and the class balancedness of the learned features.
arXiv Detail & Related papers (2021-11-01T15:09:41Z)
- Active Learning in Incomplete Label Multiple Instance Multiple Label Learning [17.5720245903743]
We propose a novel bag-class pair based approach for active learning in the MIML setting.
Our approach is based on a discriminative graphical model with efficient and exact inference.
arXiv Detail & Related papers (2021-07-22T17:01:28Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Deep Active Learning via Open Set Recognition [0.0]
In many applications, data is easy to acquire but expensive and time-consuming to label.
We formulate active learning as an open-set recognition problem.
Unlike current active learning methods, our algorithm can learn tasks without the need for task labels.
arXiv Detail & Related papers (2020-07-04T22:09:17Z)
- Pool-Based Unsupervised Active Learning for Regression Using Iterative Representativeness-Diversity Maximization (iRDM) [22.037639625586667]
Active learning (AL) selects the most beneficial unlabeled samples to label, and hence a better machine learning model can be trained from the same number of labeled samples.
Most existing active learning for regression (ALR) approaches are supervised, which means the sampling process must use some label information.
We propose a novel unsupervised ALR approach, iterative representativeness-diversity (iRDM) to balance the representativeness and the diversity of the selected samples.
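A greedy sketch of unsupervised representativeness-diversity selection illustrates the trade-off this entry describes. It is a simplification: the actual iRDM algorithm iterates between the two objectives rather than combining them in a single greedy pass, and the density proxy below is an assumption.

```python
import numpy as np

def rd_greedy_select(pool, k):
    """Greedily pick k samples balancing representativeness (closeness to
    the rest of the pool) and diversity (distance to already-picked ones).
    Simplified illustration; not the exact iRDM iteration.
    """
    dists = np.linalg.norm(pool[:, None, :] - pool[None, :, :], axis=-1)
    rep = 1.0 / (1e-9 + dists.mean(axis=1))   # density proxy per sample
    selected = [int(np.argmax(rep))]          # start from the densest point
    while len(selected) < k:
        remaining = [i for i in range(len(pool)) if i not in selected]
        div = np.array([dists[i, selected].min() for i in remaining])
        scores = rep[remaining] + div         # balance the two criteria
        selected.append(remaining[int(np.argmax(scores))])
    return selected
```

Because no labels are used, such a selection can seed the very first labeling round of an active learning loop.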
arXiv Detail & Related papers (2020-03-17T12:20:46Z)
- Unsupervised Pool-Based Active Learning for Linear Regression [29.321275647107928]
This paper studies unsupervised pool-based AL for linear regression problems.
We propose a novel AL approach that considers simultaneously the informativeness, representativeness, and diversity, three essential criteria in AL.
arXiv Detail & Related papers (2020-01-14T20:00:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.