Achieving Minimax Rates in Pool-Based Batch Active Learning
- URL: http://arxiv.org/abs/2202.05448v1
- Date: Fri, 11 Feb 2022 04:55:45 GMT
- Title: Achieving Minimax Rates in Pool-Based Batch Active Learning
- Authors: Claudio Gentile, Zhilei Wang, Tong Zhang
- Abstract summary: We consider a batch active learning scenario where the learner adaptively issues batches of points to a labeling oracle.
In this paper we propose a solution which requires a careful trade off between the informativeness of the queried points and their diversity.
- Score: 26.12124106759262
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a batch active learning scenario where the learner adaptively
issues batches of points to a labeling oracle. Sampling labels in batches is
highly desirable in practice due to the smaller number of interactive rounds
with the labeling oracle (often human beings). However, batch active learning
typically pays the price of a reduced adaptivity, leading to suboptimal
results. In this paper we propose a solution which requires a careful trade off
between the informativeness of the queried points and their diversity. We
theoretically investigate batch active learning in the practically relevant
scenario where the unlabeled pool of data is available beforehand (pool-based
active learning). We analyze a novel stage-wise greedy algorithm and show that,
as a function of the label complexity, the excess risk of this algorithm
operating in the realizable setting for which we prove matches the known
minimax rates in standard statistical learning settings. Our results also
exhibit a mild dependence on the batch size. These are the first theoretical
results that employ careful trade offs between informativeness and diversity to
rigorously quantify the statistical performance of batch active learning in the
pool-based scenario.
Related papers
- Probably Approximately Precision and Recall Learning [62.912015491907994]
Precision and Recall are foundational metrics in machine learning.
One-sided feedback--where only positive examples are observed during training--is inherent in many practical problems.
We introduce a PAC learning framework where each hypothesis is represented by a graph, with edges indicating positive interactions.
arXiv Detail & Related papers (2024-11-20T04:21:07Z) - BatchGFN: Generative Flow Networks for Batch Active Learning [80.73649229919454]
BatchGFN is a novel approach for pool-based active learning that uses generative flow networks to sample sets of data points proportional to a batch reward.
We show our approach enables principled sampling near-optimal utility batches at inference time with a single forward pass per point in the batch in toy regression problems.
arXiv Detail & Related papers (2023-06-26T20:41:36Z) - Deep Active Learning with Contrastive Learning Under Realistic Data Pool
Assumptions [2.578242050187029]
Active learning aims to identify the most informative data from an unlabeled data pool that enables a model to reach the desired accuracy rapidly.
Most existing active learning methods have been evaluated in an ideal setting where only samples relevant to the target task exist in an unlabeled data pool.
We introduce new active learning benchmarks that include ambiguous, task-irrelevant out-of-distribution as well as in-distribution samples.
arXiv Detail & Related papers (2023-03-25T10:46:10Z) - Batch Active Learning at Scale [39.26441165274027]
Batch active learning, which adaptively issues batched queries to a labeling oracle, is a common approach for addressing this problem.
In this work, we analyze an efficient active learning algorithm, which focuses on the large batch setting.
We show that our sampling method, which combines notions of uncertainty and diversity, easily scales to batch sizes (100K-1M) several orders of magnitude larger than used in previous studies.
arXiv Detail & Related papers (2021-07-29T18:14:05Z) - Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based ALs are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z) - Semi-supervised Batch Active Learning via Bilevel Optimization [89.37476066973336]
We formulate our approach as a data summarization problem via bilevel optimization.
We show that our method is highly effective in keyword detection tasks in the regime when only few labeled samples are available.
arXiv Detail & Related papers (2020-10-19T16:53:24Z) - ALdataset: a benchmark for pool-based active learning [1.9308522511657449]
Active learning (AL) is a subfield of machine learning (ML) in which a learning algorithm could achieve good accuracy with less training samples by interactively querying a user/oracle to label new data points.
Pool-based AL is well-motivated in many ML tasks, where unlabeled data is abundant, but their labels are hard to obtain.
We present experiment results for various active learning strategies, both recently proposed and classic highly-cited methods, and draw insights from the results.
arXiv Detail & Related papers (2020-10-16T04:37:29Z) - Active Learning under Label Shift [80.65643075952639]
We introduce a "medial distribution" to incorporate a tradeoff between importance and class-balanced sampling.
We prove sample complexity and generalization guarantees for Mediated Active Learning under Label Shift (MALLS)
We empirically demonstrate MALLS scales to high-dimensional datasets and can reduce the sample complexity of active learning by 60% in deep active learning tasks.
arXiv Detail & Related papers (2020-07-16T17:30:02Z) - Deep Active Learning via Open Set Recognition [0.0]
In many applications, data is easy to acquire but expensive and time-consuming to label prominent examples.
We formulate active learning as an open-set recognition problem.
Unlike current active learning methods, our algorithm can learn tasks without the need for task labels.
arXiv Detail & Related papers (2020-07-04T22:09:17Z) - Progressive Identification of True Labels for Partial-Label Learning [112.94467491335611]
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is equipped with a set of candidate labels among which only one is the true label.
Most existing methods elaborately designed as constrained optimizations that must be solved in specific manners, making their computational complexity a bottleneck for scaling up to big data.
This paper proposes a novel framework of classifier with flexibility on the model and optimization algorithm.
arXiv Detail & Related papers (2020-02-19T08:35:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.