Related papers: Achieving Minimax Rates in Pool-Based Batch Active Learning

Achieving Minimax Rates in Pool-Based Batch Active Learning

URL: http://arxiv.org/abs/2202.05448v1
Date: Fri, 11 Feb 2022 04:55:45 GMT
Title: Achieving Minimax Rates in Pool-Based Batch Active Learning
Authors: Claudio Gentile, Zhilei Wang, Tong Zhang
Abstract summary: We consider a batch active learning scenario where the learner adaptively issues batches of points to a labeling oracle. In this paper we propose a solution which requires a careful trade off between the informativeness of the queried points and their diversity.
Score: 26.12124106759262
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider a batch active learning scenario where the learner adaptively issues batches of points to a labeling oracle. Sampling labels in batches is highly desirable in practice due to the smaller number of interactive rounds with the labeling oracle (often human beings). However, batch active learning typically pays the price of a reduced adaptivity, leading to suboptimal results. In this paper we propose a solution which requires a careful trade off between the informativeness of the queried points and their diversity. We theoretically investigate batch active learning in the practically relevant scenario where the unlabeled pool of data is available beforehand (pool-based active learning). We analyze a novel stage-wise greedy algorithm and show that, as a function of the label complexity, the excess risk of this algorithm operating in the realizable setting for which we prove matches the known minimax rates in standard statistical learning settings. Our results also exhibit a mild dependence on the batch size. These are the first theoretical results that employ careful trade offs between informativeness and diversity to rigorously quantify the statistical performance of batch active learning in the pool-based scenario.

Related papers

Myopic Bayesian Decision Theory for Batch Active Learning with Partial Batch Label Sampling [1.3428169333745101]
We derive BDT for (Bayesian) active learning in the myopic framework.<n>We show that BAIT (active learning based on V-optimal experimental design) can be derived from BDT and approximations.
arXiv Detail & Related papers (2025-10-10T21:28:42Z)
Class Balance Matters to Active Class-Incremental Learning [61.11786214164405]
We aim to start from a pool of large-scale unlabeled data and then annotate the most informative samples for incremental learning. We propose Class-Balanced Selection (CBS) strategy to achieve both class balance and informativeness in chosen samples. Our CBS can be plugged and played into those CIL methods which are based on pretrained models with prompts tunning technique.
arXiv Detail & Related papers (2024-12-09T16:37:27Z)
Probably Approximately Precision and Recall Learning [62.912015491907994]
Precision and Recall are foundational metrics in machine learning. One-sided feedback--where only positive examples are observed during training--is inherent in many practical problems. We introduce a PAC learning framework where each hypothesis is represented by a graph, with edges indicating positive interactions.
arXiv Detail & Related papers (2024-11-20T04:21:07Z)
BatchGFN: Generative Flow Networks for Batch Active Learning [80.73649229919454]
BatchGFN is a novel approach for pool-based active learning that uses generative flow networks to sample sets of data points proportional to a batch reward. We show our approach enables principled sampling near-optimal utility batches at inference time with a single forward pass per point in the batch in toy regression problems.
arXiv Detail & Related papers (2023-06-26T20:41:36Z)
Deep Active Learning with Contrastive Learning Under Realistic Data Pool Assumptions [2.578242050187029]
Active learning aims to identify the most informative data from an unlabeled data pool that enables a model to reach the desired accuracy rapidly. Most existing active learning methods have been evaluated in an ideal setting where only samples relevant to the target task exist in an unlabeled data pool. We introduce new active learning benchmarks that include ambiguous, task-irrelevant out-of-distribution as well as in-distribution samples.
arXiv Detail & Related papers (2023-03-25T10:46:10Z)
Batch Active Learning at Scale [39.26441165274027]
Batch active learning, which adaptively issues batched queries to a labeling oracle, is a common approach for addressing this problem. In this work, we analyze an efficient active learning algorithm, which focuses on the large batch setting. We show that our sampling method, which combines notions of uncertainty and diversity, easily scales to batch sizes (100K-1M) several orders of magnitude larger than used in previous studies.
arXiv Detail & Related papers (2021-07-29T18:14:05Z)
Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
dataset bias is one of the prevailing causes of unfairness in machine learning. We study whether models trained with uncertainty-based ALs are fairer in their decisions with respect to a protected class. We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
Semi-supervised Batch Active Learning via Bilevel Optimization [89.37476066973336]
We formulate our approach as a data summarization problem via bilevel optimization. We show that our method is highly effective in keyword detection tasks in the regime when only few labeled samples are available.
arXiv Detail & Related papers (2020-10-19T16:53:24Z)
ALdataset: a benchmark for pool-based active learning [1.9308522511657449]
Active learning (AL) is a subfield of machine learning (ML) in which a learning algorithm could achieve good accuracy with less training samples by interactively querying a user/oracle to label new data points. Pool-based AL is well-motivated in many ML tasks, where unlabeled data is abundant, but their labels are hard to obtain. We present experiment results for various active learning strategies, both recently proposed and classic highly-cited methods, and draw insights from the results.
arXiv Detail & Related papers (2020-10-16T04:37:29Z)
Active Learning under Label Shift [80.65643075952639]
We introduce a "medial distribution" to incorporate a tradeoff between importance and class-balanced sampling. We prove sample complexity and generalization guarantees for Mediated Active Learning under Label Shift (MALLS) We empirically demonstrate MALLS scales to high-dimensional datasets and can reduce the sample complexity of active learning by 60% in deep active learning tasks.
arXiv Detail & Related papers (2020-07-16T17:30:02Z)
Deep Active Learning via Open Set Recognition [0.0]
In many applications, data is easy to acquire but expensive and time-consuming to label prominent examples. We formulate active learning as an open-set recognition problem. Unlike current active learning methods, our algorithm can learn tasks without the need for task labels.
arXiv Detail & Related papers (2020-07-04T22:09:17Z)
Progressive Identification of True Labels for Partial-Label Learning [112.94467491335611]
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is equipped with a set of candidate labels among which only one is the true label. Most existing methods elaborately designed as constrained optimizations that must be solved in specific manners, making their computational complexity a bottleneck for scaling up to big data. This paper proposes a novel framework of classifier with flexibility on the model and optimization algorithm.
arXiv Detail & Related papers (2020-02-19T08:35:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.