Scalable Batch Acquisition for Deep Bayesian Active Learning
- URL: http://arxiv.org/abs/2301.05490v1
- Date: Fri, 13 Jan 2023 11:45:17 GMT
- Title: Scalable Batch Acquisition for Deep Bayesian Active Learning
- Authors: Aleksandr Rubashevskii, Daria Kotova and Maxim Panov
- Abstract summary: In deep active learning, it is important to select multiple examples for labeling at each step.
Existing solutions to this problem, such as BatchBALD, have significant limitations when selecting a large number of examples.
We present the Large BatchBALD algorithm, which aims to achieve comparable quality while being more computationally efficient.
- Score: 70.68403899432198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In deep active learning, it is especially important to choose multiple
examples for labeling at each step in order to work efficiently on large
datasets. At the same time, existing solutions to this problem in the Bayesian
setup, such as BatchBALD, have significant limitations in selecting a large
number of examples, owing to the exponential complexity of computing mutual
information for joint random variables. We therefore present the Large
BatchBALD algorithm, a well-grounded approximation to the BatchBALD method
that aims to achieve comparable quality while being more computationally
efficient. We provide a complexity analysis of the algorithm, showing a
reduction in computation time, especially for large batches. Furthermore, we
present an extensive set of experimental results on image and text data, both
on toy datasets and on larger ones such as CIFAR-100.
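For context on the quantities involved: BALD scores a single example by the mutual information between its predicted label and the model parameters, estimated from stochastic forward passes (e.g. MC dropout), while BatchBALD scores a whole batch by the joint mutual information, which is what becomes exponentially expensive. Below is a minimal sketch of the standard single-point BALD estimator with naive top-b selection; it illustrates the baseline being approximated, not the Large BatchBALD algorithm itself.

```python
import numpy as np

def bald_scores(probs: np.ndarray) -> np.ndarray:
    """Single-point BALD scores from stochastic forward passes.

    probs: shape (S, N, C): S MC-dropout samples, N pool points, C classes.
    BALD(x) = I[y; theta | x] = H[E_theta p(y|x,theta)] - E_theta H[p(y|x,theta)].
    """
    eps = 1e-12
    predictive = probs.mean(axis=0)                                    # (N, C)
    h_predictive = -(predictive * np.log(predictive + eps)).sum(-1)    # H[E p]
    h_expected = -(probs * np.log(probs + eps)).sum(-1).mean(axis=0)   # E[H p]
    return h_predictive - h_expected                                   # (N,)

# Toy pool: 20 stochastic passes over 1000 points with 10 classes.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=(20, 1000))  # (S, N, C)

# Naive batch = top-b single-point scores; this ignores redundancy between
# selected points. BatchBALD fixes that by scoring the joint mutual
# information I[y_1..y_b; theta], whose exact computation is exponential
# in b: the bottleneck that Large BatchBALD approximates.
batch = np.argsort(bald_scores(probs))[-10:]
print(batch)
```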
Related papers
- Scalable Private Partition Selection via Adaptive Weighting [66.09199304818928]
In a private set union, users hold subsets of items from an unbounded universe.
The goal is to output as many items as possible from the union of the users' sets while maintaining user-level differential privacy.
We propose an algorithm for this problem, MaxAdaptiveDegree (MAD), which adaptively reroutes weight from items whose weight is far above the threshold needed for privacy to items with smaller weight.
arXiv Detail & Related papers (2025-02-13T01:27:11Z)
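As a rough illustration of the setting, here is the basic weight-and-threshold baseline that adaptive weighting improves on: each user spreads a unit weight budget over their items, per-item noise is added, and items clearing a threshold are released. The function name, noise calibration, and parameters below are illustrative assumptions, and MAD's adaptive rerouting step is deliberately not shown.

```python
import numpy as np
from collections import defaultdict

def private_partition_selection(user_sets, eps=1.0, delta=1e-6, seed=0):
    """Weight-and-threshold baseline for private set union.

    Each user spreads a unit weight budget uniformly over their items
    (bounding their total contribution), Laplace noise is added per item,
    and items whose noisy weight clears a threshold are released. MAD
    improves on this by rerouting excess weight from items far above the
    threshold toward items with smaller weight.
    """
    rng = np.random.default_rng(seed)
    weights = defaultdict(float)
    for items in user_sets:
        for item in items:
            weights[item] += 1.0 / len(items)  # each user contributes <= 1 total
    scale = 1.0 / eps                          # Laplace scale for sensitivity 1
    # Threshold roughly calibrated so near-zero-weight items are released
    # with probability below delta (illustrative, not a tight analysis).
    threshold = 1.0 + scale * np.log(1.0 / (2.0 * delta))
    return [item for item, w in weights.items()
            if w + rng.laplace(scale=scale) > threshold]

users = [{"apple", "pear"}, {"apple"}, {"apple", "fig", "pear"}]
print(private_partition_selection(users))
```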
- Retrieval with Learned Similarities [2.729516456192901]
State-of-the-art retrieval algorithms have migrated to learned similarities.
We show that Mixture-of-Logits (MoL) can be realized in practice to achieve superior performance across diverse retrieval scenarios.
arXiv Detail & Related papers (2024-07-22T08:19:34Z)
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data that yield models similar to those trained on the full data.
We develop a novel factor-3 approximation algorithm to compute subsets based on a weighted sum of k-center and uncertainty-sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
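The flavor of combining coverage with uncertainty can be conveyed by a greedy farthest-point heuristic: repeatedly pick the pool point maximizing a weighted sum of its distance to the already-selected set and its uncertainty score. This is a sketch of the general idea only; the weighting lam and the Euclidean distance are my assumptions, and it is not the paper's factor-3 approximation algorithm.

```python
import numpy as np

def greedy_kcenter_uncertainty(features, uncertainty, b, lam=1.0):
    """Greedily select b points scoring distance-to-selected + lam * uncertainty.

    features: (N, D) embeddings; uncertainty: (N,) per-point scores.
    A farthest-point-style heuristic blending coverage (k-center) with
    uncertainty sampling; illustrative, not the paper's 3-approximation.
    """
    selected = [int(np.argmax(uncertainty))]  # seed with most uncertain point
    dist = np.linalg.norm(features - features[selected[0]], axis=1)
    for _ in range(b - 1):
        gain = dist + lam * uncertainty
        gain[selected] = -np.inf              # never re-pick a point
        nxt = int(np.argmax(gain))
        selected.append(nxt)
        # Maintain each point's distance to its nearest selected point.
        dist = np.minimum(dist, np.linalg.norm(features - features[nxt], axis=1))
    return selected

rng = np.random.default_rng(1)
X, u = rng.normal(size=(500, 16)), rng.random(500)
print(greedy_kcenter_uncertainty(X, u, b=10))
```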
- BatchGFN: Generative Flow Networks for Batch Active Learning [80.73649229919454]
BatchGFN is a novel approach for pool-based active learning that uses generative flow networks to sample sets of data points proportional to a batch reward.
In toy regression problems, we show that our approach enables principled sampling of near-optimal-utility batches at inference time with a single forward pass per point in the batch.
arXiv Detail & Related papers (2023-06-26T20:41:36Z)
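The target distribution that BatchGFN amortizes can be stated directly: sample a batch with probability proportional to its reward. For a tiny pool this is computable by brute-force enumeration, as sketched below with an entropy-sum reward chosen purely for illustration; the GFlowNet's role is to produce such samples without enumerating all C(N, b) candidate batches.

```python
import numpy as np
from itertools import combinations

def sample_batch_proportional_to_reward(probs, pool, b, seed=0):
    """Exact sampling of a batch with probability proportional to its reward.

    reward(B) here is a stand-in batch utility (sum of predictive entropies).
    BatchGFN trains a GFlowNet to amortize this sampling, avoiding the
    enumeration over all C(N, b) candidate batches performed below.
    """
    rng = np.random.default_rng(seed)
    entropy = -(probs * np.log(probs + 1e-12)).sum(-1)   # (N,) toy utility
    batches = list(combinations(pool, b))
    rewards = np.array([entropy[list(B)].sum() for B in batches])
    p = rewards / rewards.sum()                          # p(B) proportional to reward(B)
    return batches[rng.choice(len(batches), p=p)]

probs = np.random.default_rng(4).dirichlet(np.ones(3), size=12)  # 12-point pool
print(sample_batch_proportional_to_reward(probs, pool=range(12), b=3))
```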
- A Fast Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [55.2480439325792]
We study the real-valued combinatorial pure exploration problem in the multi-armed bandit (R-CPE-MAB).
We introduce the combinatorial gap-based exploration (CombGapE) algorithm, whose sample complexity upper bound matches the lower bound up to a problem-dependent constant factor.
We numerically show that CombGapE significantly outperforms existing methods on both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-06-15T15:37:31Z)
- Speeding Up BatchBALD: A k-BALD Family of Approximations for Active Learning [1.52292571922932]
BatchBALD is a batch acquisition technique for training machine learning models with limited labeled data.
In this paper, we propose a new approximation, k-BALD, which uses k-wise mutual information terms to approximate BatchBALD.
Results on the MNIST dataset show that k-BALD is significantly faster than BatchBALD while maintaining similar performance.
arXiv Detail & Related papers (2023-01-23T15:38:58Z)
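One way to read the k = 2 member of this family: keep each point's single-point BALD score but subtract the pairwise mutual information between the batch's outputs, a truncated inclusion-exclusion stand-in for BatchBALD's joint term. The sketch below follows that reading under the usual MC-sample estimator; it is an illustration of the idea, not the paper's reference implementation.

```python
import numpy as np

EPS = 1e-12

def entropy(p, axis=-1):
    return -(p * np.log(p + EPS)).sum(axis=axis)

def bald(probs):
    """(N,) single-point BALD scores from (S, N, C) MC samples."""
    return entropy(probs.mean(0)) - entropy(probs).mean(0)

def pairwise_mi(probs, i, j):
    """I[y_i; y_j] under the marginal predictive joint E_theta[p_i p_j]."""
    joint = np.einsum('sc,sd->cd', probs[:, i], probs[:, j]) / probs.shape[0]
    indep = np.outer(joint.sum(1), joint.sum(0))
    return float((joint * (np.log(joint + EPS) - np.log(indep + EPS))).sum())

def two_bald(probs, batch):
    """2-BALD: sum of BALD scores minus pairwise output MI within the batch.

    A truncated inclusion-exclusion approximation to BatchBALD's joint
    mutual information: quadratic in batch size rather than exponential.
    """
    score = bald(probs)[list(batch)].sum()
    for a in range(len(batch)):
        for b in range(a + 1, len(batch)):
            score -= pairwise_mi(probs, batch[a], batch[b])
    return score

rng = np.random.default_rng(2)
probs = rng.dirichlet(np.ones(5), size=(50, 100))  # (S, N, C)
print(two_bald(probs, batch=[3, 14, 15, 92]))
```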
- Batch Active Learning at Scale [39.26441165274027]
Batch active learning, which adaptively issues batched queries to a labeling oracle, is a common approach for addressing the high cost of labeling.
In this work, we analyze an efficient active learning algorithm, which focuses on the large batch setting.
We show that our sampling method, which combines notions of uncertainty and diversity, easily scales to batch sizes (100K-1M) several orders of magnitude larger than those used in previous studies.
arXiv Detail & Related papers (2021-07-29T18:14:05Z)
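A common recipe for uncertainty plus diversity at this scale is to cluster the pool embeddings once, then take the most uncertain points round-robin across clusters. The sketch below uses k-means and margin scores as stand-ins; the paper's Cluster-Margin method uses hierarchical agglomerative clustering, so treat this as the general pattern rather than the exact algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_margin_batch(embeddings, probs, b, n_clusters=50, seed=0):
    """Margin-based uncertainty with cluster round-robin for diversity.

    embeddings: (N, D) pool features; probs: (N, C) predicted class probs.
    Cluster once, then sweep clusters round-robin, taking each cluster's
    lowest-margin (most uncertain) unchosen point until b points are picked.
    """
    top2 = np.sort(probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]              # small margin = uncertain
    labels = KMeans(n_clusters=n_clusters, random_state=seed,
                    n_init=10).fit_predict(embeddings)
    # Per-cluster queues of indices, most uncertain first.
    queues = [sorted(np.flatnonzero(labels == c), key=lambda i: margin[i])
              for c in range(n_clusters)]
    batch = []
    while len(batch) < b and any(queues):
        for q in queues:
            if q and len(batch) < b:
                batch.append(q.pop(0))
    return batch

rng = np.random.default_rng(3)
emb = rng.normal(size=(2000, 32))
probs = rng.dirichlet(np.ones(10), size=2000)
print(len(cluster_margin_batch(emb, probs, b=100)))
```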
- Sample-Efficient Reinforcement Learning of Undercomplete POMDPs [91.40308354344505]
This work shows that known hardness barriers do not preclude efficient reinforcement learning for rich and interesting subclasses of Partially Observable Markov Decision Processes (POMDPs).
We present a sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs, where the number of observations is larger than the number of latent states and where exploration is essential for learning, thus distinguishing our results from prior works.
arXiv Detail & Related papers (2020-06-22T17:58:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.