Scalable Batch Acquisition for Deep Bayesian Active Learning
- URL: http://arxiv.org/abs/2301.05490v1
- Date: Fri, 13 Jan 2023 11:45:17 GMT
- Title: Scalable Batch Acquisition for Deep Bayesian Active Learning
- Authors: Aleksandr Rubashevskii, Daria Kotova and Maxim Panov
- Abstract summary: In deep active learning, it is important to choose multiple examples to markup at each step.
Existing solutions to this problem, such as BatchBALD, have significant limitations in selecting a large number of examples.
We present the Large BatchBALD algorithm, which aims to achieve comparable quality while being more computationally efficient.
- Score: 70.68403899432198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In deep active learning, choosing multiple examples to mark up at
each step is essential for working efficiently, especially on large datasets.
At the same time, existing solutions to this problem in the Bayesian setup,
such as BatchBALD, have significant limitations in selecting a large number of
examples, associated with the exponential complexity of computing mutual
information for joint random variables. We therefore present the Large
BatchBALD algorithm, which gives a well-grounded approximation to the
BatchBALD method and aims to achieve comparable quality while being more
computationally efficient. We provide a complexity analysis of the algorithm,
showing a reduction in computation time, especially for large batches.
Furthermore, we present an extensive set of experimental results on image and
text data, both on toy datasets and larger ones such as CIFAR-100.
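To make the quantities above concrete, here is a minimal sketch of the per-point BALD score that BatchBALD (and hence Large BatchBALD) generalizes. It assumes class probabilities from K stochastic forward passes (e.g., MC dropout) are already available; all names are illustrative, not taken from the paper's code.

```python
import numpy as np

def bald_scores(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """probs: (N, K, C) predictive probabilities for N pool points,
    K posterior samples, and C classes. Returns per-point BALD scores
    I(y; theta) = H[mean_k p] - mean_k H[p]."""
    mean_p = probs.mean(axis=1)                                   # (N, C)
    h_mean = -(mean_p * np.log(mean_p + eps)).sum(axis=-1)        # H of mean
    mean_h = -(probs * np.log(probs + eps)).sum(axis=-1).mean(axis=1)
    return h_mean - mean_h

# BatchBALD replaces this per-point score with the joint mutual information
# I(y_1, ..., y_b; theta), whose exact evaluation sums over C**b joint label
# configurations -- the exponential cost the abstract refers to.
```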
Related papers
- Fast Fishing: Approximating BAIT for Efficient and Scalable Deep Active Image Classification [1.8567173419246403]
Deep active learning (AL) seeks to minimize the annotation costs for training deep neural networks.
BAIT, a recently proposed AL strategy based on the Fisher Information, has demonstrated impressive performance across various datasets.
This paper introduces two methods to enhance BAIT's computational efficiency and scalability.
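As a rough illustration of Fisher-Information-based scoring (a sketch in the spirit of BAIT, not the paper's actual algorithm): for a linear softmax head, the per-sample Fisher information factorizes as F(x) = (diag(p) - p p^T) ⊗ x x^T, so its trace gives a cheap per-point utility proxy.

```python
import numpy as np

def fisher_trace_scores(features: np.ndarray, probs: np.ndarray) -> np.ndarray:
    """features: (N, d) penultimate-layer embeddings; probs: (N, C) softmax
    outputs. Returns tr(F(x)) = sum_c p_c (1 - p_c) * ||x||^2."""
    label_var = (probs * (1.0 - probs)).sum(axis=1)   # (N,) label uncertainty
    return label_var * (features ** 2).sum(axis=1)    # (N,) Fisher trace

# BAIT itself optimizes a batch-level objective over these Fisher matrices;
# the speed-ups proposed in this paper target that more expensive step.
```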
arXiv Detail & Related papers (2024-04-13T12:09:37Z) - A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
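A hedged sketch of combining k-center coverage with uncertainty follows; the paper's factor-3 approximation algorithm is more involved, and this is just greedy farthest-point selection with an uncertainty bonus (the trade-off weight `alpha` is illustrative).

```python
import numpy as np

def greedy_kcenter_uncertainty(X: np.ndarray, u: np.ndarray,
                               k: int, alpha: float = 1.0) -> list[int]:
    """X: (N, d) embeddings; u: (N,) uncertainty scores; returns k indices."""
    chosen = [int(np.argmax(u))]                      # seed with most uncertain
    dist = np.linalg.norm(X - X[chosen[0]], axis=1)   # distance to chosen set
    for _ in range(k - 1):
        gain = dist + alpha * u                       # coverage + uncertainty
        gain[chosen] = -np.inf                        # never re-pick a point
        nxt = int(np.argmax(gain))
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return chosen
```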
arXiv Detail & Related papers (2023-12-17T04:41:07Z) - BatchGFN: Generative Flow Networks for Batch Active Learning [80.73649229919454]
BatchGFN is a novel approach for pool-based active learning that uses generative flow networks to sample sets of data points proportional to a batch reward.
We show that our approach enables principled sampling of near-optimal-utility batches at inference time, with a single forward pass per point in the batch, in toy regression problems.
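The target distribution that BatchGFN amortizes can be illustrated in a toy, enumerable setting: draw a batch B with probability proportional to a batch reward R(B). BatchGFN's point is to learn a sampler that avoids this enumeration; the reward below is a stand-in.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
pool, b = np.arange(8), 3            # tiny pool, batch size 3

def batch_reward(batch: tuple[int, ...]) -> float:
    # stand-in for the batch-utility reward (e.g., exponentiated joint utility)
    return float(np.exp(0.1 * sum(batch)))

batches = list(itertools.combinations(pool, b))
weights = np.array([batch_reward(B) for B in batches])
sampled = batches[rng.choice(len(batches), p=weights / weights.sum())]
print(sampled)                       # a batch drawn with p(B) proportional to R(B)
```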
arXiv Detail & Related papers (2023-06-26T20:41:36Z) - Speeding Up BatchBALD: A k-BALD Family of Approximations for Active
Learning [1.52292571922932]
BatchBALD is a batch acquisition technique for training machine learning models with limited labeled data.
In this paper, we propose a new approximation, k-BALD, which uses k-wise mutual information terms to approximate BatchBALD.
Results on the MNIST dataset show that k-BALD is significantly faster than BatchBALD while maintaining similar performance.
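To illustrate the k-wise terms, here is a hedged sketch of the building block for k = 2: the exact joint mutual information I(y_i, y_j; theta) for a pair of pool points, computed from K posterior samples. k-BALD combines such k-wise terms to approximate the full joint that BatchBALD would otherwise evaluate over C^b label configurations.

```python
import numpy as np

def pairwise_joint_mi(p_i: np.ndarray, p_j: np.ndarray,
                      eps: float = 1e-12) -> float:
    """p_i, p_j: (K, C) class probabilities for two pool points across
    K posterior samples. Returns I(y_i, y_j; theta)."""
    joint = np.einsum('kc,kd->kcd', p_i, p_j)   # y_i, y_j independent given theta
    mean_joint = joint.mean(axis=0)             # marginal joint predictive
    h_mean = -(mean_joint * np.log(mean_joint + eps)).sum()
    h_cond = -(joint * np.log(joint + eps)).sum(axis=(1, 2)).mean()
    return float(h_mean - h_cond)
```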
arXiv Detail & Related papers (2023-01-23T15:38:58Z) - Multidimensional Assignment Problem for multipartite entity resolution [69.48568967931608]
Multipartite entity resolution aims at integrating records from multiple datasets into one entity.
We apply two procedures, a Greedy algorithm and a large scale neighborhood search, to solve the assignment problem.
We find evidence that design-based multi-start can be more efficient as the size of the databases grows large.
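As an illustration of the greedy baseline (a sketch on a synthetic 3-partite instance, not the paper's implementation): repeatedly take the lowest-cost record triple whose records are still unmatched.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
costs = rng.random((n, n, n))        # dissimilarity of record triples (a, b, c)

free = [set(range(n)) for _ in range(3)]
matching = []
for idx in np.argsort(costs, axis=None):      # triples by increasing cost
    a, b, c = (int(v) for v in np.unravel_index(idx, costs.shape))
    if a in free[0] and b in free[1] and c in free[2]:
        matching.append((a, b, c))
        free[0].remove(a); free[1].remove(b); free[2].remove(c)
print(matching)   # greedy solution a neighborhood search could then improve
```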
arXiv Detail & Related papers (2021-12-06T20:34:55Z) - Batch Active Learning at Scale [39.26441165274027]
Batch active learning, which adaptively issues batched queries to a labeling oracle, is a common approach for reducing labeling costs.
In this work, we analyze an efficient active learning algorithm, which focuses on the large batch setting.
We show that our sampling method, which combines notions of uncertainty and diversity, easily scales to batch sizes (100K-1M) several orders of magnitude larger than those used in previous studies.
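A hedged sketch of that uncertainty-plus-diversity recipe follows (the paper's Cluster-Margin method clusters embeddings hierarchically; here cluster labels are assumed given): shortlist the most uncertain points, then round-robin across clusters so the batch stays diverse.

```python
import numpy as np

def cluster_margin_select(margin: np.ndarray, clusters: np.ndarray,
                          batch_size: int, pool_factor: int = 10) -> list[int]:
    """margin: (N,) margin scores (smaller = more uncertain);
    clusters: (N,) cluster ids. Returns batch_size pool indices."""
    cand = np.argsort(margin)[:batch_size * pool_factor]   # uncertain shortlist
    assert len(cand) >= batch_size
    by_cluster: dict[int, list[int]] = {}
    for i in cand:
        by_cluster.setdefault(int(clusters[i]), []).append(int(i))
    groups = sorted(by_cluster.values(), key=len)   # smallest clusters first
    batch: list[int] = []
    while len(batch) < batch_size:
        for g in groups:
            if g and len(batch) < batch_size:
                batch.append(g.pop(0))
    return batch
```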
arXiv Detail & Related papers (2021-07-29T18:14:05Z) - Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions, so that they learn in their task specific domains while staying close to each other.
This facilitates cross-fertilization, in which data collected across different domains help improve the learning performance on every other task.
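A minimal sketch of the coupling idea under simplifying assumptions (per-task linear regression and a quadratic proximity penalty pulling each task's parameters toward their mean; the paper's cross-learning formulation is more general):

```python
import numpy as np

def coupled_multitask_step(thetas: np.ndarray, tasks, lr=0.1, lam=1.0):
    """thetas: (T, d) per-task parameters; tasks: list of (X_t, y_t) pairs.
    One gradient step on sum_t L_t(theta_t) + lam * ||theta_t - theta_bar||^2,
    with theta_bar held fixed within the step (block-coordinate style)."""
    theta_bar = thetas.mean(axis=0)
    new = thetas.copy()
    for t, (X, y) in enumerate(tasks):
        grad_fit = 2 * X.T @ (X @ thetas[t] - y) / len(y)   # task-specific loss
        grad_prox = 2 * lam * (thetas[t] - theta_bar)       # coupling term
        new[t] = thetas[t] - lr * (grad_fit + grad_prox)
    return new
```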
arXiv Detail & Related papers (2020-10-24T21:35:57Z) - Sample-Efficient Reinforcement Learning of Undercomplete POMDPs [91.40308354344505]
This work shows that these hardness barriers do not preclude efficient reinforcement learning for rich and interesting subclasses of Partially Observable Markov Decision Processes (POMDPs).
We present a sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs, where the number of observations is larger than the number of latent states and where exploration is essential for learning, thus distinguishing our results from prior works.
arXiv Detail & Related papers (2020-06-22T17:58:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.