ConBatch-BAL: Batch Bayesian Active Learning under Budget Constraints
- URL: http://arxiv.org/abs/2507.04929v1
- Date: Mon, 07 Jul 2025 12:25:12 GMT
- Title: ConBatch-BAL: Batch Bayesian Active Learning under Budget Constraints
- Authors: Pablo G. Morato, Charalampos P. Andriotis, Seyran Khademi
- Abstract summary: This work introduces two Bayesian active learning strategies for batch acquisition under constraints (ConBatch-BAL). The strategies are benchmarked against a random acquisition baseline on two real-world datasets. The results show that the developed ConBatch-BAL strategies can reduce active learning iterations and data acquisition costs in real-world settings.
- Score: 1.4678959818041628
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Varying annotation costs among data points and budget constraints can hinder the adoption of active learning strategies in real-world applications. This work introduces two Bayesian active learning strategies for batch acquisition under constraints (ConBatch-BAL), one based on dynamic thresholding and one following greedy acquisition. Both select samples using uncertainty metrics computed via Bayesian neural networks. The dynamic thresholding strategy redistributes the budget across the batch, while the greedy one selects the top-ranked sample at each step, limited by the remaining budget. Focusing on scenarios with costly data annotation and geospatial constraints, we also release two new real-world datasets containing geolocated aerial images of buildings, annotated with energy efficiency or typology classes. The ConBatch-BAL strategies are benchmarked against a random acquisition baseline on these datasets under various budget and cost scenarios. The results show that the developed ConBatch-BAL strategies can reduce active learning iterations and data acquisition costs in real-world settings, and even outperform the unconstrained baseline solutions.
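The greedy ConBatch-BAL variant described in the abstract (select the top-ranked sample at each step, limited by the remaining budget) can be illustrated with a minimal sketch. The sketch below assumes precomputed per-sample uncertainty scores (e.g. predictive entropy from a Bayesian neural network) and per-sample annotation costs; the function name, arguments, and toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def greedy_conbatch_bal(uncertainties, costs, batch_budget, batch_size):
    """Hedged sketch of a greedy budget-constrained batch acquisition step.

    uncertainties : per-sample acquisition scores (e.g. predictive entropy
                    from a Bayesian neural network), higher = more informative.
    costs         : per-sample annotation costs.
    batch_budget  : total budget available for this acquisition round.
    batch_size    : maximum number of samples to acquire in the round.
    """
    order = np.argsort(-np.asarray(uncertainties))  # most uncertain first
    remaining = float(batch_budget)
    selected = []
    for idx in order:
        if len(selected) == batch_size:
            break
        if costs[idx] <= remaining:  # only acquire what the remaining budget allows
            selected.append(int(idx))
            remaining -= costs[idx]
    return selected, remaining


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy pool: 20 unlabeled samples with random entropy scores and costs.
    entropy = rng.random(20)
    cost = rng.uniform(1.0, 5.0, size=20)
    batch, leftover = greedy_conbatch_bal(entropy, cost, batch_budget=12.0, batch_size=5)
    print("acquired indices:", batch, "unused budget:", round(leftover, 2))
```

The dynamic thresholding variant, which redistributes the budget across the batch, would require a different selection rule and is not sketched here.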
Related papers
- Cohort-Based Active Modality Acquisition [0.27271717351505986]
We introduce cohort-based Active Modality Acquisition (CAMA) to formalize the challenge of selecting which samples should receive additional modalities. We derive acquisition strategies that leverage a combination of generative imputation and discriminative modeling. Experiments on common multimodal datasets demonstrate that our proposed imputation-based strategies can more effectively guide the acquisition of new samples.
arXiv Detail & Related papers (2025-05-22T15:32:50Z) - Cost-Aware Query Policies in Active Learning for Efficient Autonomous Robotic Exploration [0.0]
This paper analyzes an AL algorithm for Gaussian Process regression while incorporating action cost.
The traditional uncertainty metric with a distance constraint best minimizes root-mean-square error over trajectory distance.
arXiv Detail & Related papers (2024-10-31T18:35:03Z) - Provably Efficient Exploration in Inverse Constrained Reinforcement Learning [12.178081346315523]
Inverse Constrained Reinforcement Learning is a common solver for recovering feasible constraints in complex environments. We propose a strategic exploration framework for sampling with guaranteed efficiency to bridge this gap. We introduce two exploratory algorithms to achieve efficient constraint inference.
arXiv Detail & Related papers (2024-09-24T10:48:13Z) - Minimax and Communication-Efficient Distributed Best Subset Selection with Oracle Property [0.358439716487063]
The explosion of large-scale data has outstripped the processing capabilities of single-machine systems.
Traditional approaches to distributed inference often struggle with achieving true sparsity in high-dimensional datasets.
We propose a novel, two-stage, distributed best subset selection algorithm to address these issues.
arXiv Detail & Related papers (2024-08-30T13:22:08Z) - Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [94.2860766709971]
We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a wireless network with statistically-identical agents. Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies.
arXiv Detail & Related papers (2024-04-04T06:24:11Z) - Scalable Online Exploration via Coverability [45.66375686120087]
Exploration is a major challenge in reinforcement learning, especially for high-dimensional domains that require function approximation.
We introduce a new objective, $L_1$-Coverage, which generalizes previous exploration schemes and supports three fundamental desiderata.
$L_1$-Coverage enables the first computationally efficient model-based and model-free algorithms for online (reward-free or reward-driven) reinforcement learning in MDPs with low coverability.
arXiv Detail & Related papers (2024-03-11T10:14:06Z) - BAL: Balancing Diversity and Novelty for Active Learning [53.289700543331925]
We introduce a novel framework, Balancing Active Learning (BAL), which constructs adaptive sub-pools to balance diverse and uncertain data.
Our approach outperforms all established active learning methods on widely recognized benchmarks by 1.20%.
arXiv Detail & Related papers (2023-12-26T08:14:46Z) - Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs a local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
A second strategy leverages a novel Re-Ranking technique, which has a lower time upper bound complexity and reduces the memory complexity from O(n^2) to O(kn) with k ≪ n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z) - Exploring Active 3D Object Detection from a Generalization Perspective [58.597942380989245]
Uncertainty-based active learning policies fail to balance the trade-off between point cloud informativeness and box-level annotation costs.
We propose CRB, which hierarchically filters out the point clouds of redundant 3D bounding box labels.
Experiments show that the proposed approach outperforms existing active learning strategies.
arXiv Detail & Related papers (2023-01-23T02:43:03Z) - Quantization for decentralized learning under subspace constraints [61.59416703323886]
We consider decentralized optimization problems where agents have individual cost functions to minimize subject to subspace constraints.
We propose and study an adaptive decentralized strategy where the agents employ differential randomized quantizers to compress their estimates.
The analysis shows that, under some general conditions on the quantization noise, the strategy is stable both in terms of mean-square error and average bit rate.
arXiv Detail & Related papers (2022-09-16T09:38:38Z) - Mining Drifting Data Streams on a Budget: Combining Active Learning with Self-Labeling [6.436899373275926]
We propose a novel framework for mining drifting data streams on a budget, by combining information coming from active learning and self-labeling.
We introduce several strategies that can take advantage of both intelligent instance selection and semi-supervised procedures, while taking into account the potential presence of concept drift.
arXiv Detail & Related papers (2021-12-21T07:19:35Z) - Semi-supervised Batch Active Learning via Bilevel Optimization [89.37476066973336]
We formulate our approach as a data summarization problem via bilevel optimization.
We show that our method is highly effective in keyword detection tasks in the regime where only a few labeled samples are available.
arXiv Detail & Related papers (2020-10-19T16:53:24Z)