Toward Optimal Probabilistic Active Learning Using a Bayesian Approach
- URL: http://arxiv.org/abs/2006.01732v1
- Date: Tue, 2 Jun 2020 15:59:42 GMT
- Title: Toward Optimal Probabilistic Active Learning Using a Bayesian Approach
- Authors: Daniel Kottke, Marek Herde, Christoph Sandrock, Denis Huseljic, Georg
Krempl, Bernhard Sick
- Abstract summary: Active learning aims at reducing the labeling costs by an efficient and effective allocation of costly labeling resources.
By reformulating existing selection strategies within our proposed model, we can explain which aspects are not covered by the current state of the art.
- Score: 4.380488084997317
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gathering labeled data to train well-performing machine learning models is
one of the critical challenges in many applications. Active learning aims at
reducing the labeling costs by an efficient and effective allocation of costly
labeling resources. In this article, we propose a decision-theoretic selection
strategy that (1) directly optimizes the gain in misclassification error, and
(2) uses a Bayesian approach by introducing a conjugate prior distribution to
determine the class posterior to deal with uncertainties. By reformulating
existing selection strategies within our proposed model, we can explain which
aspects are not covered by the current state of the art and why this leads to the
superior performance of our approach. Extensive experiments on a large variety
of datasets and different kernels validate our claims.
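
As a rough illustration of the kind of strategy described above, the sketch below scores each unlabeled candidate by the expected reduction in misclassification risk over the pool if its label were acquired, with the unknown class posterior at a point modeled by kernel-weighted label counts smoothed by a conjugate Beta prior. The restriction to binary classes, the RBF kernel, and all names and default parameters are illustrative assumptions, not the authors' exact formulation.

```python
# Illustrative sketch only: a decision-theoretic, Bayesian acquisition score
# (conjugate Beta prior over kernel label counts + expected gain in 0/1 risk).
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Pairwise RBF kernel between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def beta_posterior_p1(k_row, y, alpha=(1.0, 1.0)):
    """Posterior mean of P(y=1 | x) from kernel-weighted label counts
    under a conjugate Beta prior (the Bayesian smoothing step)."""
    k1 = k_row @ (y == 1).astype(float)
    k0 = k_row @ (y == 0).astype(float)
    return (k1 + alpha[1]) / (k0 + k1 + alpha[0] + alpha[1])

def pool_risk(X_lab, y_lab, X_pool, gamma=1.0, alpha=(1.0, 1.0)):
    """Estimated 0/1 risk of the plug-in decision rule over the unlabeled pool."""
    K = rbf_kernel(X_pool, X_lab, gamma)
    p1 = np.array([beta_posterior_p1(k, y_lab, alpha) for k in K])
    return float(np.mean(np.minimum(p1, 1.0 - p1)))

def expected_gain(x_cand, X_lab, y_lab, X_pool, gamma=1.0, alpha=(1.0, 1.0)):
    """Expected reduction in pool risk if x_cand were labeled, marginalizing
    over its unknown label with the candidate's Beta class posterior."""
    k_cand = rbf_kernel(x_cand[None, :], X_lab, gamma)[0]
    p1 = beta_posterior_p1(k_cand, y_lab, alpha)
    risk_now = pool_risk(X_lab, y_lab, X_pool, gamma, alpha)
    gain = 0.0
    for y_sim, p_y in ((1, p1), (0, 1.0 - p1)):
        X_new = np.vstack([X_lab, x_cand[None, :]])
        y_new = np.append(y_lab, y_sim)
        gain += p_y * (risk_now - pool_risk(X_new, y_new, X_pool, gamma, alpha))
    return gain

# The next label to acquire would be the candidate with the largest expected gain, e.g.:
# best = max(range(len(X_pool)), key=lambda i: expected_gain(X_pool[i], X_lab, y_lab, X_pool))
```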
Related papers
- Improve Cost Efficiency of Active Learning over Noisy Dataset [1.3846014191157405]
In this paper, we consider binary classification settings where acquiring a positive instance incurs a significantly higher cost than acquiring a negative one.
We propose a shifted normal distribution sampling function that samples from a wider range than typical uncertainty sampling.
Our simulations show that the proposed sampling function limits the selection of both noisy and positive labels, delivering between 20% and 32% better cost efficiency across different test datasets.
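
A rough, hypothetical sketch of how such a shifted, widened sampling function could look: candidates are weighted by a normal density over their predicted positive-class probability, with the center shifted away from 0.5 to discourage likely positives and a width broader than the sharp peak of plain uncertainty sampling. The center and width values below are illustrative placeholders, not values from the paper.

```python
# Illustrative sketch only: shifted-normal acquisition weights over predicted
# positive-class probabilities (mu and sigma are placeholder values).
import numpy as np

def shifted_normal_scores(p_pos, mu=0.35, sigma=0.2):
    """Acquisition score: a normal density over the predicted positive-class
    probability, centered below 0.5 to bias selection away from likely
    positives and wide enough to sample from a broader band than the
    p = 0.5 peak used by plain uncertainty sampling."""
    return np.exp(-0.5 * ((p_pos - mu) / sigma) ** 2)

def select_batch(p_pos, batch_size=10, seed=None):
    """Sample a batch of indices proportionally to the shifted-normal scores."""
    rng = np.random.default_rng(seed)
    w = shifted_normal_scores(np.asarray(p_pos))
    w = w / w.sum()
    return rng.choice(len(w), size=batch_size, replace=False, p=w)
```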
arXiv Detail & Related papers (2024-03-02T23:53:24Z)
- Compute-Efficient Active Learning [0.0]
Active learning aims at reducing labeling costs by selecting the most informative samples from an unlabeled dataset.
The traditional active learning process often demands extensive computational resources, hindering scalability and efficiency.
We present a novel method designed to alleviate the computational burden associated with active learning on massive datasets.
arXiv Detail & Related papers (2024-01-15T12:32:07Z)
- BAL: Balancing Diversity and Novelty for Active Learning [53.289700543331925]
We introduce a novel framework, Balancing Active Learning (BAL), which constructs adaptive sub-pools to balance diverse and uncertain data.
Our approach outperforms all established active learning methods on widely recognized benchmarks by 1.20%.
arXiv Detail & Related papers (2023-12-26T08:14:46Z)
- Learning to Rank for Active Learning via Multi-Task Bilevel Optimization [29.207101107965563]
We propose a novel approach for active learning, which aims to select batches of unlabeled instances through a learned surrogate model for data acquisition.
A key challenge in this approach is developing an acquisition function that generalizes well, as the history of data, which forms part of the utility function's input, grows over time.
arXiv Detail & Related papers (2023-10-25T22:50:09Z)
- Overcoming Overconfidence for Active Learning [1.2776312584227847]
We present two novel methods to address the problem of overconfidence that arises in the active learning scenario.
The first is an augmentation strategy named Cross-Mix-and-Mix (CMaM), which aims to calibrate the model by expanding the limited training distribution.
The second is a selection strategy named Ranked Margin Sampling (RankedMS), which prevents choosing data that leads to overly confident predictions.
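
A hedged sketch of the general idea behind such a selection rule: rank candidates by the margin between their two most probable classes and skip those whose top-class probability is already very high, so that data leading to overly confident predictions is not chosen. The threshold and scoring details are illustrative assumptions, not the paper's exact RankedMS procedure.

```python
# Illustrative sketch only: margin-based ranking with an overconfidence screen.
import numpy as np

def ranked_margin_select(probs, batch_size=10, conf_cap=0.95):
    """probs: (n_samples, n_classes) softmax outputs.
    Rank candidates by the margin between the two most probable classes
    (smaller margin = more ambiguous) and skip candidates whose top
    probability already exceeds conf_cap, i.e. overly confident predictions."""
    sorted_probs = np.sort(probs, axis=1)
    margin = sorted_probs[:, -1] - sorted_probs[:, -2]  # top-1 minus top-2
    eligible = sorted_probs[:, -1] < conf_cap           # screen out overconfident points
    order = np.argsort(margin)                          # most ambiguous first
    picked = [i for i in order if eligible[i]][:batch_size]
    return np.array(picked)
```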
arXiv Detail & Related papers (2023-08-21T09:04:54Z)
- In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv Detail & Related papers (2023-02-06T16:55:37Z)
- MCDAL: Maximum Classifier Discrepancy for Active Learning [74.73133545019877]
Recent state-of-the-art active learning methods have mostly leveraged Generative Adversarial Networks (GAN) for sample acquisition.
We propose a novel active learning framework called Maximum Classifier Discrepancy for Active Learning (MCDAL).
In particular, we utilize two auxiliary classification layers that learn tighter decision boundaries by maximizing the discrepancies among them.
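
A simplified sketch of a discrepancy-based acquisition score in this spirit: unlabeled samples are ranked by the disagreement between the probability outputs of two auxiliary classifier heads, on the assumption that high disagreement marks points near the decision boundary. This illustrates the general idea only, not the paper's exact acquisition rule.

```python
# Illustrative sketch only: disagreement between two auxiliary heads as an acquisition score.
import numpy as np

def discrepancy_scores(probs_aux1, probs_aux2):
    """Score each unlabeled sample by the L1 discrepancy between the
    class-probability outputs of two auxiliary classifier heads; larger
    disagreement suggests the sample lies near the decision boundary."""
    return np.abs(probs_aux1 - probs_aux2).sum(axis=1)

def select(probs_aux1, probs_aux2, batch_size=10):
    """Pick the batch_size samples with the largest head disagreement."""
    return np.argsort(-discrepancy_scores(probs_aux1, probs_aux2))[:batch_size]
```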
arXiv Detail & Related papers (2021-07-23T06:57:08Z)
- Just Label What You Need: Fine-Grained Active Selection for Perception and Prediction through Partially Labeled Scenes [78.23907801786827]
We introduce generalizations that make our approach cost-aware and allow for fine-grained selection of examples through partially labeled scenes.
Our experiments on a real-world, large-scale self-driving dataset suggest that fine-grained selection can improve the performance across perception, prediction, and downstream planning tasks.
arXiv Detail & Related papers (2021-04-08T17:57:41Z)
- Cost-Based Budget Active Learning for Deep Learning [0.9732863739456035]
We propose Cost-Based Budget Active Learning (CBAL), which considers classification uncertainty as well as instance diversity in a population constrained by a budget.
A principled min-max approach is used to minimize both the labeling and decision costs of the selected instances.
arXiv Detail & Related papers (2020-12-09T17:42:44Z)
- Semi-supervised Batch Active Learning via Bilevel Optimization [89.37476066973336]
We formulate our approach as a data summarization problem via bilevel optimization.
We show that our method is highly effective for keyword detection tasks in the regime where only a few labeled samples are available.
arXiv Detail & Related papers (2020-10-19T16:53:24Z)
- Learning the Truth From Only One Side of the Story [58.65439277460011]
We focus on generalized linear models and show that without adjusting for this sampling bias, the model may converge suboptimally or even fail to converge to the optimal solution.
We propose an adaptive approach that comes with theoretical guarantees and show that it outperforms several existing methods empirically.
arXiv Detail & Related papers (2020-06-08T18:20:28Z)
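
As an illustration of the kind of bias correction this last entry refers to, the sketch below reweights each labeled example in a logistic regression by the inverse of its (assumed known) selection probability, a generic importance-weighting correction for labels observed only on actively selected points. It is not the paper's adaptive algorithm, and the propensity values are assumed to be supplied by the selection mechanism.

```python
# Illustrative sketch only: inverse-propensity-weighted logistic regression
# as a generic correction for one-sided (selection-biased) label observation.
import numpy as np

def weighted_logistic_gradient(w, X, y, propensity):
    """Gradient of the importance-weighted logistic loss, where each labeled
    example is reweighted by the inverse of its selection probability."""
    p = 1.0 / (1.0 + np.exp(-X @ w))             # model predictions
    iw = 1.0 / np.clip(propensity, 1e-6, None)   # inverse-propensity weights
    return X.T @ (iw * (p - y)) / len(y)         # weighted log-loss gradient

def fit(X, y, propensity, lr=0.1, steps=500):
    """Plain gradient descent on the weighted logistic loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * weighted_logistic_gradient(w, X, y, propensity)
    return w
```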
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.