Cost-Based Budget Active Learning for Deep Learning
- URL:
- Date: Wed, 9 Dec 2020 17:42:44 GMT
- Title: Cost-Based Budget Active Learning for Deep Learning
- Authors: Patrick K. Gikunda, Nicolas Jouandeau
- Abstract summary: We propose Cost-Based Budget Active Learning (CBAL), which considers the classification uncertainty as well as instance diversity in a population constrained by a budget.
A principled approach based on the min-max is considered to minimize both the labeling and decision cost of the selected instances.
- Score: 0.9732863739456035
- License:
- Abstract: Classical Active Learning (AL) approaches usually use statistical
measures such as entropy and margin to quantify instance utility; however, they fail
to capture the data distribution information contained in the unlabeled data.
This can eventually cause the classifier to select outlier instances to label.
Meanwhile, the loss associated with mislabeling an instance in a typical
classification task is much higher than the loss associated with the opposite
error. To address these challenges, we propose Cost-Based Budget Active
Learning (CBAL), which considers the classification uncertainty as well as
instance diversity in a population constrained by a budget. A principled
min-max approach is used to minimize both the labeling and
decision cost of the selected instances, which ensures near-optimal results
with significantly less computational effort. Extensive experimental results
show that the proposed approach outperforms several state-of-the-art active
learning approaches.
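To make the quantities named in the abstract concrete, the following is a minimal sketch of entropy and margin uncertainty scores, plus a greedy batch selection that trades uncertainty against diversity under a fixed labeling budget. It is not the CBAL algorithm: the abstract does not specify the min-max objective, so the additive scoring rule, the `trade_off` parameter, and the helper names (`select_batch`, `euclidean`) are assumptions made purely for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def margin(probs):
    """Gap between the two largest class probabilities (small gap = uncertain)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_batch(features, probs, budget, trade_off=1.0):
    """Greedily pick at most `budget` pool indices to send for labeling.

    Each step scores every remaining instance as
        entropy + trade_off * (distance to the nearest already-selected instance)
    so uncertain instances far from earlier picks are preferred.
    This scoring rule is an illustrative assumption, not the CBAL objective.
    """
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < budget:

        def score(i):
            # Diversity is 0 for the first pick, when nothing is selected yet.
            diversity = min(
                (euclidean(features[i], features[j]) for j in selected),
                default=0.0,
            )
            return entropy(probs[i]) + trade_off * diversity

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy pool: two near-duplicate uncertain points, one confident point,
# and one uncertain point far from the others.
features = [(0.0, 0.0), (0.01, 0.0), (1.0, 0.0), (3.0, 0.0)]
probs = [(0.5, 0.5), (0.5, 0.5), (0.99, 0.01), (0.6, 0.4)]
picked = select_batch(features, probs, budget=2)
```

On this toy pool, pure entropy ranking would spend both labels on the two near-duplicate points (indices 0 and 1). With the diversity term, the sketch selects indices 0 and 3 instead, which illustrates why the abstract argues for combining uncertainty with information about the data distribution.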
Related papers
- Improve Cost Efficiency of Active Learning over Noisy Dataset [1.3846014191157405]
In this paper, we consider cases of binary classification, where acquiring a positive instance incurs a significantly higher cost compared to that of negative instances.
We propose a shifted normal distribution sampling function that samples from a wider range than typical uncertainty sampling.
Our simulation underscores that our proposed sampling function limits both noisy and positive label selection, delivering between 20% and 32% improved cost efficiency over different test datasets.
arXiv Detail & Related papers (2024-03-02T23:53:24Z)
- Querying Easily Flip-flopped Samples for Deep Active Learning [63.62397322172216]
Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data.
One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is.
This paper proposes the least disagree metric (LDM), defined as the smallest probability of disagreement of the predicted label.
arXiv Detail & Related papers (2024-01-18T08:12:23Z)
- A Unified Approach to Count-Based Weakly-Supervised Learning [30.953260850416157]
We develop a unified approach to learning from weakly-labeled data.
We compute the probability of exactly k out of n outputs being set to true.
We evaluate our approach on three common weakly-supervised learning paradigms.
arXiv Detail & Related papers (2023-11-22T22:23:34Z)
- Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z)
- Complementary Labels Learning with Augmented Classes [22.460256396941528]
Complementary Labels Learning (CLL) arises in many real-world tasks such as private questions classification and online learning.
We propose a novel problem setting called Complementary Labels Learning with Augmented Classes (CLLAC).
By using unlabeled data, we propose an unbiased estimator of classification risk for CLLAC, which is guaranteed to be provably consistent.
arXiv Detail & Related papers (2022-11-19T13:55:27Z)
- MaxMatch: Semi-Supervised Learning with Worst-Case Consistency [149.03760479533855]
We propose a worst-case consistency regularization technique for semi-supervised learning (SSL).
We present a generalization bound for SSL consisting of the empirical loss terms observed on labeled and unlabeled training data separately.
Motivated by this bound, we derive an SSL objective that minimizes the largest inconsistency between an original unlabeled sample and its multiple augmented variants.
arXiv Detail & Related papers (2022-09-26T12:04:49Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performances on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- Minimax Active Learning [61.729667575374606]
Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator.
Current active learning techniques either rely on model uncertainty to select the most uncertain samples or use clustering or reconstruction to choose the most diverse set of unlabeled examples.
We develop a semi-supervised minimax entropy-based active learning algorithm that leverages both uncertainty and diversity in an adversarial manner.
arXiv Detail & Related papers (2020-12-18T19:03:40Z)
- Toward Optimal Probabilistic Active Learning Using a Bayesian Approach [4.380488084997317]
Active learning aims at reducing the labeling costs by an efficient and effective allocation of costly labeling resources.
By reformulating existing selection strategies within our proposed model, we can explain which aspects are not covered by the current state of the art.
arXiv Detail & Related papers (2020-06-02T15:59:42Z)
- Progressive Identification of True Labels for Partial-Label Learning [112.94467491335611]
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is equipped with a set of candidate labels among which only one is the true label.
Most existing methods are elaborately designed as constrained optimizations that must be solved in specific manners, making their computational complexity a bottleneck for scaling up to big data.
This paper proposes a novel classifier framework that is flexible in the choice of model and optimization algorithm.
arXiv Detail & Related papers (2020-02-19T08:35:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.