How to Allocate your Label Budget? Choosing between Active Learning and
Learning to Reject in Anomaly Detection
- URL: http://arxiv.org/abs/2301.02909v1
- Date: Sat, 7 Jan 2023 18:02:43 GMT
- Title: How to Allocate your Label Budget? Choosing between Active Learning and
Learning to Reject in Anomaly Detection
- Authors: Lorenzo Perini, Daniele Giannuzzi, Jesse Davis
- Abstract summary: Anomaly detection aims to find examples that deviate from expected behaviour.
The lack of labels leaves the anomaly detector highly uncertain in some regions.
We propose a mixed strategy that decides, over multiple rounds, whether to collect Active Learning (AL) labels or Learning to Reject (LR) labels.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Anomaly detection aims to find examples that deviate from the expected
behaviour. Usually, anomaly detection is tackled from an unsupervised
perspective because anomalous labels are rare and difficult to acquire.
However, the lack of labels leaves the anomaly detector highly uncertain in
some regions, which usually results in poor predictive performance or low user
trust in the predictions. One can reduce this uncertainty by collecting
targeted labels through Active Learning (AL), which queries examples close to
the detector's decision boundary. Alternatively, one can increase user trust by
allowing the detector to abstain from making highly uncertain predictions,
which is called Learning to Reject (LR). One way to do this is to threshold the
detector's uncertainty where its performance is low, which requires labels for
the evaluation. Although both AL and LR need labels, they work with different
types of labels: AL seeks strategically chosen labels, which are biased by
design, while LR requires i.i.d. labels to evaluate the detector's performance
and set the rejection threshold. Because one usually has a single label budget,
deciding how to allocate it optimally is challenging. In this paper, we propose
a mixed strategy that, given a budget of labels, decides in multiple rounds
whether to use the budget to collect AL labels or LR labels. The strategy is
based on a reward function that measures the expected gain from allocating the
budget to either side. We evaluate our strategy on 18 benchmark datasets and
compare it to several baselines.
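The multi-round allocation procedure described in the abstract can be made concrete with a small sketch. The Python snippet below is a minimal, hypothetical rendering: an IsolationForest stands in for the unsupervised detector, AL queries the examples nearest the decision boundary, LR draws i.i.d. examples, and the two reward proxies (al_reward, lr_reward) are illustrative assumptions, since the abstract does not spell out the paper's reward function.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Toy data: mostly normal points plus a few anomalies (a hypothetical
# stand-in for the paper's 18 benchmark datasets).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (480, 2)), rng.normal(4, 1, (20, 2))])
y = np.concatenate([np.zeros(480), np.ones(20)])   # hidden ground truth
oracle = lambda idx: y[np.asarray(idx)]            # returns labels on request

detector = IsolationForest(random_state=0).fit(X)
margin = np.abs(detector.decision_function(X))     # 0 = on the boundary
pred = (detector.predict(X) == -1).astype(int)     # 1 = predicted anomaly

def al_reward(queried):
    # Hypothetical proxy: strategic labels are worth more when they
    # contradict the current predictions, i.e. they would correct the model.
    return float(np.mean(oracle(queried) != pred[queried]))

def lr_reward(n_lr, k):
    # Hypothetical proxy: i.i.d. labels shrink the estimation error of the
    # rejection threshold at roughly a 1/sqrt(n) rate.
    return 1.0 / np.sqrt(max(n_lr, 1)) - 1.0 / np.sqrt(n_lr + k)

budget, batch, n_lr = 60, 10, 0
reward = {"AL": np.inf, "LR": np.inf}              # optimistic: try both sides
al_idx, lr_idx = [], []
while budget > 0:
    k = min(batch, budget)
    if reward["AL"] >= reward["LR"]:               # spend the round on AL
        new = np.argsort(margin)[len(al_idx):len(al_idx) + k]  # near boundary
        al_idx += list(new)
        reward["AL"] = al_reward(np.array(al_idx))
    else:                                          # spend the round on LR
        new = rng.choice(len(X), size=k, replace=False)        # i.i.d. sample
        lr_idx += list(new)
        reward["LR"] = lr_reward(n_lr, k)
        n_lr += k
    budget -= k

print(f"AL labels: {len(al_idx)}, LR labels: {len(lr_idx)}")
```

In the paper's setting the detector would also be retrained on the AL labels and the rejection threshold re-fit on the LR labels each round; both steps are elided here.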
Related papers
- Unsupervised Learning of Distributional Properties can Supplement Human
Labeling and Increase Active Learning Efficiency in Anomaly Detection
Exfiltration of data via email is a serious cybersecurity threat for many organizations.
Active Learning is a promising approach for labeling data efficiently.
We propose an adaptive AL sampling strategy to produce batches of cases to be labeled that contain instances of rare anomalies.
arXiv Detail & Related papers (2023-07-13T22:14:30Z)
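As a rough illustration of such a batch sampler (not the paper's exact strategy), one can mix exploitation of the highest anomaly scores with score-weighted exploration; the scores and parameters below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
scores = rng.random(1000)          # stand-in anomaly scores from any detector

def adaptive_batch(scores, k=20, explore=0.5):
    """Illustrative batch sampler: part exploitation (highest anomaly scores,
    where rare anomalies concentrate) and part exploration (sampling
    proportional to score, to cover the distribution's tail)."""
    n_exploit = int(k * (1 - explore))
    exploit = np.argsort(scores)[-n_exploit:]          # top-scored cases
    p = scores / scores.sum()
    explore_idx = rng.choice(len(scores), size=k - n_exploit,
                             replace=False, p=p)       # score-weighted draw
    # Overlaps are deduplicated, so the batch may be slightly smaller than k.
    return np.unique(np.concatenate([exploit, explore_idx]))

batch = adaptive_batch(scores)
```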
- Partial-Label Regression
Partial-label learning is a weakly supervised learning setting that allows each training example to be annotated with a set of candidate labels.
Previous studies on partial-label learning only focused on the classification setting where candidate labels are all discrete.
In this paper, we provide the first attempt to investigate partial-label regression, where each training example is annotated with a set of real-valued candidate labels.
arXiv Detail & Related papers (2023-06-15T09:02:24Z)
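One way such a setting can be instantiated is with an "identification"-style loss that scores a prediction against its closest real-valued candidate; this is an illustrative surrogate, not necessarily the estimator studied in the paper.

```python
import numpy as np

def pl_regression_loss(pred, candidates):
    """Illustrative partial-label regression surrogate: each example carries
    a set of real-valued candidate labels, and the prediction is scored
    against its closest candidate."""
    # pred: (n,) predictions; candidates: (n, m) candidate labels per example
    sq_err = (candidates - pred[:, None]) ** 2
    return sq_err.min(axis=1).mean()

pred = np.array([1.0, 2.0])
cands = np.array([[0.9, 5.0], [1.5, 2.2]])   # two candidates per example
print(pl_regression_loss(pred, cands))       # scores against 0.9 and 2.2
```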
- Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection
We propose Ambiguity-Resistant Semi-Supervised Learning (ARSL) for one-stage detectors.
Joint-Confidence Estimation (JCE) is proposed to quantify the classification and localization quality of pseudo labels.
ARSL effectively mitigates the ambiguities and achieves state-of-the-art SSOD performance on MS COCO and PASCAL VOC.
arXiv Detail & Related papers (2023-03-27T07:46:58Z)
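The filtering idea behind a joint confidence measure can be sketched as follows; the combination rule here (a simple product) is an assumption, as the paper's JCE is more involved.

```python
import numpy as np

def joint_confidence(cls_prob, loc_iou):
    """Illustrative joint confidence for pseudo boxes: combine the
    classification probability with a localization-quality estimate
    (e.g. predicted IoU), so a box is kept only when both are trustworthy."""
    return cls_prob * loc_iou

cls_prob = np.array([0.95, 0.90, 0.40])   # classification confidence
loc_iou = np.array([0.85, 0.30, 0.90])    # predicted localization quality
keep = joint_confidence(cls_prob, loc_iou) > 0.5
print(keep)                                # only the first pseudo box passes
```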
- Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection
Weakly supervised video anomaly detection aims to identify abnormal events in videos using only video-level labels.
Two-stage self-training methods have achieved significant improvements by self-generating pseudo labels.
We propose an enhancement framework by exploiting completeness and uncertainty properties for effective self-training.
arXiv Detail & Related papers (2022-12-08T05:53:53Z)
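A generic illustration of how the two named properties might gate pseudo labels (not necessarily the paper's mechanism): uncertainty is estimated from stochastic forward passes, and completeness keeps several snippets per video rather than only the top one.

```python
import numpy as np

rng = np.random.default_rng(2)
passes = rng.random((5, 30))   # anomaly scores for 30 snippets of one video,
                               # from 5 stochastic forward passes

# Uncertainty: variance across passes gates which snippets may be labeled.
# Completeness: a quota keeps more than the single top-scoring snippet.
mean, var = passes.mean(axis=0), passes.var(axis=0)
low_uncertainty = var < np.quantile(var, 0.5)        # confident snippets
quota = max(1, int(0.2 * mean.size))                 # top 20% by mean score
top = np.argsort(mean)[-quota:]
pseudo_anomalous = np.intersect1d(top, np.flatnonzero(low_uncertainty))
print(pseudo_anomalous)
```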
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective
We propose a label distribution perspective for PU learning in this paper.
Motivated by this view, we pursue consistency between the predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
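The core consistency idea can be sketched as a penalty that pushes the predicted positive proportion on unlabeled data toward the known class prior; Dist-PU's actual objective contains further terms.

```python
import numpy as np

def label_dist_loss(pred_pos_prob, prior):
    """Illustrative label-distribution consistency term for PU learning:
    align the predicted positive proportion with the class prior (the
    ground-truth label distribution)."""
    return (pred_pos_prob.mean() - prior) ** 2

preds = np.array([0.9, 0.2, 0.1, 0.7, 0.05])   # P(positive) on unlabeled data
print(label_dist_loss(preds, prior=0.3))        # mean 0.39 vs prior 0.3
```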
- Automated Detection of Label Errors in Semantic Segmentation Datasets via Deep Learning and Uncertainty Quantification
We present the first method for detecting label errors in semantic segmentation datasets with pixel-wise labels.
Our approach is able to detect the vast majority of label errors while controlling the number of false label error detections.
arXiv Detail & Related papers (2022-07-13T10:25:23Z)
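A generic version of this idea flags samples where the model confidently disagrees with the given label, with the confidence threshold controlling the number of false detections; the paper's method works on pixel-wise segmentation labels and differs in detail.

```python
import numpy as np

def flag_label_errors(probs, given, tau=0.9):
    """Illustrative label-error detector: flag a sample when the model
    confidently (max prob >= tau) predicts a class that differs from the
    given label. Raising tau trades recall for fewer false detections."""
    pred = probs.argmax(axis=1)
    confident = probs.max(axis=1) >= tau
    return np.flatnonzero(confident & (pred != given))

probs = np.array([[0.97, 0.03], [0.55, 0.45], [0.05, 0.95]])
given = np.array([1, 1, 1])
print(flag_label_errors(probs, given))   # flags only sample 0
```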
- Learning with Proper Partial Labels
Partial-label learning is a kind of weakly supervised learning with inexact labels.
We show that this proper partial-label learning framework includes many previous partial-label learning settings.
We then derive a unified unbiased estimator of the classification risk.
arXiv Detail & Related papers (2021-12-23T01:37:03Z)
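For contrast, a naive candidate-averaged surrogate risk looks as follows; the paper's contribution is an unbiased estimator, which additionally requires knowledge of how the candidate sets are generated.

```python
import numpy as np

def naive_partial_risk(probs, candidate_mask):
    """Naive candidate-averaged surrogate risk for partial labels: average
    the negative log-likelihood uniformly over each example's candidate
    set. Only an illustration; it is not unbiased in general."""
    nll = -np.log(np.clip(probs, 1e-12, 1.0))
    per_example = (nll * candidate_mask).sum(axis=1) / candidate_mask.sum(axis=1)
    return per_example.mean()

probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]])  # class probabilities
mask = np.array([[1, 1, 0], [0, 1, 1]])               # candidate label sets
print(naive_partial_risk(probs, mask))
```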
- Learning with Noisy Labels by Targeted Relabeling
Crowdsourcing platforms are often used to collect datasets for training deep neural networks.
We propose an approach which reserves a fraction of annotations to explicitly relabel highly probable labeling errors.
arXiv Detail & Related papers (2021-10-15T20:37:29Z)
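The budget-splitting idea can be sketched in a few lines; the disagreement measure below is an illustrative choice, not the paper's error model.

```python
import numpy as np

rng = np.random.default_rng(3)
noisy = rng.integers(0, 2, 100)              # first-pass crowdsourced labels
model_prob = rng.random(100)                 # model's P(y=1) per example

# Illustrative targeted relabeling: reserve a fraction of the annotation
# budget, then spend it where the model most strongly disagrees with the
# collected label (the most probable labeling errors).
reserve = 20
disagreement = np.abs(model_prob - noisy)    # 1.0 = maximal disagreement
relabel_idx = np.argsort(disagreement)[-reserve:]
print(len(relabel_idx), "examples sent back for relabeling")
```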
- Active WeaSuL: Improving Weak Supervision with Active Learning
We propose Active WeaSuL: an approach that incorporates active learning into weak supervision.
We make two contributions: 1) a modification of the weak supervision loss function, such that the expert-labelled data inform and improve the combination of weak labels; and 2) the maxKL divergence sampling strategy, which determines for which data points expert labelling is most beneficial.
arXiv Detail & Related papers (2021-04-30T08:58:26Z)
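A per-point sketch of KL-based disagreement sampling in the spirit of maxKL (Active WeaSuL's version operates on the label model's output buckets; the exact construction differs).

```python
import numpy as np

def max_kl_query(p_weak, p_model, eps=1e-12):
    """Illustrative disagreement-based query rule: for each point, compute
    the KL divergence between the label distribution implied by the weak
    labels and the downstream model's prediction, and ask the expert about
    the point where they diverge most."""
    kl = (p_weak * (np.log(p_weak + eps) - np.log(p_model + eps))).sum(axis=1)
    return int(np.argmax(kl))

p_weak = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
p_model = np.array([[0.8, 0.2], [0.5, 0.5], [0.9, 0.1]])
print(max_kl_query(p_weak, p_model))   # index 2: largest disagreement
```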
- Active Learning for Noisy Data Streams Using Weak and Strong Labelers
We consider a novel weak and strong labeler problem inspired by humans' natural ability for labeling.
We propose an online active learning algorithm that consists of four steps: filtering, adding diversity, informative sample selection, and labeler selection.
We derive a decision function that measures the information gain by combining the informativeness of individual samples and model confidence.
arXiv Detail & Related papers (2020-10-27T09:18:35Z)
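One plausible form of such a decision function multiplies predictive entropy (informativeness) by one minus the model's confidence and routes high-gain samples to the strong labeler; the threshold and the combination rule are illustrative assumptions.

```python
import numpy as np

def pick_labeler(probs, tau=0.3):
    """Illustrative decision function combining a sample's informativeness
    (predictive entropy) with the model's confidence: hard, informative
    samples go to the strong (expensive) labeler, the rest to the weak one."""
    entropy = -(probs * np.log(np.clip(probs, 1e-12, 1.0))).sum(axis=1)
    confidence = probs.max(axis=1)
    gain = entropy * (1.0 - confidence)   # high when uncertain AND informative
    return np.where(gain > tau, "strong", "weak")

probs = np.array([[0.5, 0.5], [0.95, 0.05]])
print(pick_labeler(probs))                # ['strong' 'weak']
```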
This list is automatically generated from the titles and abstracts of the papers on this site.