Stream-based Active Learning with Verification Latency in Non-stationary
Environments
- URL: http://arxiv.org/abs/2204.06822v1
- Date: Thu, 14 Apr 2022 08:51:15 GMT
- Authors: Andrea Castellani, Sebastian Schmitt, Barbara Hammer
- Abstract summary: We investigate the influence of finite, time-variable, and unknown verification delay on AL approaches in the presence of concept drift.
We propose PRopagate, a latency-independent utility estimator which predicts the requested, but not yet known, labels.
We empirically show that the proposed method consistently outperforms the state-of-the-art.
- Score: 6.883906273999368
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data stream classification is an important problem in the field of machine
learning. Due to the non-stationary nature of the data where the underlying
distribution changes over time (concept drift), the model needs to continuously
adapt to new data statistics. Stream-based Active Learning (AL) approaches
address this problem by interactively querying a human expert to provide new
data labels for the most recent samples, within a limited budget. Existing AL
strategies assume that labels are immediately available, while in a real-world
scenario the expert requires time to provide a queried label (verification
latency), and by the time the requested labels arrive they may not be relevant
anymore. In this article, we investigate the influence of finite,
time-variable, and unknown verification delay on AL approaches in the presence
of concept drift. We propose PRopagate (PR), a latency-independent utility
estimator which also predicts the requested, but not yet known, labels.
Furthermore, we propose a drift-dependent dynamic budget strategy, which uses a
variable distribution of the labeling budget over time after a detected
drift. A thorough experimental evaluation with both synthetic and real-world
non-stationary datasets and different settings of verification latency and
budget is conducted and analyzed. We empirically show that the proposed method
consistently outperforms the state-of-the-art. Additionally, we demonstrate
that with variable budget allocation in time, it is possible to boost the
performance of AL strategies, without increasing the overall labeling budget.
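The core PRopagate idea described above (using the model's own predictions as stand-ins for queried labels still in transit) can be sketched with a toy one-dimensional learner. Everything below is an illustrative assumption, not the authors' implementation: the class name, the nearest-class-mean model, and the simple budget check are all placeholders for the paper's actual utility estimator.

```python
from collections import deque

class StreamALWithLatency:
    """Toy stream active learner. While a queried label is in transit
    (verification latency), the model's own prediction stands in for it,
    loosely in the spirit of PRopagate. A didactic sketch only."""

    def __init__(self, budget=0.5, latency=3):
        self.budget = budget    # max fraction of samples we may query
        self.latency = latency  # steps until a queried label arrives
        self.pending = deque()  # (arrival_step, x) for in-transit queries
        self.sums = [0.0, 0.0]  # per-class feature sums (1-D toy model)
        self.counts = [1, 1]    # smoothed per-class counts
        self.t = 0
        self.queried = 0

    def predict(self, x):
        # Nearest-class-mean classifier on a single feature.
        m0 = self.sums[0] / self.counts[0]
        m1 = self.sums[1] / self.counts[1]
        return 0 if abs(x - m0) <= abs(x - m1) else 1

    def _learn(self, x, y):
        self.sums[y] += x
        self.counts[y] += 1

    def step(self, x, oracle):
        """Process one stream sample; `oracle` is the (delayed) expert."""
        self.t += 1
        # Deliver labels whose verification latency has elapsed.
        while self.pending and self.pending[0][0] <= self.t:
            _, xq = self.pending.popleft()
            self._learn(xq, oracle(xq))  # true label finally arrives
        y_hat = self.predict(x)
        # Query within budget; until the true label arrives, propagate the
        # model's own prediction as a provisional label. (A fuller
        # treatment would correct this update once the true label lands.)
        if self.queried < self.budget * self.t:
            self.queried += 1
            self.pending.append((self.t + self.latency, x))
            self._learn(x, y_hat)  # provisional self-labeled update
        return y_hat
```

The paper's drift-dependent dynamic budget could be emulated in this sketch by temporarily raising `budget` for a window of steps after a drift detector fires.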
Related papers
- Label Delay in Online Continual Learning [77.05325581370893]
A critical aspect often overlooked is the label delay, where new data may not be labeled due to slow and costly annotation processes.
We introduce a new continual learning framework with explicit modeling of the label delay between data and label streams over time steps.
We show experimentally that our method is the least affected by the label delay factor and in some cases successfully recovers the accuracy of the non-delayed counterpart.
arXiv Detail & Related papers (2023-12-01T20:52:10Z)
- An Adaptive Method for Weak Supervision with Drifting Data [11.035811912078216]
We introduce an adaptive method with formal quality guarantees for weak supervision in a non-stationary setting.
We focus on the non-stationary case, where the accuracy of the weak supervision sources can drift over time.
Our algorithm does not require any assumptions on the magnitude of the drift, and it adapts based on the input.
arXiv Detail & Related papers (2023-06-02T16:27:34Z)
- PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
- Stream-based active learning with linear models [0.7734726150561089]
In production, instead of performing random inspections to obtain product information, labels are collected by evaluating the information content of the unlabeled data.
We propose a new strategy for the stream-based scenario, where instances are sequentially offered to the learner.
The iterative aspect of the decision-making process is tackled by setting a threshold on the informativeness of the unlabeled data points.
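The thresholded decision rule sketched in that summary might, in a minimal form, look like the following; the function name is hypothetical, and the uncertainty margin of a binary probabilistic classifier is used here as an assumed informativeness measure:

```python
def should_query(p_positive: float, threshold: float = 0.2) -> bool:
    """Query the expert only when the model is uncertain enough.
    Informativeness here is the distance from a confident prediction:
    0.0 for p in {0, 1}, up to 0.5 for p = 0.5."""
    uncertainty = 1.0 - max(p_positive, 1.0 - p_positive)
    return uncertainty >= threshold
```

Samples whose informativeness falls below the threshold are simply passed over, which bounds how often the learner spends its labeling budget.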
arXiv Detail & Related papers (2022-07-20T13:15:23Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold.
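A plain-Python sketch of the ATC idea, under the assumption that "confidence" means the model's top-class probability; the function name and the rank-based threshold choice are illustrative, not the authors' exact procedure:

```python
def atc_predict_accuracy(source_conf, source_correct, target_conf):
    """ATC sketch: choose a threshold t on labeled source confidences so
    that the fraction of source samples with confidence >= t matches the
    source accuracy, then predict target accuracy as the fraction of
    unlabeled target samples with confidence >= t."""
    n = len(source_conf)
    source_acc = sum(source_correct) / n
    # Keep the top k source confidences above the threshold, with k/n
    # equal (up to rounding) to the source accuracy.
    k = round(source_acc * n)
    ranked = sorted(source_conf, reverse=True)
    t = ranked[k - 1] if k > 0 else float("inf")
    return sum(1 for c in target_conf if c >= t) / len(target_conf)
```

For example, with source confidences [0.9, 0.8, 0.7, 0.6] and 3 of 4 source predictions correct, the threshold lands at 0.7, and the predicted target accuracy is the share of target confidences at or above 0.7.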
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Mining Drifting Data Streams on a Budget: Combining Active Learning with Self-Labeling [6.436899373275926]
We propose a novel framework for mining drifting data streams on a budget, by combining information coming from active learning and self-labeling.
We introduce several strategies that can take advantage of both intelligent instance selection and semi-supervised procedures, while taking into account the potential presence of concept drift.
arXiv Detail & Related papers (2021-12-21T07:19:35Z)
- Knowledge-driven Active Learning [70.37119719069499]
Active learning strategies aim at minimizing the amount of labelled data required to train a Deep Learning model.
Most active learning strategies are based on uncertainty-driven sample selection, and are often even restricted to samples lying close to the decision boundary.
Here we propose to take into consideration common domain-knowledge and enable non-expert users to train a model with fewer samples.
arXiv Detail & Related papers (2021-10-15T06:11:53Z)
- Unsupervised and self-adaptative techniques for cross-domain person re-identification [82.54691433502335]
Person Re-Identification (ReID) across non-overlapping cameras is a challenging task.
Unsupervised Domain Adaptation (UDA) is a promising alternative, as it performs feature-learning adaptation from a model trained on a source to a target domain without identity-label annotation.
In this paper, we propose a novel UDA-based ReID method that takes advantage of triplets of samples created by a new offline strategy.
arXiv Detail & Related papers (2021-03-21T23:58:39Z)
- Task-Aware Variational Adversarial Active Learning [42.334671410592065]
We propose task-aware variational adversarial AL (TA-VAAL) that modifies task-agnostic VAAL.
Our proposed TA-VAAL outperforms the state of the art on various benchmark datasets for classification with balanced / imbalanced labels.
arXiv Detail & Related papers (2020-02-11T22:00:48Z)
- Low-Budget Label Query through Domain Alignment Enforcement [48.06803561387064]
We tackle a new problem named low-budget label query.
We first improve an Unsupervised Domain Adaptation (UDA) method to better align source and target domains.
We then propose a simple yet effective selection method based on uniform sampling of the prediction consistency distribution.
arXiv Detail & Related papers (2020-01-01T16:52:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.