Stream-based active learning with linear models
- URL: http://arxiv.org/abs/2207.09874v5
- Date: Thu, 13 Jul 2023 15:13:16 GMT
- Title: Stream-based active learning with linear models
- Authors: Davide Cacciarelli, Murat Kulahci, John Sølve Tyssedal
- Abstract summary: In production, instead of performing random inspections to obtain product information, labels are collected by evaluating the information content of the unlabeled data.
We propose a new strategy for the stream-based scenario, where instances are sequentially offered to the learner.
The iterative aspect of the decision-making process is tackled by setting a threshold on the informativeness of the unlabeled data points.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The proliferation of automated data collection schemes and the advances in
sensor technology are increasing the amount of data we are able to monitor in
real time. However, given the high annotation costs and the time required by
quality inspections, data is often available in an unlabeled form. This is
fostering the use of active learning for the development of soft sensors and
predictive models. In production, instead of performing random inspections to
obtain product information, labels are collected by evaluating the information
content of the unlabeled data. Several query strategy frameworks for regression
have been proposed in the literature but most of the focus has been dedicated
to the static pool-based scenario. In this work, we propose a new strategy for
the stream-based scenario, where instances are sequentially offered to the
learner, which must instantaneously decide whether to perform the quality check
to obtain the label or discard the instance. The approach is inspired by the
optimal experimental design theory and the iterative aspect of the
decision-making process is tackled by setting a threshold on the
informativeness of the unlabeled data points. The proposed approach is
evaluated using numerical simulations and the Tennessee Eastman Process
simulator. The results confirm that selecting the examples suggested by the
proposed algorithm allows for a faster reduction in the prediction error.
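The threshold-based query rule described in the abstract can be sketched with a linear model: score each incoming point by its prediction variance under the current design (the D-optimality increment from optimal experimental design) and query the label only when the score exceeds a threshold. This is a minimal illustrative sketch, not the paper's implementation; the simulated process, the threshold value, and all names here are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth linear process (illustration only)
d = 3
true_beta = np.array([1.5, -2.0, 0.5])

def label(x):
    """Simulated costly quality check: noisy linear response."""
    return x @ true_beta + rng.normal(scale=0.1)

# Warm-up: a small initial labeled set seeds the information matrix
X0 = rng.normal(size=(10, d))
y0 = np.array([label(x) for x in X0])
A = X0.T @ X0            # information matrix X'X
b = X0.T @ y0
beta = np.linalg.solve(A, b)

threshold = 0.5          # informativeness threshold (tuning parameter)
n_queried = 0

for _ in range(2000):    # sequentially offered stream instances
    x = rng.normal(size=d)
    # Prediction-variance score x'(X'X)^{-1}x: the D-optimality
    # increment, since det(A + xx') = det(A) * (1 + x'A^{-1}x)
    score = x @ np.linalg.solve(A, x)
    if score > threshold:
        y = label(x)             # perform the quality check
        A += np.outer(x, x)      # rank-one update of X'X
        b += y * x
        beta = np.linalg.solve(A, b)
        n_queried += 1

print(n_queried, beta)
```

Because the information matrix grows with every queried point, the variance score of typical points shrinks over time, so labeling naturally concentrates on the early, information-poor phase and on unusual instances later in the stream.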
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
- Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution [62.71425232332837]
We show that training amortized models with noisy labels is inexpensive and surprisingly effective.
This approach significantly accelerates several feature attribution and data valuation methods, often yielding an order of magnitude speedup over existing approaches.
arXiv Detail & Related papers (2024-01-29T03:42:37Z)
- Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets.
Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that repeatedly alternates between time-consuming model training and batch data selection.
FreeSel bypasses the heavy batch selection process, achieving a significant improvement in efficiency and being 530x faster than existing active learning methods.
arXiv Detail & Related papers (2023-09-29T15:50:14Z)
- Rethinking Precision of Pseudo Label: Test-Time Adaptation via Complementary Learning [10.396596055773012]
We propose a novel complementary learning approach to enhance test-time adaptation.
In test-time adaptation tasks, information from the source domain is typically unavailable.
We highlight that the risk function of complementary labels agrees with the vanilla loss formula.
arXiv Detail & Related papers (2023-01-15T03:36:33Z)
- Temporal Output Discrepancy for Loss Estimation-based Active Learning [65.93767110342502]
We present a novel deep active learning approach that queries the oracle for data annotation when the unlabeled sample is believed to incorporate high loss.
Our approach achieves superior performance to state-of-the-art active learning methods on image classification and semantic segmentation tasks.
arXiv Detail & Related papers (2022-12-20T19:29:37Z)
- Mitigating sampling bias in risk-based active learning via an EM algorithm [0.0]
Risk-based active learning is an approach to developing statistical classifiers for online decision-support.
Data-label querying is guided according to the expected value of perfect information for incipient data points.
A semi-supervised approach counteracts sampling bias by incorporating pseudo-labels for unlabelled data via an EM algorithm.
arXiv Detail & Related papers (2022-06-25T08:48:25Z)
- Graph-based Reinforcement Learning for Active Learning in Real Time: An Application in Modeling River Networks [2.8631830115500394]
We develop a real-time active learning method that uses the spatial and temporal contextual information to select representative query samples in a reinforcement learning framework.
We demonstrate the effectiveness of the proposed method by predicting streamflow and water temperature in the Delaware River Basin given a limited budget for collecting labeled data.
arXiv Detail & Related papers (2020-10-27T02:19:40Z)
- Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation [54.49894381464853]
We propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation.
We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining.
Experiments show that the proposed method improves the performance significantly compared with the supervised method learned from labeled data only.
arXiv Detail & Related papers (2020-07-21T13:27:09Z)
- Progressive Identification of True Labels for Partial-Label Learning [112.94467491335611]
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is equipped with a set of candidate labels among which only one is the true label.
Most existing methods are elaborately designed as constrained optimizations that must be solved in specific manners, making their computational complexity a bottleneck for scaling up to big data.
This paper proposes a novel framework that allows a flexible choice of model and optimization algorithm.
arXiv Detail & Related papers (2020-02-19T08:35:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.