Mitigating sampling bias in risk-based active learning via an EM
algorithm
- URL: http://arxiv.org/abs/2206.12598v1
- Date: Sat, 25 Jun 2022 08:48:25 GMT
- Authors: Aidan J. Hughes, Lawrence A. Bull, Paul Gardner, Nikolaos Dervilis,
Keith Worden
- Abstract summary: Risk-based active learning is an approach to developing statistical classifiers for online decision-support.
Data-label querying is guided according to the expected value of perfect information for incipient data points.
The semi-supervised approach counteracts sampling bias by incorporating pseudo-labels for unlabelled data via an EM algorithm.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Risk-based active learning is an approach to developing statistical
classifiers for online decision-support. In this approach, data-label querying
is guided according to the expected value of perfect information for incipient
data points. For SHM applications, the value of information is evaluated with
respect to a maintenance decision process, and the data-label querying
corresponds to the inspection of a structure to determine its health state.
Sampling bias is a known issue within active-learning paradigms; this occurs
when an active learning process over- or undersamples specific regions of a
feature-space, thereby resulting in a training set that is not representative
of the underlying distribution. This bias ultimately degrades decision-making
performance, and as a consequence, results in unnecessary costs incurred. The
current paper outlines a risk-based approach to active learning that utilises a
semi-supervised Gaussian mixture model. The semi-supervised approach
counteracts sampling bias by incorporating pseudo-labels for unlabelled data
via an EM algorithm. The approach is demonstrated on a numerical example
representative of the decision processes found in SHM.
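The pseudo-labelling step described in the abstract can be sketched as follows. This is a minimal maximum-likelihood illustration, not the authors' implementation: it assumes one Gaussian component per class, keeps labelled points fixed as one-hot responsibilities, and lets EM assign soft pseudo-labels to the unlabelled points. The function names (`fit_semi_supervised_gmm`, `m_step`, `e_step`, `log_gaussian`) and the `reg` ridge term are hypothetical choices made for this example, and every class is assumed to have at least a few labelled points.

```python
import numpy as np


def log_gaussian(X, mean, cov):
    """Log-density of a multivariate normal at each row of X."""
    d = X.shape[1]
    diff = X - mean
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def m_step(X, R, reg):
    """Update mixing weights, means and covariances from responsibilities R."""
    Nk = R.sum(axis=0)  # effective number of points per class
    weights = Nk / Nk.sum()
    means = (R.T @ X) / Nk[:, None]
    covs = []
    for k in range(R.shape[1]):
        diff = X - means[k]
        cov = (R[:, k, None] * diff).T @ diff / Nk[k]
        covs.append(cov + reg * np.eye(X.shape[1]))  # small ridge (assumption)
    return weights, means, np.array(covs)


def e_step(X, weights, means, covs):
    """Soft pseudo-labels: posterior class probabilities for each row of X."""
    log_r = np.column_stack([
        np.log(w) + log_gaussian(X, m, c)
        for w, m, c in zip(weights, means, covs)
    ])
    log_r -= log_r.max(axis=1, keepdims=True)  # guard against underflow
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)


def fit_semi_supervised_gmm(X_lab, y_lab, X_unl, n_classes, n_iter=50, reg=1e-6):
    """EM over labelled (fixed) and unlabelled (soft) responsibilities."""
    X_all = np.vstack([X_lab, X_unl])
    R_lab = np.eye(n_classes)[y_lab]            # one-hot, never updated
    R_unl = np.zeros((len(X_unl), n_classes))   # first M-step uses labels only
    for _ in range(n_iter):
        params = m_step(X_all, np.vstack([R_lab, R_unl]), reg)
        R_unl = e_step(X_unl, *params)
    return params, R_unl
```

Calling `fit_semi_supervised_gmm(X_lab, y_lab, X_unl, n_classes=2)` returns the tuple `(weights, means, covs)` together with the soft pseudo-labels for `X_unl`. Because the unlabelled points contribute to every parameter update after the first, the fitted class densities depend less on which points the active learner happened to query.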
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
- Querying Easily Flip-flopped Samples for Deep Active Learning [63.62397322172216]
Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data.
One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is.
This paper proposes the least disagree metric (LDM) as the smallest probability of disagreement of the predicted label.
arXiv Detail & Related papers (2024-01-18T08:12:23Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- MAPS: A Noise-Robust Progressive Learning Approach for Source-Free Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require accessing the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv Detail & Related papers (2023-02-09T12:06:08Z)
- Stream-based active learning with linear models [0.7734726150561089]
In production, instead of performing random inspections to obtain product information, labels are collected by evaluating the information content of the unlabeled data.
We propose a new strategy for the stream-based scenario, where instances are sequentially offered to the learner.
The iterative aspect of the decision-making process is tackled by setting a threshold on the informativeness of the unlabeled data points.
arXiv Detail & Related papers (2022-07-20T13:15:23Z)
- Improving decision-making via risk-based active learning: Probabilistic discriminative classifiers [0.0]
Descriptive labels for measured data corresponding to health-states of monitored systems are often unavailable.
One approach to dealing with this problem is risk-based active learning.
The current paper demonstrates several advantages of using an alternative type of classifier -- discriminative models.
arXiv Detail & Related papers (2022-06-23T10:51:42Z)
- Self-Certifying Classification by Linearized Deep Assignment [65.0100925582087]
We propose a novel class of deep predictors for classifying metric data on graphs within PAC-Bayes risk certification paradigm.
Building on the recent PAC-Bayes literature and data-dependent priors, this approach enables learning posterior distributions on the hypothesis space.
arXiv Detail & Related papers (2022-01-26T19:59:14Z)
- On robust risk-based active-learning algorithms for enhanced decision support [0.0]
Classification models are a fundamental component of physical-asset management technologies such as structural health monitoring (SHM) systems and digital twins.
The paper proposes two novel approaches to counteract the effects of sampling bias: semi-supervised learning and discriminative classification models.
arXiv Detail & Related papers (2022-01-07T17:25:41Z)
- Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task.
The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator.
We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T21:17:02Z)