Prediction in the presence of response-dependent missing labels
- URL: http://arxiv.org/abs/2103.13555v1
- Date: Thu, 25 Mar 2021 01:43:33 GMT
- Title: Prediction in the presence of response-dependent missing labels
- Authors: Hyebin Song, Garvesh Raskutti, Rebecca Willett
- Abstract summary: Limitations of sensing technologies result in missing labels, as in satellite-based wildfire detection.
We develop a new methodology and non-convex algorithm, P(ositive) U(nlabeled) - O(ccurrence) M(agnitude) M(ixture), which jointly estimates the occurrence and detection likelihood of positive samples.
- Score: 28.932172873182115
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In a variety of settings, limitations of sensing technologies or other
sampling mechanisms result in missing labels, where the likelihood of a missing
label in the training set is an unknown function of the data. For example,
satellites used to detect forest fires cannot sense fires below a certain size
threshold. In such cases, training datasets consist of positive and
pseudo-negative observations where pseudo-negative observations can be either
true negatives or undetected positives with small magnitudes. We develop a new
methodology and non-convex algorithm P(ositive) U(nlabeled) - O(ccurrence)
M(agnitude) M(ixture) which jointly estimates the occurrence and detection
likelihood of positive samples, utilizing prior knowledge of the detection
mechanism. Our approach uses ideas from positive-unlabeled (PU)-learning and
zero-inflated models that jointly estimate the magnitude and occurrence of
events. We provide conditions under which our model is identifiable and prove
that even though our approach leads to a non-convex objective, any local
minimizer has optimal statistical error (up to a log term) and projected
gradient descent has geometric convergence rates. We demonstrate on both
synthetic data and a California wildfire dataset that our method out-performs
existing state-of-the-art approaches.
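The labeling mechanism described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' model or algorithm: the logistic occurrence probability, the log-normal magnitude, the hard detection threshold, and the name `simulate_pseudo_negatives` are all hypothetical choices made for illustration. It shows how undetected small-magnitude positives end up among the pseudo-negatives, so the observed positive rate understates the true occurrence rate.

```python
import math
import random


def simulate_pseudo_negatives(n=20000, threshold=1.0, seed=0):
    """Simulate response-dependent missing labels (illustrative assumptions only).

    Returns (true_positive_rate, observed_positive_rate).
    """
    rng = random.Random(seed)
    true_pos = 0
    observed_pos = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # a single covariate (assumed)
        # Occurrence: logistic model in x (assumed form).
        p_occur = 1.0 / (1.0 + math.exp(-(x - 1.0)))
        occurred = rng.random() < p_occur
        detected = False
        if occurred:
            # Magnitude: log-normal with a mean that depends on x (assumed form).
            magnitude = math.exp(rng.gauss(0.5 * x, 1.0))
            # Detection: the sensor only sees events at or above the threshold.
            detected = magnitude >= threshold
        true_pos += occurred
        observed_pos += detected
    return true_pos / n, observed_pos / n


if __name__ == "__main__":
    true_rate, observed_rate = simulate_pseudo_negatives()
    print(f"true occurrence rate:   {true_rate:.3f}")
    print(f"observed positive rate: {observed_rate:.3f}")
```

Setting `threshold=0.0` makes every event detectable, in which case the two rates coincide; raising the threshold moves more true positives into the pseudo-negative group.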
Related papers
- FUN-AD: Fully Unsupervised Learning for Anomaly Detection with Noisy Training Data [1.0650780147044159]
We propose a novel learning-based approach for fully unsupervised anomaly detection with unlabeled and potentially contaminated training data.
Our method is motivated by two observations, that i) the pairwise feature distances between the normal samples are on average likely to be smaller than those between the anomaly samples or heterogeneous samples and ii) pairs of features mutually closest to each other are likely to be homogeneous pairs.
Building on the first observation that nearest-neighbor distances can distinguish between confident normal samples and anomalies, we propose a pseudo-labeling strategy using an iteratively reconstructed memory bank.
arXiv Detail & Related papers (2024-11-25T05:51:38Z)
- Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks have been shown to be vulnerable to data poisoning attacks.
Detecting poisoned samples in a mixed dataset is both beneficial and challenging.
We propose an Iterative Filtering approach for identifying unlearnable examples (UEs).
arXiv Detail & Related papers (2024-08-15T13:26:13Z)
- Uncertainty Measurement of Deep Learning System based on the Convex Hull of Training Sets [0.13265175299265505]
We propose To-hull Uncertainty and Closure Ratio, which measure the uncertainty of a trained model based on the convex hull of the training data.
They capture the positional relation between the convex hull of the learned data and an unseen sample, and infer how far the sample is extrapolated from the convex hull.
arXiv Detail & Related papers (2024-05-25T06:25:24Z)
- Joint empirical risk minimization for instance-dependent positive-unlabeled data [4.112909937203119]
Learning from positive and unlabeled data (PU learning) is an actively researched machine learning task.
The goal is to train a binary classification model on a dataset containing a subset of the positives, which are labeled, together with unlabeled instances.
The unlabeled set includes the remaining positives and all negative observations.
arXiv Detail & Related papers (2023-12-27T12:45:12Z)
- Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
arXiv Detail & Related papers (2023-06-08T07:05:36Z)
- Fake It Till You Make It: Near-Distribution Novelty Detection by Score-Based Generative Models [54.182955830194445]
Existing models either fail or face a dramatic drop under the so-called "near-distribution" setting.
We propose to exploit a score-based generative model to produce synthetic near-distribution anomalous data.
Our method improves near-distribution novelty detection by 6% and surpasses the state of the art by 1% to 5% across nine novelty detection benchmarks.
arXiv Detail & Related papers (2022-05-28T02:02:53Z)
- Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment [73.61888777504377]
Full-reference (FR) image quality assessment (IQA) evaluates the visual quality of a distorted image by measuring its perceptual difference from a pristine-quality reference.
Unlabeled data can be easily collected from an image degradation or restoration process, which makes it appealing to exploit unlabeled training data to boost FR-IQA performance.
In this paper, we suggest incorporating semi-supervised and positive-unlabeled (PU) learning to exploit unlabeled data while mitigating the adverse effect of outliers.
arXiv Detail & Related papers (2022-04-19T09:10:06Z)
- Semi-supervised Salient Object Detection with Effective Confidence Estimation [35.0990691497574]
We study semi-supervised salient object detection with access to a small number of labeled samples and a large number of unlabeled samples.
We model the nature of human saliency labels using the latent variable of the Conditional Energy-based Model.
With only 1/16 labeled samples, our model achieves competitive performance compared with state-of-the-art fully-supervised models.
arXiv Detail & Related papers (2021-12-28T07:14:48Z)
- Dealing with Distribution Mismatch in Semi-supervised Deep Learning for Covid-19 Detection Using Chest X-ray Images: A Novel Approach Using Feature Densities [0.6882042556551609]
Semi-supervised deep learning is an attractive alternative to large labelled datasets.
In real-world usage settings, an unlabelled dataset might present a different distribution than the labelled dataset.
This results in a distribution mismatch between the unlabelled and labelled datasets.
arXiv Detail & Related papers (2021-08-17T00:35:43Z)
- Minimax Active Learning [61.729667575374606]
Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator.
Current active learning techniques either rely on model uncertainty to select the most uncertain samples or use clustering or reconstruction to choose the most diverse set of unlabeled examples.
We develop a semi-supervised minimax entropy-based active learning algorithm that leverages both uncertainty and diversity in an adversarial manner.
arXiv Detail & Related papers (2020-12-18T19:03:40Z)
- Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.