Prediction in the presence of response-dependent missing labels
- URL: http://arxiv.org/abs/2103.13555v1
- Date: Thu, 25 Mar 2021 01:43:33 GMT
- Title: Prediction in the presence of response-dependent missing labels
- Authors: Hyebin Song, Garvesh Raskutti, Rebecca Willett
- Abstract summary: Limitations of sensing technologies result in missing labels in wildfire data.
We develop a new methodology and non-convex algorithm P(ositive) U(nlabeled) - O(ccurrence) M(agnitude) M(ixture) which jointly estimates the occurrence and detection likelihood of positive samples.
- Score: 28.932172873182115
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In a variety of settings, limitations of sensing technologies or other
sampling mechanisms result in missing labels, where the likelihood of a missing
label in the training set is an unknown function of the data. For example,
satellites used to detect forest fires cannot sense fires below a certain size
threshold. In such cases, training datasets consist of positive and
pseudo-negative observations where pseudo-negative observations can be either
true negatives or undetected positives with small magnitudes. We develop a new
methodology and non-convex algorithm P(ositive) U(nlabeled) - O(ccurrence)
M(agnitude) M(ixture) which jointly estimates the occurrence and detection
likelihood of positive samples, utilizing prior knowledge of the detection
mechanism. Our approach uses ideas from positive-unlabeled (PU)-learning and
zero-inflated models that jointly estimate the magnitude and occurrence of
events. We provide conditions under which our model is identifiable and prove
that even though our approach leads to a non-convex objective, any local
minimizer has optimal statistical error (up to a log term) and projected
gradient descent has geometric convergence rates. We demonstrate on both
synthetic data and a California wildfire dataset that our method out-performs
existing state-of-the-art approaches.
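The abstract describes a zero-inflated, thresholded-detection model fitted by projected gradient descent. As a rough illustration of that kind of objective (not the paper's actual PU-OMM formulation), the sketch below assumes a logistic occurrence model, an exponential magnitude model with a known detection threshold, and projection onto an l-infinity ball; all function names and modeling choices are hypothetical:

```python
import numpy as np

def zim_negloglik(theta, X, y, tau=1.0):
    """Negative log-likelihood of a toy zero-inflated magnitude model.

    Each sample occurs with probability sigmoid(X @ b_occ); if it occurs,
    its magnitude is exponential with rate exp(X @ b_mag) and is only
    observed (labeled positive) when the magnitude exceeds the known
    detection threshold tau. y == 0 marks pseudo-negatives, which are
    either true non-occurrences or undetected small-magnitude positives.
    """
    d = X.shape[1]
    b_occ, b_mag = theta[:d], theta[d:]
    p = 1.0 / (1.0 + np.exp(-X @ b_occ))   # occurrence probability
    rate = np.exp(X @ b_mag)                # exponential magnitude rate
    det = y > 0
    # detected positives: occurred, with exponential magnitude density
    ll_pos = np.log(p[det]) + np.log(rate[det]) - rate[det] * y[det]
    # pseudo-negatives: never occurred OR occurred with magnitude <= tau
    p_small = 1.0 - np.exp(-rate[~det] * tau)   # P(magnitude <= tau)
    ll_neg = np.log((1.0 - p[~det]) + p[~det] * p_small)
    return -(ll_pos.sum() + ll_neg.sum())

def numerical_grad(f, t, eps=1e-5):
    """Central-difference gradient, to keep the sketch dependency-free."""
    g = np.zeros_like(t)
    for i in range(t.size):
        e = np.zeros_like(t)
        e[i] = eps
        g[i] = (f(t + e) - f(t - e)) / (2 * eps)
    return g

def projected_gradient_descent(X, y, tau=1.0, radius=5.0, lr=1e-3, steps=300):
    """Gradient steps on the joint objective, projected onto an l-inf ball."""
    theta = np.zeros(2 * X.shape[1])
    for _ in range(steps):
        g = numerical_grad(lambda t: zim_negloglik(t, X, y, tau), theta)
        theta = np.clip(theta - lr * g, -radius, radius)  # projection step
    return theta
```

The projection keeps the iterates in a compact feasible set, which is the kind of constraint under which the abstract's geometric-convergence and local-minimizer guarantees are typically stated.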
Related papers
- Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks are proven to be vulnerable to data poisoning attacks.
It is quite beneficial and challenging to detect poisoned samples from a mixed dataset.
We propose an Iterative Filtering approach for UEs identification.
arXiv Detail & Related papers (2024-08-15T13:26:13Z) - Uncertainty Measurement of Deep Learning System based on the Convex Hull of Training Sets [0.13265175299265505]
We propose To-hull Uncertainty and Closure Ratio, which measure the uncertainty of a trained model based on the convex hull of its training data.
They observe the positional relation between the convex hull of the learned data and an unseen sample and infer how far the sample extrapolates from the convex hull.
arXiv Detail & Related papers (2024-05-25T06:25:24Z) - Joint empirical risk minimization for instance-dependent
positive-unlabeled data [4.112909937203119]
Learning from positive and unlabeled data (PU learning) is an actively researched machine learning task.
The goal is to train a binary classification model on a dataset in which only part of the positive instances are labeled.
The unlabeled set includes the remaining positives and all negative observations.
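As a generic illustration of this PU setup (not the instance-dependent estimator of the paper above), the widely used non-negative PU risk estimates the negative-class risk from the unlabeled set, subtracting the positives' contribution and clipping at zero; the class prior `pi` is assumed known:

```python
import numpy as np

def nn_pu_risk(scores_pos, scores_unl, pi,
               loss=lambda z: np.log1p(np.exp(-z))):
    """Non-negative PU risk estimate for a scoring model.

    scores_pos: model scores on labeled positives.
    scores_unl: model scores on the unlabeled set (mix of pos and neg).
    pi: assumed-known class prior P(y = +1).
    Positives contribute their positive-class loss directly; the
    negative-class risk is estimated from the unlabeled set with the
    positive portion subtracted, then clipped at zero to avoid the
    negative-risk overfitting of the plain unbiased estimator.
    """
    r_pos = pi * loss(scores_pos).mean()
    r_neg = loss(-scores_unl).mean() - pi * loss(-scores_pos).mean()
    return r_pos + max(r_neg, 0.0)
```

Minimizing this risk over model parameters recovers a binary classifier from positive and unlabeled data alone, at the cost of needing an estimate of the prior.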
arXiv Detail & Related papers (2023-12-27T12:45:12Z) - Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
arXiv Detail & Related papers (2023-06-08T07:05:36Z) - Convergence of uncertainty estimates in Ensemble and Bayesian sparse
model discovery [4.446017969073817]
We show empirical success in terms of accuracy and robustness to noise with bootstrapping-based sequential thresholding least-squares estimator.
We show that this bootstrapping-based ensembling technique can perform a provably correct variable selection procedure with an exponential convergence rate of the error rate.
arXiv Detail & Related papers (2023-01-30T04:07:59Z) - Fake It Till You Make It: Near-Distribution Novelty Detection by
Score-Based Generative Models [54.182955830194445]
Existing models either fail or face a dramatic performance drop under the so-called "near-distribution" setting.
We propose to exploit a score-based generative model to produce synthetic near-distribution anomalous data.
Our method improves the near-distribution novelty detection by 6% and passes the state-of-the-art by 1% to 5% across nine novelty detection benchmarks.
arXiv Detail & Related papers (2022-05-28T02:02:53Z) - Incorporating Semi-Supervised and Positive-Unlabeled Learning for
Boosting Full Reference Image Quality Assessment [73.61888777504377]
Full-reference (FR) image quality assessment (IQA) evaluates the visual quality of a distorted image by measuring its perceptual difference with a pristine-quality reference.
Unlabeled data can be easily collected from an image degradation or restoration process, making it appealing to exploit unlabeled training data to boost FR-IQA performance.
In this paper, we suggest to incorporate semi-supervised and positive-unlabeled (PU) learning for exploiting unlabeled data while mitigating the adverse effect of outliers.
arXiv Detail & Related papers (2022-04-19T09:10:06Z) - Semi-supervised Salient Object Detection with Effective Confidence
Estimation [35.0990691497574]
We study semi-supervised salient object detection with access to a small number of labeled samples and a large number of unlabeled samples.
We model the nature of human saliency labels using the latent variable of the Conditional Energy-based Model.
With only 1/16 labeled samples, our model achieves competitive performance compared with state-of-the-art fully-supervised models.
arXiv Detail & Related papers (2021-12-28T07:14:48Z) - Dealing with Distribution Mismatch in Semi-supervised Deep Learning for
Covid-19 Detection Using Chest X-ray Images: A Novel Approach Using Feature
Densities [0.6882042556551609]
Semi-supervised deep learning is an attractive alternative to large labelled datasets.
In real-world usage settings, an unlabelled dataset might present a different distribution than the labelled dataset.
This results in a distribution mismatch between the unlabelled and labelled datasets.
arXiv Detail & Related papers (2021-08-17T00:35:43Z) - Minimax Active Learning [61.729667575374606]
Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator.
Current active learning techniques either rely on model uncertainty to select the most uncertain samples or use clustering or reconstruction to choose the most diverse set of unlabeled examples.
We develop a semi-supervised minimax entropy-based active learning algorithm that leverages both uncertainty and diversity in an adversarial manner.
arXiv Detail & Related papers (2020-12-18T19:03:40Z) - Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.