Semi-Supervised Learning for Sparsely-Labeled Sequential Data:
Application to Healthcare Video Processing
- URL: http://arxiv.org/abs/2011.14101v4
- Date: Fri, 17 Sep 2021 02:51:03 GMT
- Title: Semi-Supervised Learning for Sparsely-Labeled Sequential Data:
Application to Healthcare Video Processing
- Authors: Florian Dubost, Erin Hong, Nandita Bhaskhar, Siyi Tang, Daniel Rubin,
Christopher Lee-Messer
- Abstract summary: We propose a semi-supervised machine learning training strategy to improve event detection performance on sequential data.
Our method uses noisy guesses of the events' end times to train event detection models.
We show that our strategy outperforms conservative estimates by 12 points of mean average precision for MNIST, and 3.5 points for CIFAR.
- Score: 0.8312466807725921
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Labeled data is a critical resource for training and evaluating machine
learning models. However, many real-life datasets are only partially labeled.
We propose a semi-supervised machine learning training strategy to improve
event detection performance on sequential data, such as video recordings, when
only sparse labels are available, such as event start times without their
corresponding end times. Our method uses noisy guesses of the events' end times
to train event detection models. Depending on how conservative these guesses
are, mislabeled false positives may be introduced into the training set (i.e.,
negative sequences mislabeled as positives). We further propose a mathematical
model for estimating how many inaccurate labels a model is exposed to, based on
how noisy the end time guesses are. Finally, we show that neural networks can
improve their detection performance by leveraging more training data with less
conservative approximations despite the higher proportion of incorrect labels.
We adapt sequential versions of MNIST and CIFAR-10 to empirically evaluate our
method, and find that our risk-tolerant strategy outperforms conservative
estimates by 12 points of mean average precision for MNIST, and 3.5 points for
CIFAR. Then, we leverage the proposed training strategy to tackle a real-life
application: processing continuous video recordings of epilepsy patients to
improve seizure detection, and show that our method outperforms baseline
labeling methods by 10 points of average precision.
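The trade-off described in the abstract (longer guessed event durations give more positive training samples but also more mislabeled negatives) can be illustrated with a minimal sketch. This is not the authors' implementation or their mathematical model: the fixed-length windowing, the function names `label_windows` and `count_label_outcomes`, and the toy numbers are all assumptions made for illustration.

```python
from typing import List, Tuple


def label_windows(n_windows: int, start_idxs: List[int], guessed_len: int) -> List[int]:
    """Label a sequence of windows given only event start indices.

    Hypothetical helper: every window within `guessed_len` windows of a
    known start is marked positive (1); all others are negative (0).
    """
    labels = [0] * n_windows
    for start in start_idxs:
        for i in range(start, min(start + guessed_len, n_windows)):
            labels[i] = 1
    return labels


def count_label_outcomes(guessed: List[int], truth: List[int]) -> Tuple[int, int]:
    """Return (correct positives, mislabeled false positives) in the guessed labels."""
    correct = sum(1 for g, t in zip(guessed, truth) if g == 1 and t == 1)
    false_pos = sum(1 for g, t in zip(guessed, truth) if g == 1 and t == 0)
    return correct, false_pos


# Toy sequence (assumed): one event starting at window 2 and truly lasting 3 windows.
truth = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]

# Conservative guess: assume the event lasts a single window.
conservative = label_windows(len(truth), [2], guessed_len=1)

# Risk-tolerant guess: assume the event lasts 5 windows.
risk_tolerant = label_windows(len(truth), [2], guessed_len=5)

print(count_label_outcomes(conservative, truth))
print(count_label_outcomes(risk_tolerant, truth))
```

In this toy case the conservative guess recovers one of the three true positive windows with no mislabeled negatives, while the risk-tolerant guess recovers all three at the cost of two false positives. The abstract's claim is that neural networks can benefit from the second regime despite the added label noise.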
Related papers
- Early Stopping Against Label Noise Without Validation Data [54.27621957395026]
We propose a novel early stopping method called Label Wave, which does not require validation data for selecting the desired model.
We show both the effectiveness of the Label Wave method across various settings and its capability to enhance the performance of existing methods for learning with noisy labels.
arXiv Detail & Related papers (2025-02-11T13:40:15Z)
- Learning from Noisy Labels via Self-Taught On-the-Fly Meta Loss Rescaling [6.861041888341339]
We propose unsupervised on-the-fly meta loss rescaling to reweight training samples.
We are among the first to attempt on-the-fly training data reweighting on the challenging task of dialogue modeling.
Our strategy is robust in the face of noisy and clean data, handles class imbalance, and prevents overfitting to noisy labels.
arXiv Detail & Related papers (2024-12-17T14:37:50Z)
- Boosting Semi-Supervised Learning by bridging high and low-confidence predictions [4.18804572788063]
Pseudo-labeling is a crucial technique in semi-supervised learning (SSL).
We propose a new method called ReFixMatch, which aims to utilize all of the unlabeled data during training.
arXiv Detail & Related papers (2023-08-15T00:27:18Z)
- Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that uses unlabeled examples to train models.
Our proposed approach, Dash, is adaptive in its selection of unlabeled data.
arXiv Detail & Related papers (2021-09-01T23:52:29Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train models that perform inference on inputs containing missing values without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- Uncertainty-aware Self-training for Text Classification with Few Labels [54.13279574908808]
We study self-training as one of the earliest semi-supervised learning approaches to reduce the annotation bottleneck.
We propose an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network.
We show that our methods, using only 20-30 labeled samples per class per task for training and validation, can perform within 3% of fully supervised pre-trained language models.
arXiv Detail & Related papers (2020-06-27T08:13:58Z)
- Don't Wait, Just Weight: Improving Unsupervised Representations by Learning Goal-Driven Instance Weights [92.16372657233394]
Self-supervised learning techniques can boost performance by learning useful representations from unlabelled data.
We show that by learning Bayesian instance weights for the unlabelled data, we can improve the downstream classification accuracy.
Our method, BetaDataWeighter, is evaluated using the popular self-supervised rotation prediction task on STL-10 and Visual Decathlon.
arXiv Detail & Related papers (2020-06-22T15:59:32Z)
- Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)