Sample Efficient Learning of Image-Based Diagnostic Classifiers Using
Probabilistic Labels
- URL: http://arxiv.org/abs/2102.06164v1
- Date: Thu, 11 Feb 2021 18:13:56 GMT
- Title: Sample Efficient Learning of Image-Based Diagnostic Classifiers Using
Probabilistic Labels
- Authors: Roberto Vega, Pouneh Gorji, Zichen Zhang, Xuebin Qin, Abhilash
Rakkunedeth Hareendranathan, Jeevesh Kapur, Jacob L. Jaremko, Russell Greiner
- Abstract summary: We propose a way to learn and use probabilistic labels to train accurate and calibrated deep networks from relatively small datasets.
We observe gains of up to 22% in the accuracy of models trained with these labels, as compared with traditional approaches.
- Score: 11.377362220429786
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning approaches often require huge datasets to achieve good
generalization. This complicates their use in tasks like image-based medical
diagnosis, where small training datasets are usually insufficient to learn
appropriate data representations. For such sensitive tasks it is also important
to provide confidence estimates for the predictions. Here, we propose a way to learn
and use probabilistic labels to train accurate and calibrated deep networks
from relatively small datasets. We observe gains of up to 22% in the accuracy
of models trained with these labels, compared with traditional approaches,
in three classification tasks: diagnosis of hip dysplasia, fatty liver, and
glaucoma. The outputs of models trained with probabilistic labels are
calibrated, allowing their predictions to be interpreted as proper
probabilities. We anticipate this approach will apply to other tasks where few
training instances are available and expert knowledge can be encoded as
probabilities.
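Since the abstract centers on training against probabilistic rather than one-hot labels, a minimal sketch of that idea may help. The sketch below assumes PyTorch and uses cross-entropy against soft targets; the network, the toy feature vectors, and the random probability vectors are placeholders, not the authors' exact architecture or label-elicitation procedure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallClassifier(nn.Module):
    """A deliberately small network, as suits small medical datasets."""
    def __init__(self, in_features: int, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.net(x)  # raw logits

def soft_label_loss(logits, prob_labels):
    # Cross-entropy against a full probability vector: -sum_k p_k * log q_k.
    # With one-hot p this reduces to ordinary cross-entropy.
    return -(prob_labels * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

# Toy usage: random features stand in for image representations, and random
# probability vectors stand in for expert-derived probabilistic labels.
torch.manual_seed(0)
x = torch.randn(32, 128)
p = torch.softmax(torch.randn(32, 2), dim=1)
model = SmallClassifier(128, 2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = soft_label_loss(model(x), p)
    loss.backward()
    opt.step()
# Outputs can now be read as class probabilities rather than bare scores.
probs = torch.softmax(model(x), dim=1)
```

The only change from a standard pipeline is the target format, which is one reason soft-label training pairs naturally with calibrated, probability-valued outputs.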
Related papers
- Refining Tuberculosis Detection in CXR Imaging: Addressing Bias in Deep Neural Networks via Interpretability [1.9936075659851882]
We argue that the reliability of deep learning models is limited, even if they can be shown to obtain perfect classification accuracy on the test data.
We show that pre-training a deep neural network on a large-scale proxy task, as well as using a mixed objective optimization network (MOON), can improve the alignment of decision foundations between models and experts.
arXiv Detail & Related papers (2024-07-19T06:41:31Z)
- A Saliency-based Clustering Framework for Identifying Aberrant Predictions [49.1574468325115]
We introduce the concept of aberrant predictions, emphasizing that the nature of classification errors is as critical as their frequency.
We propose a novel, efficient training methodology aimed at both reducing the misclassification rate and discerning aberrant predictions.
We apply this methodology to the less-explored domain of veterinary radiology, where the stakes are high but which has not been studied as extensively as human medicine.
arXiv Detail & Related papers (2023-11-11T01:53:59Z)
- Contrastive Deep Encoding Enables Uncertainty-aware Machine-learning-assisted Histopathology [6.548275341067594]
Terabytes of training data can be consciously utilized to pre-train deep networks to encode informative representations.
We show that our approach can reach the state-of-the-art (SOTA) for patch-level classification with only 1-10% randomly selected annotations; a sketch of a generic contrastive objective follows this entry.
arXiv Detail & Related papers (2023-09-13T17:37:19Z)
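The entry above pre-trains encoders contrastively on large unlabeled collections. One common contrastive objective is the SimCLR-style NT-Xent loss, sketched below as an illustration of the general technique; this is an assumption on our part, not the paper's exact loss, and the random embeddings stand in for two augmented views of a batch.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature: float = 0.5):
    """NT-Xent: pull two views of the same example together, push apart
    all other examples in the batch. z1, z2: [N, D] embeddings."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # [2N, D], unit norm
    sim = z @ z.t() / temperature                 # scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))             # exclude self-similarity
    n = z1.shape[0]
    # The positive for view-1 row i is view-2 row i, and vice versa.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

# Toy usage with random embeddings in place of encoder outputs.
loss = nt_xent(torch.randn(8, 32), torch.randn(8, 32))
```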
- Self-Supervised Learning as a Means To Reduce the Need for Labeled Data in Medical Image Analysis [64.4093648042484]
We use a dataset of chest X-ray images with bounding box labels for 13 different classes of anomalies.
We show that it is possible to achieve similar performance to a fully supervised model in terms of mean average precision and accuracy with only 60% of the labeled data.
arXiv Detail & Related papers (2022-06-01T09:20:30Z)
- Semi-supervised Deep Learning for Image Classification with Distribution Mismatch: A Survey [1.5469452301122175]
Deep learning models rely on an abundance of labelled observations for training.
Gathering labelled observations is expensive, which often makes deep learning impractical.
In many situations, different unlabelled data sources might be available.
This raises the risk of a significant distribution mismatch between the labelled and unlabelled datasets.
arXiv Detail & Related papers (2022-03-01T02:46:00Z)
- Masked prediction tasks: a parameter identifiability view [49.533046139235466]
We focus on the widely used self-supervised learning method of predicting masked tokens.
We show that there is a rich landscape of possibilities, in which some prediction tasks yield identifiability while others do not; a generic sketch of the masked-prediction setup follows this entry.
arXiv Detail & Related papers (2022-02-18T17:09:32Z)
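As an illustration of the masked-token prediction task the entry above analyzes, here is a generic BERT-style sketch: hide a fraction of tokens and score the model only on the hidden positions. The architecture, vocabulary size, and 15% masking rate are arbitrary illustrative choices, not tied to the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, dim, seq_len = 100, 32, 16
embed = nn.Embedding(vocab + 1, dim)          # index `vocab` acts as [MASK]
encoder = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
predictor = nn.Linear(dim, vocab)
params = [*embed.parameters(), *encoder.parameters(), *predictor.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)

tokens = torch.randint(0, vocab, (8, seq_len))  # a toy batch of sequences
mask = torch.rand(tokens.shape) < 0.15          # hide ~15% of positions
inputs = tokens.masked_fill(mask, vocab)        # substitute the [MASK] token

logits = predictor(encoder(embed(inputs)))      # one prediction per position
loss = F.cross_entropy(logits[mask], tokens[mask])  # scored only where masked
loss.backward()
opt.step()
```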
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performance on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads; a minimal sketch of this decoupling follows this entry.
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
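The summary above mentions decoupling pseudo-label generation from utilization via two heads. The sketch below is one plausible reading of that idea, assuming a shared encoder, a generation head trained only on labeled data, and a utilization head trained on both real and pseudo labels; it is not the paper's exact training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU())
head_gen = nn.Linear(64, 2)   # generates pseudo labels; sees only real labels
head_use = nn.Linear(64, 2)   # consumes both real and pseudo labels
opt = torch.optim.Adam([*encoder.parameters(), *head_gen.parameters(),
                        *head_use.parameters()], lr=1e-3)

x_lab, y_lab = torch.randn(16, 128), torch.randint(0, 2, (16,))
x_unlab = torch.randn(64, 128)

for _ in range(50):
    opt.zero_grad()
    z_lab = encoder(x_lab)
    # Generation head learns only from ground-truth labels.
    loss_gen = F.cross_entropy(head_gen(z_lab), y_lab)
    # Pseudo labels come from the generation head with gradients blocked,
    # so the utilization step cannot reinforce the generator's biases.
    with torch.no_grad():
        pseudo = head_gen(encoder(x_unlab)).argmax(dim=1)
    z_unlab = encoder(x_unlab)
    loss_use = F.cross_entropy(head_use(z_lab), y_lab) \
             + F.cross_entropy(head_use(z_unlab), pseudo)
    (loss_gen + loss_use).backward()
    opt.step()
```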
- A Real Use Case of Semi-Supervised Learning for Mammogram Classification in a Local Clinic of Costa Rica [0.5541644538483946]
Training a deep learning model requires a considerable amount of labeled images.
A number of publicly available datasets have been built with data from different hospitals and clinics.
The use of the semi-supervised deep learning approach known as MixMatch to leverage unlabeled data is proposed and evaluated; a sketch of MixMatch's label-guessing step follows this entry.
arXiv Detail & Related papers (2021-07-24T22:26:50Z)
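MixMatch's core label-guessing step (average predictions over several augmentations of each unlabeled example, then sharpen with a temperature below 1) can be sketched compactly. In the sketch below the noise-based "augmentation" and the linear model are stand-ins for real image augmentations and a real network.

```python
import torch
import torch.nn as nn

def guess_labels(model, x_unlab, k: int = 2, temperature: float = 0.5):
    """MixMatch-style label guessing: average softmax predictions over k
    augmented views of each unlabeled example, then sharpen with T < 1."""
    with torch.no_grad():
        preds = [torch.softmax(model(x_unlab + 0.1 * torch.randn_like(x_unlab)),
                               dim=1) for _ in range(k)]
        p = torch.stack(preds).mean(dim=0)   # average over augmentations
        p = p ** (1.0 / temperature)         # sharpen toward low entropy
        return p / p.sum(dim=1, keepdim=True)

# Toy usage: a linear model on feature vectors stands in for an image network.
model = nn.Linear(128, 2)
targets = guess_labels(model, torch.randn(64, 128))
# In full MixMatch these guessed targets then pass through MixUp and are fit
# with an L2 consistency loss alongside cross-entropy on the labeled batch.
```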
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood-based model selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
- Self-Training with Improved Regularization for Sample-Efficient Chest X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that, using 85% less labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
arXiv Detail & Related papers (2020-05-03T02:36:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.