Neural Network Training with Highly Incomplete Datasets
- URL: http://arxiv.org/abs/2107.00429v1
- Date: Thu, 1 Jul 2021 13:21:45 GMT
- Title: Neural Network Training with Highly Incomplete Datasets
- Authors: Yu-Wei Chang and Laura Natali and Oveis Jamialahmadi and Stefano Romeo
and Joana B. Pereira and Giovanni Volpe
- Abstract summary: GapNet is an alternative deep-learning training approach that can use highly incomplete datasets.
We show that GapNet improves the identification of patients with underlying Alzheimer's disease pathology and of patients at risk of hospitalization due to Covid-19.
- Score: 1.5658704610960568
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural network training and validation rely on the availability of large
high-quality datasets. However, in many cases only incomplete datasets are
available, particularly in health care applications, where each patient
typically undergoes different clinical procedures or can drop out of a study.
Since the data to train the neural networks need to be complete, most studies
discard the incomplete datapoints, which reduces the size of the training data,
or impute the missing features, which can lead to artefacts. Alas, both
approaches are inadequate when a large portion of the data is missing. Here, we
introduce GapNet, an alternative deep-learning training approach that can use
highly incomplete datasets. First, the dataset is split into subsets of samples
containing all values for a certain cluster of features. Then, these subsets
are used to train individual neural networks. Finally, this ensemble of neural
networks is combined into a single neural network whose training is fine-tuned
using all complete datapoints. Using two highly incomplete real-world medical
datasets, we show that GapNet improves the identification of patients with
underlying Alzheimer's disease pathology and of patients at risk of
hospitalization due to Covid-19. By distilling the information available in
incomplete datasets without having to reduce their size or to impute missing
values, GapNet will make it possible to extract valuable information from a wide range of
datasets, benefiting diverse fields from medicine to engineering.
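
The three training steps described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch implementation, assuming a tabular dataset in which NaN marks a missing value and assuming two hand-chosen feature clusters; all names (GapNetEnsemble, make_mlp, fit) and hyperparameters are illustrative and do not correspond to the authors' released code.

```python
import numpy as np
import torch
import torch.nn as nn

def make_mlp(n_in, n_hidden=32, n_out=2):
    # Small branch network trained on one feature cluster.
    return nn.Sequential(nn.Linear(n_in, n_hidden), nn.ReLU(), nn.Linear(n_hidden, n_out))

class GapNetEnsemble(nn.Module):
    """One branch per feature cluster; a linear layer fuses the branch logits."""
    def __init__(self, feature_clusters, n_classes=2):
        super().__init__()
        self.feature_clusters = feature_clusters   # list of column-index lists
        self.branches = nn.ModuleList([make_mlp(len(c), n_out=n_classes) for c in feature_clusters])
        self.fusion = nn.Linear(n_classes * len(feature_clusters), n_classes)

    def forward(self, x):
        logits = [branch(x[:, cols]) for branch, cols in zip(self.branches, self.feature_clusters)]
        return self.fusion(torch.cat(logits, dim=1))

def fit(model, X, y, epochs=200, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()

# Toy incomplete dataset: 200 samples, 6 features, NaN marks a missing value.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)).astype(np.float32)
y = (X[:, 0] + X[:, 3] > 0).astype(np.int64)
X[rng.random(X.shape) < 0.3] = np.nan              # roughly 30% of entries missing

clusters = [[0, 1, 2], [3, 4, 5]]                  # assumed feature clusters
model = GapNetEnsemble(clusters)

# Steps 1-2: train each branch on the samples that are complete for its own cluster.
for branch, cols in zip(model.branches, clusters):
    rows = ~np.isnan(X[:, cols]).any(axis=1)
    fit(branch, torch.from_numpy(X[np.ix_(rows, cols)]), torch.from_numpy(y[rows]))

# Step 3: fine-tune the combined network on the fully complete datapoints only.
complete = ~np.isnan(X).any(axis=1)
fit(model, torch.from_numpy(X[complete]), torch.from_numpy(y[complete]))
```

In the first phase each branch sees every sample that is complete for its own feature cluster, so far more datapoints contribute to training than the small fully complete subset used in the final fine-tuning phase.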
Related papers
- Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
- Neural Network Architecture for Database Augmentation Using Shared Features [0.0]
Inherent challenges in some domains such as medicine make it difficult to create large single source datasets or multi-source datasets with identical features.
We propose a neural network architecture that can provide data augmentation using features common between these datasets.
arXiv Detail & Related papers (2023-02-02T19:17:06Z)
- Exposing and addressing the fragility of neural networks in digital pathology [0.0]
StrongAugment is evaluated with large-scale, heterogeneous histopathology data.
Neural networks trained with StrongAugment retain similar performance on all datasets.
arXiv Detail & Related papers (2022-06-30T13:25:34Z)
- What Can be Seen is What You Get: Structure Aware Point Cloud Augmentation [0.966840768820136]
We present novel point cloud augmentation methods to artificially diversify a dataset.
Our sensor-centric methods keep the data structure consistent with the lidar sensor capabilities.
We show that our methods enable the use of very small datasets, saving annotation time, training time and the associated costs.
arXiv Detail & Related papers (2022-06-20T09:10:59Z)
- CvS: Classification via Segmentation For Small Datasets [52.821178654631254]
This paper presents CvS, a cost-effective classifier for small datasets that derives the classification labels from predicting the segmentation maps.
We evaluate the effectiveness of our framework on diverse problems, showing that CvS achieves much higher classification performance than previous methods when given only a handful of examples.
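As a generic illustration of the classification-via-segmentation idea (not the authors' CvS implementation), the sketch below lets a tiny fully convolutional network predict a per-pixel class map and then derives the image-level label as the most frequent predicted foreground class; the network, the majority-vote rule, and all names are assumptions made for illustration only.

```python
# Generic "classify by segmenting" sketch: predict a per-pixel class map, then take
# the most frequent predicted foreground class as the image-level label.
# The tiny network and the majority-vote rule are illustrative assumptions.
import torch
import torch.nn as nn

class SegToCls(nn.Module):
    def __init__(self, n_classes=3):                          # channel 0 = background
        super().__init__()
        self.seg = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, n_classes + 1, 1),                  # per-pixel class logits
        )

    def forward(self, x):
        seg_logits = self.seg(x)                              # (B, C+1, H, W)
        pixel_pred = seg_logits.argmax(dim=1)                 # per-pixel class map
        labels = []
        for pred in pixel_pred:                               # one label per image
            fg = pred[pred > 0]                               # ignore background pixels
            labels.append(fg.mode().values if fg.numel() else torch.tensor(0))
        return seg_logits, torch.stack(labels)

x = torch.randn(4, 1, 32, 32)                                 # toy batch of grayscale images
seg_logits, image_labels = SegToCls()(x)                      # segmentation + derived labels
```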
arXiv Detail & Related papers (2021-10-29T18:41:15Z)
- A Light-weight Interpretable Compositional Network for Nuclei Detection and Weakly-supervised Segmentation [10.196621315018884]
Deep neural networks usually require large amounts of annotated data to train their vast numbers of parameters.
We propose to build a data-efficient model that only requires partial annotation, specifically on isolated nuclei.
arXiv Detail & Related papers (2021-10-26T16:44:08Z)
- Dive into Layers: Neural Network Capacity Bounding using Algebraic Geometry [55.57953219617467]
We show that the learnability of a neural network is directly related to its size.
We use Betti numbers to measure the topological geometric complexity of input data and the neural network.
We perform experiments on the real-world MNIST dataset, and the results verify our analysis and conclusions.
arXiv Detail & Related papers (2021-09-03T11:45:51Z)
- Domain Generalization for Medical Imaging Classification with Linear-Dependency Regularization [59.5104563755095]
We introduce a simple but effective approach to improve the generalization capability of deep neural networks in the field of medical imaging classification.
Motivated by the observation that the domain variability of the medical images is to some extent compact, we propose to learn a representative feature space through variational encoding.
arXiv Detail & Related papers (2020-09-27T12:30:30Z)
- Exploiting Multi-Modal Features From Pre-trained Networks for Alzheimer's Dementia Recognition [16.006407253670396]
We exploit various multi-modal features extracted from pre-trained networks to recognize Alzheimer's Dementia using a neural network.
We modify a Convolutional Recurrent Neural Network based structure to perform classification and regression tasks simultaneously.
Our test results surpass the baseline's accuracy by 18.75%, and our validation results for the regression task show the possibility of classifying 4 classes of cognitive impairment with an accuracy of 78.70%.
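A hedged sketch of the kind of shared backbone with joint classification and regression heads that this summary describes is shown below; the 1-D convolution over frame-level features, the GRU, the layer sizes, and the unweighted loss sum are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical shared CRNN-style backbone with joint classification and regression heads.
# Layer sizes, the GRU, and the unweighted loss sum are illustrative assumptions.
import torch
import torch.nn as nn

class MultiTaskCRNN(nn.Module):
    def __init__(self, n_features=40, n_classes=4):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv1d(n_features, 64, 5, padding=2), nn.ReLU())
        self.rnn = nn.GRU(64, 64, batch_first=True)
        self.cls_head = nn.Linear(64, n_classes)              # e.g. cognitive-impairment classes
        self.reg_head = nn.Linear(64, 1)                      # e.g. a continuous clinical score

    def forward(self, x):                                     # x: (batch, time, features)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)      # 1-D convolution over the time axis
        _, h_n = self.rnn(h)                                  # last hidden state summarizes the sequence
        h_last = h_n[-1]
        return self.cls_head(h_last), self.reg_head(h_last).squeeze(-1)

model = MultiTaskCRNN()
x = torch.randn(8, 100, 40)                                   # toy batch: 8 sequences of 100 frames
logits, score = model(x)
loss = (nn.CrossEntropyLoss()(logits, torch.randint(0, 4, (8,)))
        + nn.MSELoss()(score, torch.randn(8)))                # joint training objective
```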
arXiv Detail & Related papers (2020-09-09T02:08:47Z)
- MS-Net: Multi-Site Network for Improving Prostate Segmentation with Heterogeneous MRI Data [75.73881040581767]
We propose a novel multi-site network (MS-Net) for improving prostate segmentation by learning robust representations.
Our MS-Net improves the performance across all datasets consistently, and outperforms state-of-the-art methods for multi-site learning.
arXiv Detail & Related papers (2020-02-09T14:11:50Z)
- DeGAN: Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and the lack of relevant data for the future learning tasks of a trained network.
We use the available data, which may be an imbalanced subset of the original training dataset or a related-domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.