Neural Network Training with Highly Incomplete Datasets
- URL: http://arxiv.org/abs/2107.00429v1
- Date: Thu, 1 Jul 2021 13:21:45 GMT
- Title: Neural Network Training with Highly Incomplete Datasets
- Authors: Yu-Wei Chang and Laura Natali and Oveis Jamialahmadi and Stefano Romeo
and Joana B. Pereira and Giovanni Volpe
- Abstract summary: GapNet is an alternative deep-learning training approach that can use highly incomplete datasets.
We show that GapNet improves the identification of patients with underlying Alzheimer's disease pathology and of patients at risk of hospitalization due to Covid-19.
- Score: 1.5658704610960568
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural network training and validation rely on the availability of large
high-quality datasets. However, in many cases only incomplete datasets are
available, particularly in health care applications, where each patient
typically undergoes different clinical procedures or can drop out of a study.
Since the data to train the neural networks need to be complete, most studies
discard the incomplete datapoints, which reduces the size of the training data,
or impute the missing features, which can lead to artefacts. Alas, both
approaches are inadequate when a large portion of the data is missing. Here, we
introduce GapNet, an alternative deep-learning training approach that can use
highly incomplete datasets. First, the dataset is split into subsets of samples
containing all values for a certain cluster of features. Then, these subsets
are used to train individual neural networks. Finally, this ensemble of neural
networks is combined into a single neural network whose training is fine-tuned
using all complete datapoints. Using two highly incomplete real-world medical
datasets, we show that GapNet improves the identification of patients with
underlying Alzheimer's disease pathology and of patients at risk of
hospitalization due to Covid-19. By distilling the information available in
incomplete datasets without having to reduce their size or to impute missing
values, GapNet will make it possible to extract valuable information from a wide range of
datasets, benefiting diverse fields from medicine to engineering.
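
The three training steps described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch implementation, assuming a tabular dataset in which NaN marks a missing value and assuming two hand-chosen feature clusters; all names (GapNetEnsemble, make_mlp, fit) and hyperparameters are illustrative and do not correspond to the authors' released code.

```python
import numpy as np
import torch
import torch.nn as nn

def make_mlp(n_in, n_hidden=32, n_out=2):
    # Small branch network trained on one feature cluster.
    return nn.Sequential(nn.Linear(n_in, n_hidden), nn.ReLU(), nn.Linear(n_hidden, n_out))

class GapNetEnsemble(nn.Module):
    """One branch per feature cluster; a linear layer fuses the branch logits."""
    def __init__(self, feature_clusters, n_classes=2):
        super().__init__()
        self.feature_clusters = feature_clusters   # list of column-index lists
        self.branches = nn.ModuleList([make_mlp(len(c), n_out=n_classes) for c in feature_clusters])
        self.fusion = nn.Linear(n_classes * len(feature_clusters), n_classes)

    def forward(self, x):
        logits = [branch(x[:, cols]) for branch, cols in zip(self.branches, self.feature_clusters)]
        return self.fusion(torch.cat(logits, dim=1))

def fit(model, X, y, epochs=200, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()

# Toy incomplete dataset: 200 samples, 6 features, NaN marks a missing value.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)).astype(np.float32)
y = (X[:, 0] + X[:, 3] > 0).astype(np.int64)
X[rng.random(X.shape) < 0.3] = np.nan              # roughly 30% of entries missing

clusters = [[0, 1, 2], [3, 4, 5]]                  # assumed feature clusters
model = GapNetEnsemble(clusters)

# Steps 1-2: train each branch on the samples that are complete for its own cluster.
for branch, cols in zip(model.branches, clusters):
    rows = ~np.isnan(X[:, cols]).any(axis=1)
    fit(branch, torch.from_numpy(X[np.ix_(rows, cols)]), torch.from_numpy(y[rows]))

# Step 3: fine-tune the combined network on the fully complete datapoints only.
complete = ~np.isnan(X).any(axis=1)
fit(model, torch.from_numpy(X[complete]), torch.from_numpy(y[complete]))
```

In the first phase each branch sees every sample that is complete for its own feature cluster, so far more datapoints contribute to training than the small fully complete subset used in the final fine-tuning phase.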
Related papers
- Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
- Neural Network Architecture for Database Augmentation Using Shared Features [0.0]
Inherent challenges in some domains such as medicine make it difficult to create large single source datasets or multi-source datasets with identical features.
We propose a neural network architecture that can provide data augmentation using features common between these datasets.
arXiv Detail & Related papers (2023-02-02T19:17:06Z)
- Exposing and addressing the fragility of neural networks in digital pathology [0.0]
StrongAugment is evaluated with large-scale, heterogeneous histopathology data.
Neural networks trained with StrongAugment retain similar performance on all datasets.
arXiv Detail & Related papers (2022-06-30T13:25:34Z)
- What Can be Seen is What You Get: Structure Aware Point Cloud Augmentation [0.966840768820136]
We present novel point cloud augmentation methods to artificially diversify a dataset.
Our sensor-centric methods keep the data structure consistent with the lidar sensor capabilities.
We show that our methods enable the use of very small datasets, saving annotation time, training time and the associated costs.
arXiv Detail & Related papers (2022-06-20T09:10:59Z)
- CvS: Classification via Segmentation For Small Datasets [52.821178654631254]
This paper presents CvS, a cost-effective classifier for small datasets that derives the classification labels from predicting the segmentation maps.
We evaluate the effectiveness of our framework on diverse problems, showing that CvS achieves much higher classification performance than previous methods when given only a handful of examples.
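As a generic illustration of the classification-via-segmentation idea (not the authors' CvS implementation), the sketch below lets a tiny fully convolutional network predict a per-pixel class map and then derives the image-level label as the most frequent predicted foreground class; the network, the majority-vote rule, and all names are assumptions made for illustration only.

```python
# Generic "classify by segmenting" sketch: predict a per-pixel class map, then take
# the most frequent predicted foreground class as the image-level label.
# The tiny network and the majority-vote rule are illustrative assumptions.
import torch
import torch.nn as nn

class SegToCls(nn.Module):
    def __init__(self, n_classes=3):                          # channel 0 = background
        super().__init__()
        self.seg = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, n_classes + 1, 1),                  # per-pixel class logits
        )

    def forward(self, x):
        seg_logits = self.seg(x)                              # (B, C+1, H, W)
        pixel_pred = seg_logits.argmax(dim=1)                 # per-pixel class map
        labels = []
        for pred in pixel_pred:                               # one label per image
            fg = pred[pred > 0]                               # ignore background pixels
            labels.append(fg.mode().values if fg.numel() else torch.tensor(0))
        return seg_logits, torch.stack(labels)

x = torch.randn(4, 1, 32, 32)                                 # toy batch of grayscale images
seg_logits, image_labels = SegToCls()(x)                      # segmentation + derived labels
```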
arXiv Detail & Related papers (2021-10-29T18:41:15Z)
- A Light-weight Interpretable Compositional Network for Nuclei Detection and Weakly-supervised Segmentation [10.196621315018884]
Deep neural networks usually require large amounts of annotated data to train their vast numbers of parameters.
We propose to build a data-efficient model that only requires partial annotation, specifically on isolated nuclei.
arXiv Detail & Related papers (2021-10-26T16:44:08Z)
- Dive into Layers: Neural Network Capacity Bounding using Algebraic Geometry [55.57953219617467]
We show that the learnability of a neural network is directly related to its size.
We use Betti numbers to measure the topological geometric complexity of input data and the neural network.
We perform experiments on the real-world MNIST dataset, and the results verify our analysis and conclusions.
arXiv Detail & Related papers (2021-09-03T11:45:51Z)
- Domain Generalization for Medical Imaging Classification with Linear-Dependency Regularization [59.5104563755095]
We introduce a simple but effective approach to improve the generalization capability of deep neural networks in the field of medical imaging classification.
Motivated by the observation that the domain variability of the medical images is to some extent compact, we propose to learn a representative feature space through variational encoding.
arXiv Detail & Related papers (2020-09-27T12:30:30Z)
- Exploiting Multi-Modal Features From Pre-trained Networks for Alzheimer's Dementia Recognition [16.006407253670396]
We exploit various multi-modal features extracted from pre-trained networks to recognize Alzheimer's Dementia using a neural network.
We modify a Convolutional Recurrent Neural Network based structure to perform classification and regression tasks simultaneously.
Our test results surpass the baseline's accuracy by 18.75%, and our validation results for the regression task show the possibility of classifying 4 classes of cognitive impairment with an accuracy of 78.70%.
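A hedged sketch of the kind of shared backbone with joint classification and regression heads that this summary describes is shown below; the 1-D convolution over frame-level features, the GRU, the layer sizes, and the unweighted loss sum are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical shared CRNN-style backbone with joint classification and regression heads.
# Layer sizes, the GRU, and the unweighted loss sum are illustrative assumptions.
import torch
import torch.nn as nn

class MultiTaskCRNN(nn.Module):
    def __init__(self, n_features=40, n_classes=4):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv1d(n_features, 64, 5, padding=2), nn.ReLU())
        self.rnn = nn.GRU(64, 64, batch_first=True)
        self.cls_head = nn.Linear(64, n_classes)              # e.g. cognitive-impairment classes
        self.reg_head = nn.Linear(64, 1)                      # e.g. a continuous clinical score

    def forward(self, x):                                     # x: (batch, time, features)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)      # 1-D convolution over the time axis
        _, h_n = self.rnn(h)                                  # last hidden state summarizes the sequence
        h_last = h_n[-1]
        return self.cls_head(h_last), self.reg_head(h_last).squeeze(-1)

model = MultiTaskCRNN()
x = torch.randn(8, 100, 40)                                   # toy batch: 8 sequences of 100 frames
logits, score = model(x)
loss = (nn.CrossEntropyLoss()(logits, torch.randint(0, 4, (8,)))
        + nn.MSELoss()(score, torch.randn(8)))                # joint training objective
```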
arXiv Detail & Related papers (2020-09-09T02:08:47Z)
- MS-Net: Multi-Site Network for Improving Prostate Segmentation with Heterogeneous MRI Data [75.73881040581767]
We propose a novel multi-site network (MS-Net) for improving prostate segmentation by learning robust representations.
Our MS-Net improves the performance across all datasets consistently, and outperforms state-of-the-art methods for multi-site learning.
arXiv Detail & Related papers (2020-02-09T14:11:50Z)
- DeGAN: Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and the lack of relevant data for the future learning tasks of a trained network.
We use the available data, which may be an imbalanced subset of the original training dataset or a related-domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.