Related papers: Data Augmentation in High Dimensional Low Sample Size Setting Using a Geometry-Based Variational Autoencoder

URL: http://arxiv.org/abs/2105.00026v1
Date: Fri, 30 Apr 2021 18:10:33 GMT
Title: Data Augmentation in High Dimensional Low Sample Size Setting Using a Geometry-Based Variational Autoencoder
Authors: Clément Chadebec, Elina Thibeau-Sutre, Ninon Burgos and Stéphanie Allassonnière
Abstract summary: We propose a new method to perform data augmentation in the High Dimensional Low Sample Size (HDLSS) setting using a geometry-based variational autoencoder. Our approach combines a proper latent space modeling of the VAE seen as a Riemannian manifold with a new generation scheme which produces more meaningful samples.
Score: 0.1529342790344802
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we propose a new method to perform data augmentation in a reliable way in the High Dimensional Low Sample Size (HDLSS) setting using a geometry-based variational autoencoder. Our approach combines a proper latent space modeling of the VAE seen as a Riemannian manifold with a new generation scheme which produces more meaningful samples especially in the context of small data sets. The proposed method is tested through a wide experimental study where its robustness to data sets, classifiers and training samples size is stressed. It is also validated on a medical imaging classification task on the challenging ADNI database where a small number of 3D brain MRIs are considered and augmented using the proposed VAE framework. In each case, the proposed method allows for a significant and reliable gain in the classification metrics. For instance, balanced accuracy jumps from 66.3% to 74.3% for a state-of-the-art CNN classifier trained with 50 MRIs of cognitively normal (CN) and 50 Alzheimer disease (AD) patients and from 77.7% to 86.3% when trained with 243 CN and 210 AD while improving greatly sensitivity and specificity metrics.

Related papers

Enhancing Omics Cohort Discovery for Research on Neurodegeneration through Ontology-Augmented Embedding Models [0.14999444543328289]
NeuroEmbed is an approach for the engineering of semantically accurate embedding spaces to represent cohorts and samples. The NeuroEmbed method comprises four stages: (1) extraction of cohorts from public repositories; (2) semi-automated normalization and augmentation of metadata of cohorts and samples using biomedical clustering and clustering on the embedding space; (3) automated generation of a natural language question-answering dataset for cohorts and samples based on randomized combinations of standardized metadata dimensions; and (4) fine-tuning of a domain-specific embedder to optimize queries.
arXiv Detail & Related papers (2025-06-16T13:27:10Z)
Hand Gesture Classification on Praxis Dataset: Trading Accuracy for Expense [0.6390468088226495]
We focus on 'skeletal' data represented by the body joint coordinates, from the Praxis dataset. The PRAXIS dataset contains recordings of patients with cortical pathologies such as Alzheimer's disease. Using a combination of windowing techniques with a deep learning architecture such as a Recurrent Neural Network (RNN), we achieved an overall accuracy of 70.8%.
arXiv Detail & Related papers (2023-11-01T18:18:09Z)
The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation. We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare. Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z)
Unsupervised Domain Adaptation for Automated Knee Osteoarthritis Phenotype Classification [0.8288807499147146]
The aim of this study was to demonstrate the utility of unsupervised domain adaptation (UDA) in automated knee osteoarthritis (OA) phenotype classification using a small dataset.
arXiv Detail & Related papers (2022-12-14T04:26:32Z)
Lightweight 3D Convolutional Neural Network for Schizophrenia diagnosis using MRI Images and Ensemble Bagging Classifier [1.487444917213389]
This paper proposed a lightweight 3D convolutional neural network (CNN) based framework for schizophrenia diagnosis using MRI images. The model achieves the highest accuracy 92.22%, sensitivity 94.44%, specificity 90%, precision 90.43%, recall 94.44%, F1-score 92.39% and G-mean 92.19% as compared to the current state-of-the-art techniques.
arXiv Detail & Related papers (2022-11-05T10:27:37Z)
Federated Learning Enables Big Data for Rare Cancer Boundary Detection [98.5549882883963]
We present findings from the largest Federated ML study to-date, involving data from 71 healthcare institutions across 6 continents. We generate an automatic tumor boundary detector for the rare disease of glioblastoma. We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent.
arXiv Detail & Related papers (2022-04-22T17:27:00Z)
SCALP -- Supervised Contrastive Learning for Cardiopulmonary Disease Classification and Localization in Chest X-rays using Patient Metadata [10.269187107011934]
We introduce an end-to-end framework, SCALP, which extends the self-supervised contrastive approach to a supervised setting. SCALP pulls together chest X-rays from the same patient (positive keys) and pushes apart chest X-rays from different patients (negative keys). Our experiments demonstrate that SCALP outperforms existing baselines with significant margins in both classification and localization tasks.
arXiv Detail & Related papers (2021-10-27T21:38:12Z)
Accuracy Improvement for Fully Convolutional Networks via Selective Augmentation with Applications to Electrocardiogram Data [0.0]
The accuracy of the proposed approach was optimal near a defined upper threshold for qualifying low confidence samples and decreased as this threshold was raised to include higher confidence samples. This suggests exclusively selecting lower confidence samples for data augmentation comes with distinct benefits for electrocardiogram data classification with Fully Convolutional Networks.
arXiv Detail & Related papers (2021-04-25T23:01:27Z)
Fader Networks for domain adaptation on fMRI: ABIDE-II study [68.5481471934606]
We use 3D convolutional autoencoders to build the domain irrelevant latent space image representation and demonstrate this method to outperform existing approaches on ABIDE data.
arXiv Detail & Related papers (2020-10-14T16:50:50Z)
Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients. We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks. Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
CovidDeep: SARS-CoV-2/COVID-19 Test Based on Wearable Medical Sensors and Efficient Neural Networks [51.589769497681175]
The novel coronavirus (SARS-CoV-2) has led to a pandemic. The current testing regime based on Reverse Transcription-Polymerase Chain Reaction for SARS-CoV-2 has been unable to keep up with testing demands. We propose a framework called CovidDeep that combines efficient DNNs with commercially available WMSs for pervasive testing of the virus.
arXiv Detail & Related papers (2020-07-20T21:47:28Z)
A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques. We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.