Improved Techniques for the Conditional Generative Augmentation of
Clinical Audio Data
- URL: http://arxiv.org/abs/2211.02874v1
- Date: Sat, 5 Nov 2022 10:58:04 GMT
- Title: Improved Techniques for the Conditional Generative Augmentation of
Clinical Audio Data
- Authors: Mane Margaryan, Matthias Seibold, Indu Joshi, Mazda Farshad, Philipp
Fürnstahl, Nassir Navab
- Abstract summary: We propose a conditional generative adversarial neural network-based augmentation method which is able to synthesize mel spectrograms from a learned data distribution.
We show that our method outperforms all classical audio augmentation techniques and previously published generative methods in terms of generated sample quality.
The proposed model advances the state-of-the-art in the augmentation of clinical audio data and improves the data bottleneck for the design of clinical acoustic sensing systems.
- Score: 36.45569352490318
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Data augmentation is a valuable tool for the design of deep learning systems
to overcome data limitations and stabilize the training process. Especially in
the medical domain, where the collection of large-scale data sets is
challenging and expensive due to limited access to patient data, relevant
environments, as well as strict regulations, community-curated large-scale
public datasets, pretrained models, and advanced data augmentation methods are
the main factors for developing reliable systems to improve patient care.
However, for the development of medical acoustic sensing systems, an emerging
field of research, the community lacks large-scale publicly available data sets
and pretrained models. To address the problem of limited data, we propose a
conditional generative adversarial neural network-based augmentation method
which is able to synthesize mel spectrograms from a learned data distribution
of a source data set. In contrast to previously proposed fully convolutional
models, the proposed model implements residual Squeeze and Excitation modules
in the generator architecture. We show that our method outperforms all
classical audio augmentation techniques and previously published generative
methods in terms of generated sample quality, and that a classifier trained on
the augmented data set achieves a 2.84% improvement in Macro F1-score, an
enhancement of 1.14% over previous work. By analyzing the
correlation of intermediate feature spaces, we show that the residual Squeeze
and Excitation modules help the model to reduce redundancy in the latent
features. Therefore, the proposed model advances the state-of-the-art in the
augmentation of clinical audio data and improves the data bottleneck for the
design of clinical acoustic sensing systems.
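The abstract names two concrete technical ingredients: residual Squeeze-and-Excitation (SE) modules inside the generator, and a correlation analysis of intermediate feature spaces. As a rough illustration only (the paper's actual architecture, channel counts, reduction ratio, and correlation statistic are not given here, so the code below is an assumption, not the authors' implementation), a residual SE block and a simple channel-redundancy measure could look like this in PyTorch:

```python
# Hypothetical sketch, not the authors' released code: a generic residual
# Squeeze-and-Excitation block plus a rough channel-redundancy measure.
import torch
import torch.nn as nn


class ResidualSEBlock(nn.Module):
    """Residual block whose output channels are re-weighted by an SE gate."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        # Squeeze: global average pooling; Excitation: bottleneck MLP with a sigmoid gate.
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.body(x)
        h = h * self.se(h)           # channel-wise re-weighting
        return torch.relu(x + h)     # residual connection


def channel_redundancy(features: torch.Tensor) -> torch.Tensor:
    """Mean absolute off-diagonal correlation between the channels of a
    (B, C, H, W) feature map -- a rough stand-in for the paper's analysis of
    intermediate feature spaces; lower values indicate less redundant channels."""
    b, c, h, w = features.shape
    flat = features.permute(1, 0, 2, 3).reshape(c, -1)   # channels x observations
    corr = torch.corrcoef(flat)
    off_diag = corr - torch.diag(torch.diag(corr))
    return off_diag.abs().sum() / (c * (c - 1))
```

A quick smoke test, with purely illustrative sizes (a batch of 80-band mel spectrograms with 128 frames and 64 feature channels):

```python
block = ResidualSEBlock(64)
x = torch.randn(4, 64, 80, 128)
print(block(x).shape, channel_redundancy(block(x)).item())
```

In the spirit of the paper's analysis, a lower off-diagonal correlation after SE gating would be consistent with the claim that the modules reduce redundancy in the latent features.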
Related papers
- Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification [2.5091334993691206]
Development of a robust deep-learning model for retinal disease diagnosis requires a substantial dataset for training.
The capacity to generalize effectively on smaller datasets remains a persistent challenge.
We combine a wide range of data sources to improve performance and generalization to new data.
arXiv Detail & Related papers (2024-09-17T17:22:35Z) - Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD.
It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation.
We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z) - Improving SMOTE via Fusing Conditional VAE for Data-adaptive Noise Filtering [0.5735035463793009]
We introduce a framework to enhance the SMOTE algorithm using Variational Autoencoders (VAE)
Our approach systematically quantifies the density of data points in a low-dimensional latent space using the VAE, simultaneously incorporating information on class labels and classification difficulty.
Empirical studies on several imbalanced datasets show that this simple process improves the conventional SMOTE algorithm and outperforms deep learning-based models.
arXiv Detail & Related papers (2024-05-30T07:06:02Z) - MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data
Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z) - Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z) - Application of federated learning techniques for arrhythmia
classification using 12-lead ECG signals [0.11184789007828977]
This work uses a Federated Learning (FL) privacy-preserving methodology to train AI models over heterogeneous sets of high-definition ECG.
We demonstrate performance comparable to models trained with centralized learning (CL), under both IID and non-IID data distributions.
arXiv Detail & Related papers (2022-08-23T14:21:16Z) - Ultrasound Signal Processing: From Models to Deep Learning [64.56774869055826]
Medical ultrasound imaging relies heavily on high-quality signal processing to provide reliable and interpretable image reconstructions.
Deep learning based methods, which are optimized in a data-driven fashion, have gained popularity.
A relatively new paradigm combines the power of the two: leveraging data-driven deep learning, as well as exploiting domain knowledge.
arXiv Detail & Related papers (2022-04-09T13:04:36Z) - Conditional Generative Data Augmentation for Clinical Audio Datasets [36.45569352490318]
We propose a novel data augmentation method for clinical audio datasets based on a conditional Wasserstein Generative Adversarial Network with Gradient Penalty; a minimal sketch of the gradient-penalty term appears after this list.
To validate our method, we created a clinical audio dataset which was recorded in a real-world operating room during Total Hip Arthroplasty (THA) procedures.
We show that training with the generated augmented samples outperforms classical audio augmentation methods in terms of classification accuracy.
arXiv Detail & Related papers (2022-03-22T09:47:31Z) - The Imaginative Generative Adversarial Network: Automatic Data
Augmentation for Dynamic Skeleton-Based Hand Gesture and Human Action
Recognition [27.795763107984286]
We present a novel automatic data augmentation model, which approximates the distribution of the input data and samples new data from this distribution.
Our results show that the augmentation strategy is fast to train and can improve classification accuracy for both neural networks and state-of-the-art methods.
arXiv Detail & Related papers (2021-05-27T11:07:09Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z) - Generative Data Augmentation for Commonsense Reasoning [75.26876609249197]
G-DAUGC is a novel generative data augmentation method that aims to achieve more accurate and robust learning in the low-resource setting.
G-DAUGC consistently outperforms existing data augmentation methods based on back-translation.
Our analysis demonstrates that G-DAUGC produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.
arXiv Detail & Related papers (2020-04-24T06:12:10Z)