Core-set Selection Using Metrics-based Explanations (CSUME) for multiclass ECG
- URL: http://arxiv.org/abs/2205.14508v1
- Date: Sat, 28 May 2022 19:36:28 GMT
- Title: Core-set Selection Using Metrics-based Explanations (CSUME) for multiclass ECG
- Authors: Sagnik Dakshit, Barbara Mukami Maweu, Sristi Dakshit, Balakrishnan Prabhakaran
- Abstract summary: We show how selecting good-quality data improves deep learning model performance.
Our experimental results show improvements of 9.67% in precision and 8.69% in recall while reducing the training data volume by 50%.
- Score: 2.0520503083305073
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The adoption of deep learning-based healthcare decision support systems such
as the detection of irregular cardiac rhythm is hindered by challenges such as
lack of access to quality data and the high costs associated with the
collection and annotation of data. The collection and processing of large
volumes of healthcare data is a continuous process. The performance of
data-hungry Deep Learning (DL) models is highly dependent on the quantity and
quality of the data. While the need for data quantity has been adequately
established through research, we show how selecting good-quality data
improves deep learning model performance. In this work, we take
Electrocardiogram (ECG) data as a case study and propose a model performance
improvement methodology for algorithm developers that selects the most
informative data samples from incoming streams of multi-class ECG data. Our
Core-Set selection methodology uses metrics-based explanations to select the
most informative ECG data samples. This also provides algorithm developers
with an understanding of why a sample was selected as more informative than
others for improving deep learning model performance. Our experimental
results show improvements of 9.67% in precision and 8.69% in recall while
reducing the training data volume by 50%. Additionally, our proposed
methodology assesses the quality and annotation of ECG samples from incoming
data streams. It automatically detects individual data samples that do not
contribute to model learning, thus minimizing their possible negative effects
on model performance. We further discuss the potential generalizability of our
approach by experimenting with a different dataset and deep learning
architecture.
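The abstract does not specify which metrics drive the selection, so the following is only a minimal sketch of the general idea of score-based core-set selection, not the authors' CSUME algorithm. It assumes each sample already carries a scalar informativeness score (the hypothetical `scores` input, where higher means more informative; how such scores would be derived from metrics-based explanations is left open). It keeps the top `keep_fraction` of every class so that the multiclass label balance is roughly preserved, and it flags samples whose score does not exceed a hypothetical `min_score` threshold as non-contributing.

```python
import math
from collections import defaultdict


def select_core_set(scores, labels, keep_fraction=0.5, min_score=0.0):
    """Hypothetical stratified core-set selection (illustrative sketch only).

    scores: per-sample informativeness scores (assumed: higher = more informative).
    labels: class label of each sample (e.g. ECG rhythm class).
    Returns (selected_indices, flagged_indices), both sorted.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    # Group sample indices by class so each class gets its own budget.
    class_members = defaultdict(list)
    for i, y in enumerate(labels):
        class_members[y].append(i)

    selected, flagged = [], []
    for members in class_members.values():
        # Budget: keep_fraction of this class's size, rounded up so that
        # small classes still contribute at least one sample.
        budget = math.ceil(keep_fraction * len(members))
        # Samples at or below the threshold are flagged and never selected.
        flagged.extend(i for i in members if scores[i] <= min_score)
        usable = [i for i in members if scores[i] > min_score]
        usable.sort(key=lambda i: scores[i], reverse=True)
        selected.extend(usable[:budget])

    return sorted(selected), sorted(flagged)


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.5, -0.2, 0.8, 0.3]
    labels = ["N", "N", "N", "N", "A", "A"]
    selected, flagged = select_core_set(scores, labels)
    print(selected, flagged)  # [0, 2, 4] [3]
```

With `keep_fraction=0.5` the example keeps 3 of 6 samples, mirroring the 50% volume reduction reported in the abstract. The per-class budget is a design assumption made here so that minority rhythm classes are not dropped entirely when the overall volume is halved.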
Related papers
- A CLIP-Powered Framework for Robust and Generalizable Data Selection [51.46695086779598]
Real-world datasets often contain redundant and noisy data, imposing a negative impact on training efficiency and model performance.
Data selection has shown promise in identifying the most representative samples from the entire dataset.
We propose a novel CLIP-powered data selection framework that leverages multimodal information for more robust and generalizable sample selection.
Date: 2024-10-15
- JSCDS: A Core Data Selection Method with Jason-Shannon Divergence for Caries RGB Images-Efficient Learning [2.508255511130695]
The performance of deep learning models depends on high-quality data and requires substantial training resources.
We propose a Core Data Selection Method with Jensen-Shannon Divergence (JSCDS) for efficient caries image learning and caries classification.
JSCDS outperforms other data selection methods in prediction performance and time consumption.
Date: 2024-06-29
- Self-Trained Model for ECG Complex Delineation [0.0]
Electrocardiogram (ECG) delineation plays a crucial role in assisting cardiologists with accurate diagnoses.
We introduce a dataset for ECG delineation and propose a novel self-trained method aimed at leveraging a vast amount of unlabeled ECG data.
Our approach pseudolabels unlabeled data using a neural network trained on our dataset and then trains the model on the newly labeled samples to improve delineation quality.
Date: 2024-06-04
- LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface-form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
Date: 2024-02-06
- Which Augmentation Should I Use? An Empirical Investigation of Augmentations for Self-Supervised Phonocardiogram Representation Learning [5.438725298163702]
Contrastive Self-Supervised Learning (SSL) offers a potential solution to labeled data scarcity.
We aim to identify the optimal augmentations for applying contrastive learning to 1D phonocardiogram (PCG) classification.
We demonstrate that, depending on its training distribution, the effectiveness of a fully-supervised model can degrade by up to 32%, while SSL models lose only up to 10% or even improve in some cases.
Date: 2023-12-01
- Temporal Output Discrepancy for Loss Estimation-based Active Learning [65.93767110342502]
We present a novel deep active learning approach that queries the oracle for data annotation when an unlabeled sample is estimated to have high loss.
Our approach outperforms state-of-the-art active learning methods on image classification and semantic segmentation tasks.
Date: 2022-12-20
- Learning brain MRI quality control: a multi-factorial generalization problem [0.0]
This work aimed to evaluate the performance of the MRIQC pipeline on various large-scale datasets.
We focused our analysis on the MRIQC preprocessing steps and tested the pipeline with and without them.
We concluded that a model trained with data from a heterogeneous population, such as the CATI dataset, provides the best scores on unseen data.
Date: 2022-05-31
- Improving the efficacy of Deep Learning models for Heart Beat detection on heterogeneous datasets [0.0]
We investigate the issues related to applying a Deep Learning model on heterogeneous datasets.
We show that the performance of a model trained on data from healthy subjects decreases when applied to patients with cardiac conditions.
We then evaluate the use of Transfer Learning to adapt the model to the different datasets.
Date: 2021-10-26
- Efficient training of lightweight neural networks using Online Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We use the k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space.
Date: 2021-08-26
- How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the dataset, which we refer to as the $\rho$-gap.
We show how the $\rho$-gap can be applied to a feedback linearizing control law.
Date: 2020-05-25
- ECG-DelNet: Delineation of Ambulatory Electrocardiograms with Mixed Quality Labeling Using Neural Networks [69.25956542388653]
Deep learning (DL) algorithms are gaining traction in academic and industrial settings.
We demonstrate that DL can be successfully applied to low interpretative tasks by embedding ECG detection and delineation into a segmentation framework.
The model was trained using PhysioNet's QT database, which comprises 105 ambulatory ECG recordings.
Date: 2020-05-11
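The JSCDS entry above names the Jensen-Shannon divergence, but its summary does not say how the divergence is turned into a data selection rule. As background only, here is a small sketch of the divergence itself for two discrete distributions, JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M) with M = (P + Q) / 2. With base-2 logarithms the value lies in [0, 1]. The function names are illustrative, and nothing here should be read as the JSCDS implementation.

```python
import math


def kl_divergence(p, q, base=2.0):
    """KL(P || Q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q, base=2.0):
    """Jensen-Shannon divergence between two discrete distributions.

    Inputs are normalized first, so raw counts (e.g. histograms) are accepted.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    sp, sq = sum(p), sum(q)
    if sp <= 0 or sq <= 0:
        raise ValueError("distributions must have positive total mass")
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    # Mixture M is positive wherever p or q is, so no division by zero occurs.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m, base) + 0.5 * kl_divergence(q, m, base)
```

Unlike plain KL divergence, this quantity is symmetric and finite even when one distribution assigns zero probability where the other does not, which is a common reason to prefer it for comparing class or feature distributions between data subsets.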
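For the Temporal Output Discrepancy entry in the list above, the summary only says that samples believed to have high loss are sent to the oracle for annotation. The following is a minimal sketch under stated assumptions: it assumes, as the method's name suggests, that the loss proxy is the Euclidean distance between a model's outputs for the same sample at two training checkpoints, and that the `k` samples with the largest discrepancy are queried. The function names and the choice of plain Python lists for model outputs are hypothetical.

```python
import math


def temporal_output_discrepancy(outputs_t, outputs_later):
    """Per-sample L2 distance between outputs at two training checkpoints.

    outputs_t, outputs_later: lists of output vectors (e.g. softmax scores),
    one vector per unlabeled sample, in the same sample order.
    """
    if len(outputs_t) != len(outputs_later):
        raise ValueError("both checkpoints must score the same samples")
    return [math.dist(a, b) for a, b in zip(outputs_t, outputs_later)]


def query_top_k(outputs_t, outputs_later, k):
    """Return indices of the k samples with the largest discrepancy."""
    tod = temporal_output_discrepancy(outputs_t, outputs_later)
    ranked = sorted(range(len(tod)), key=lambda i: tod[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    out_t = [[0.1, 0.9], [0.5, 0.5], [0.8, 0.2]]
    out_later = [[0.1, 0.9], [0.9, 0.1], [0.7, 0.3]]
    print(query_top_k(out_t, out_later, 1))  # [1]
```

A sample whose prediction barely moves between checkpoints (index 0 above) is treated as already well learned, while one whose output swings noticeably (index 1) is ranked first for annotation.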