Speech Recognition-based Feature Extraction for Enhanced Automatic Severity Classification in Dysarthric Speech
- URL: http://arxiv.org/abs/2412.03784v1
- Date: Thu, 05 Dec 2024 00:12:53 GMT
- Title: Speech Recognition-based Feature Extraction for Enhanced Automatic Severity Classification in Dysarthric Speech
- Authors: Yerin Choi, Jeehyun Lee, Myoung-Wan Koo
- Abstract summary: We introduce automatic speech recognition (ASR) transcription as a novel source for feature extraction.
We fine-tune the ASR model for dysarthric speech, then use this model to transcribe dysarthric speech and extract word segment boundary information.
These features demonstrated improved severity prediction performance over existing features: a balanced accuracy of 83.72%.
- Score: 2.7309692684728617
- License:
- Abstract: Due to the subjective nature of current clinical evaluation, the need for automatic severity evaluation in dysarthric speech has emerged. Deep neural network (DNN) models outperform conventional machine learning (ML) models but lack user-friendly explainability. ML models offer explainable results at the feature level, but their performance is comparatively lower. Current ML models extract various features from raw waveforms to predict severity. However, existing methods do not capture all of the dysarthric features used in clinical evaluation. To address this gap, we propose a feature extraction method that minimizes information loss. We introduce automatic speech recognition (ASR) transcription as a novel source for feature extraction. We fine-tune the ASR model for dysarthric speech, then use this model to transcribe dysarthric speech and extract word segment boundary information. This enables capturing finer pronunciation features and broader prosodic features. These features demonstrated improved severity prediction performance over existing features, achieving a balanced accuracy of 83.72%.
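The abstract does not list which features are derived from word segment boundaries or how the classes are weighted in the score. As a minimal sketch under stated assumptions, the code below treats each ASR word segment as a hypothetical `(word, start_sec, end_sec)` tuple and uses a hypothetical 0.2 s pause threshold to compute a few illustrative timing features (speech rate, mean word duration, pause statistics). It also shows balanced accuracy as the mean of per-class recall. The feature names and threshold are assumptions for illustration, not the authors' feature set.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical segment format: (word, start_sec, end_sec), as an ASR model
# with word-level timestamps might emit. This is NOT the authors' format.
Segment = tuple[str, float, float]


def boundary_features(segments: list[Segment], pause_threshold: float = 0.2) -> dict[str, float]:
    """Illustrative timing/prosody features derived from word boundaries."""
    if not segments:
        raise ValueError("need at least one word segment")
    segments = sorted(segments, key=lambda s: s[1])  # order by start time
    durations = [end - start for _, start, end in segments]
    # Silence between the end of one word and the start of the next.
    gaps = [max(0.0, nxt[1] - cur[2]) for cur, nxt in zip(segments, segments[1:])]
    pauses = [g for g in gaps if g >= pause_threshold]  # hypothetical threshold
    total_time = segments[-1][2] - segments[0][1]
    return {
        "speech_rate_wps": len(segments) / total_time if total_time > 0 else 0.0,
        "mean_word_dur": mean(durations),
        "pause_count": float(len(pauses)),
        "mean_pause_dur": mean(pauses) if pauses else 0.0,
    }


def balanced_accuracy(y_true, y_pred) -> float:
    """Mean per-class recall, so rare severity classes count as much as common ones."""
    hits: dict = defaultdict(int)
    totals: dict = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)
```

For example, with true labels `["mild", "mild", "mild", "severe"]` and predictions `["mild", "mild", "severe", "severe"]`, plain accuracy is 0.75 but balanced accuracy is (2/3 + 1) / 2 ≈ 0.833, because the single "severe" sample is weighted as heavily as the three "mild" ones. scikit-learn's `sklearn.metrics.balanced_accuracy_score` computes the same quantity.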
Related papers
- Re-Visiting Explainable AI Evaluation Metrics to Identify The Most Informative Features [0.0]
The functionality- or proxy-based approach is one of the approaches used to evaluate the quality of artificial intelligence methods.
Among them, Selectivity or RemOve And Retrain (ROAR), and Permutation Importance (PI) are the most commonly used metrics.
We propose the expected accuracy interval (EAI) to predict the upper and lower bounds of the model's accuracy when ROAR or PI is applied.
arXiv Detail & Related papers (2025-01-31T17:18:43Z)
- Devising a Set of Compact and Explainable Spoken Language Feature for Screening Alzheimer's Disease [52.46922921214341]
Alzheimer's disease (AD) has become one of the most significant health challenges in an aging society.
We devised an explainable and effective feature set that leverages the visual capabilities of a large language model (LLM) and the Term Frequency-Inverse Document Frequency (TF-IDF) model.
Our new features can be explained and interpreted step by step, which enhances the interpretability of automatic AD screening.
arXiv Detail & Related papers (2024-11-28T05:23:22Z)
- Language Models and Retrieval Augmented Generation for Automated Structured Data Extraction from Diagnostic Reports [2.932283627137903]
The study utilized two datasets: 7,294 radiology reports annotated for Brain Tumor Reporting and Data System (BT-RADS) scores and 2,154 pathology reports for isocitrate dehydrogenase (IDH) mutation status.
arXiv Detail & Related papers (2024-09-15T15:21:45Z)
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- Deep Prototypical-Parts Ease Morphological Kidney Stone Identification and are Competitively Robust to Photometric Perturbations [0.9236074230806579]
We learn Prototypical Parts (PPs) per kidney stone subtype to generate an output classification.
Our implementation's average accuracy is 1.5% lower than that of state-of-the-art (SOTA) non-interpretable DL models.
Our models perform 2.8% better on perturbed images with a lower standard deviation, without adversarial training.
arXiv Detail & Related papers (2023-04-08T17:43:31Z)
- Semantic Latent Space Regression of Diffusion Autoencoders for Vertebral Fracture Grading [72.45699658852304]
This paper proposes a novel approach to train a generative Diffusion Autoencoder model as an unsupervised feature extractor.
We model fracture grading as a continuous regression, which is more reflective of the smooth progression of fractures.
Importantly, the generative nature of our method allows us to visualize different grades of a given vertebra, providing interpretability and insight into the features that contribute to automated grading.
arXiv Detail & Related papers (2023-03-21T17:16:01Z)
- cRedAnno+: Annotation Exploitation in Self-Explanatory Lung Nodule Diagnosis [8.582182186207671]
cRedAnno achieves competitive performance with considerably reduced annotation needs.
We propose an annotation exploitation mechanism by conducting semi-supervised active learning.
The proposed approach achieves comparable or even higher malignancy prediction accuracy with 10x fewer annotations.
arXiv Detail & Related papers (2022-10-28T12:44:31Z)
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel Model-Agnostic Counterfactual Explanation (MACE) framework.
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate its effectiveness, showing better validity, sparsity, and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
- Identification of Autism spectrum disorder based on a novel feature selection method and Variational Autoencoder [7.0876609220947655]
Noninvasive brain imaging such as resting-state functional magnetic resonance imaging (rs-fMRI) provides a promising solution for the early diagnosis of Autism spectrum disorder (ASD).
This paper introduces a classification framework to aid ASD diagnosis based on rs-fMRI.
arXiv Detail & Related papers (2022-04-07T08:50:48Z)
- Can contrastive learning avoid shortcut solutions? [88.249082564465]
Implicit feature modification (IFM) is a method for altering positive and negative samples in order to guide contrastive models towards capturing a wider variety of predictive features.
IFM reduces feature suppression, and as a result improves performance on vision and medical imaging tasks.
arXiv Detail & Related papers (2021-06-21T16:22:43Z)
- Removing Spurious Features can Hurt Accuracy and Affect Groups Disproportionately [83.68135652247496]
A natural remedy is to remove spurious features from the model.
We show that removal of spurious features can decrease accuracy due to inductive biases.
We also show that robust self-training can remove spurious features without affecting the overall accuracy.
arXiv Detail & Related papers (2020-12-07T23:08:59Z)
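Permutation Importance (PI), named in the first related entry as a common XAI evaluation metric, is also one standard way to obtain the feature-level explanations that the main abstract attributes to ML severity classifiers. The following is a minimal, dependency-free sketch of the generic technique: shuffle one feature column at a time and record the mean drop in accuracy. The `model` callable interface, the toy data, and the `n_repeats`/`seed` defaults are assumptions for illustration only. scikit-learn ships a full implementation as `sklearn.inspection.permutation_importance`.

```python
import random
from typing import Callable, Sequence

# Hypothetical model interface: maps one feature row to a class label.
Model = Callable[[Sequence[float]], int]


def accuracy(model: Model, X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    return sum(int(model(row) == label) for row, label in zip(X, y)) / len(y)


def permutation_importance(model: Model, X, y, n_repeats: int = 10, seed: int = 0) -> list[float]:
    """Mean accuracy drop when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    # Toy data: feature 0 determines the label; feature 1 is a constant the model ignores.
    X = [[0.1, 5.0], [0.2, 5.0], [0.3, 5.0], [0.4, 5.0],
         [0.6, 5.0], [0.7, 5.0], [0.8, 5.0], [0.9, 5.0]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print(permutation_importance(lambda row: int(row[0] > 0.5), X, y))
```

In the toy example, shuffling the ignored second feature leaves every prediction unchanged, so its importance is exactly zero, while shuffling the first feature usually breaks the label mapping and yields a positive mean drop. ROAR differs in that it retrains the model after removing features rather than permuting them at inference time.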
This list is automatically generated from the titles and abstracts of the papers on this site.