Wav2vec-based Detection and Severity Level Classification of Dysarthria
from Speech
- URL: http://arxiv.org/abs/2309.14107v2
- Date: Tue, 17 Oct 2023 13:38:27 GMT
- Title: Wav2vec-based Detection and Severity Level Classification of Dysarthria
from Speech
- Authors: Farhad Javanmardi, Saska Tirronen, Manila Kodali, Sudarsana Reddy
Kadiri, Paavo Alku
- Abstract summary: The pre-trained wav2vec 2.0 model is studied as a feature extractor to build detection and severity level classification systems.
Experiments were carried out with the widely used UA-Speech database.
- Score: 15.150153248025543
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic detection and severity level classification of dysarthria directly
from acoustic speech signals can be used as a tool in medical diagnosis. In
this work, the pre-trained wav2vec 2.0 model is studied as a feature extractor
to build detection and severity level classification systems for dysarthric
speech. The experiments were carried out with the widely used UA-Speech
database. In the detection experiments, the best performance was obtained
using the embeddings from the first layer of the wav2vec model, which yielded
an absolute improvement of 1.23% in accuracy over the best-performing
baseline feature (spectrogram). In the severity level classification task,
the embeddings from the final layer gave an absolute improvement of 10.62% in
accuracy over the best baseline features (mel-frequency cepstral
coefficients).
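As a rough illustration of the feature-extraction setup described above, the sketch below mean-pools per-layer hidden states over time into one fixed-length utterance embedding per layer, so a downstream classifier can pick an early layer (as for detection) or the final layer (as for severity classification). The hidden states here are toy stand-in lists, not real model outputs; an actual pipeline would obtain them from, e.g., a pre-trained wav2vec 2.0 model run with hidden-state outputs enabled.

```python
def mean_pool(frames):
    """Average a [num_frames x dim] list of frame vectors into one vector."""
    num_frames = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / num_frames for d in range(dim)]

def utterance_embeddings(hidden_states):
    """hidden_states: list over layers of [num_frames x dim] activations.
    Returns one pooled embedding per layer, so a classifier can use an early
    layer for one task and a late layer for another."""
    return [mean_pool(layer) for layer in hidden_states]

# Toy example: 3 layers, 4 frames, 2-dimensional features.
states = [[[float(l + t), float(l - t)] for t in range(4)] for l in range(3)]
embs = utterance_embeddings(states)
first_layer, final_layer = embs[0], embs[-1]
```

The pooled vectors would then be fed to a conventional classifier (e.g. an SVM or MLP); the specific classifier choice here is an assumption, not taken from the abstract.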
Related papers
- Interpretable Temporal Class Activation Representation for Audio Spoofing Detection [7.476305130252989]
We utilize the wav2vec 2.0 model and attentive utterance-level features to integrate interpretability directly into the model's architecture.
Our model achieves state-of-the-art results, with an EER of 0.51% and a min t-DCF of 0.0165 on the ASVspoof 2019-LA set.
arXiv Detail & Related papers (2024-06-13T05:36:01Z)
- A Few-Shot Approach to Dysarthric Speech Intelligibility Level Classification Using Transformers [0.0]
Dysarthria is a speech disorder that hinders communication due to difficulties in articulating words.
Much of the literature has focused on improving ASR systems for dysarthric speech.
This work aims to develop models that can accurately classify the presence of dysarthria.
arXiv Detail & Related papers (2023-09-17T17:23:41Z)
- Self-Supervised Pretraining Improves Performance and Inference Efficiency in Multiple Lung Ultrasound Interpretation Tasks [65.23740556896654]
We investigated whether self-supervised pretraining could produce a neural network feature extractor applicable to multiple classification tasks in lung ultrasound analysis.
When fine-tuning on three lung ultrasound tasks, pretrained models resulted in an improvement of the average across-task area under the receiver operating curve (AUC) by 0.032 and 0.061 on local and external test sets respectively.
arXiv Detail & Related papers (2023-09-05T21:36:42Z)
- Investigation of Self-supervised Pre-trained Models for Classification of Voice Quality from Speech and Neck Surface Accelerometer Signals [27.398425786898223]
This study examines simultaneously-recorded speech and NSA signals in the classification of voice quality.
The effectiveness of pre-trained models is compared in feature extraction between glottal source waveforms and raw signal waveforms for both speech and NSA inputs.
arXiv Detail & Related papers (2023-08-06T23:16:54Z)
- Low-complexity deep learning frameworks for acoustic scene classification [64.22762153453175]
We present low-complexity deep learning frameworks for acoustic scene classification (ASC).
The proposed frameworks can be separated into four main steps: Front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities.
Our experiments conducted on the DCASE 2022 Task 1 Development dataset fulfilled the low-complexity requirement and achieved a best classification accuracy of 60.1%.
arXiv Detail & Related papers (2022-06-13T11:41:39Z)
- Anomalous Sound Detection Using a Binary Classification Model and Class Centroids [47.856367556856554]
We propose a binary classification model developed using not only normal data but also outlier data from other domains as pseudo-anomalous sound data.
We also investigate the effectiveness of additionally using anomalous sound data for further improving the binary classification model.
arXiv Detail & Related papers (2021-06-11T03:35:06Z)
- Anomaly Detection in Cybersecurity: Unsupervised, Graph-Based and Supervised Learning Methods in Adversarial Environments [63.942632088208505]
Inherent to today's operating environment is the practice of adversarial machine learning.
In this work, we examine the feasibility of unsupervised learning and graph-based methods for anomaly detection.
We incorporate a realistic adversarial training mechanism when training our supervised models to enable strong classification performance in adversarial environments.
arXiv Detail & Related papers (2021-05-14T10:05:10Z)
- Audio feature ranking for sound-based COVID-19 patient detection [1.7188280334580195]
Sound-based COVID-19 detection has emerged as a low-cost, non-invasive, and accessible classification method.
No application has been approved for official use due to the stringent reliability and accuracy requirements of the critical healthcare setting.
We performed an investigation and ranking of 15 audio features, including less well-known ones.
The results were verified on two independent COVID-19 sound datasets.
arXiv Detail & Related papers (2021-04-14T21:06:20Z)
- Improving Medical Image Classification with Label Noise Using Dual-uncertainty Estimation [72.0276067144762]
We discuss and define the two common types of label noise in medical images.
We propose an uncertainty estimation-based framework to handle these two types of label noise in the medical image classification task.
arXiv Detail & Related papers (2021-02-28T14:56:45Z)
- Capturing scattered discriminative information using a deep architecture in acoustic scene classification [49.86640645460706]
In this study, we investigate various methods to capture discriminative information and simultaneously mitigate the overfitting problem.
We adopt a max feature map method to replace conventional non-linear activations in a deep neural network.
Two data augmentation methods and two deep architecture modules are further explored to reduce overfitting and sustain the system's discriminative power.
arXiv Detail & Related papers (2020-07-09T08:32:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.