Continuous Speech for Improved Learning Pathological Voice Disorders
- URL: http://arxiv.org/abs/2202.10777v1
- Date: Tue, 22 Feb 2022 09:58:31 GMT
- Title: Continuous Speech for Improved Learning Pathological Voice Disorders
- Authors: Syu-Siang Wang, Chi-Te Wang, Chih-Chung Lai, Yu Tsao, Shih-Hau Fang
- Abstract summary: This study proposes a novel approach, using continuous Mandarin speech instead of a single vowel, to classify four common voice disorders.
In the proposed framework, acoustic signals are transformed into mel-frequency cepstral coefficients, and a bidirectional long short-term memory (BiLSTM) network is adopted to model the sequential features.
- Score: 12.867900671251395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Goal: Numerous studies have successfully differentiated normal and abnormal
voice samples. Nevertheless, further classification has rarely been attempted.
This study proposes a novel approach, using continuous Mandarin speech instead
of a single vowel, to classify four common voice disorders (i.e. functional
dysphonia, neoplasm, phonotrauma, and vocal palsy). Methods: In the proposed
framework, acoustic signals are transformed into mel-frequency cepstral
coefficients, and a bidirectional long short-term memory (BiLSTM) network is
adopted to model the sequential features. The experiments were conducted on a
large-scale database, wherein 1,045 continuous speech samples were collected by
the speech clinic of a hospital from 2012 to 2019. Results: Experimental results
demonstrated that, compared with systems that use a single vowel, the proposed
framework improves accuracy from 78.12% to 89.27% and unweighted average recall
from 50.92% to 80.68%. Conclusions: The
results are consistent with other machine learning algorithms, including gated
recurrent units, random forest, deep neural networks, and LSTM. The
sensitivities for each disorder were also analyzed, and the model capabilities
were visualized via principal component analysis. An alternative experiment
based on a balanced dataset again confirms the advantages of using continuous
speech for learning voice disorders.
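To make the described method concrete, below is a minimal sketch of the abstract's pipeline: MFCC features extracted from a recording and fed to a BiLSTM classifier over the four disorder classes. The hyperparameters (13 MFCCs, a 16 kHz sample rate, hidden size 64) and the use of librosa/PyTorch are illustrative assumptions, not details confirmed by the paper.

```python
# Minimal sketch of the abstract's pipeline: MFCC extraction followed by a
# BiLSTM classifier over four disorder classes. All hyperparameters below
# (n_mfcc=13, sr=16 kHz, hidden=64) are assumptions for illustration.
import librosa
import torch
import torch.nn as nn

NUM_CLASSES = 4  # functional dysphonia, neoplasm, phonotrauma, vocal palsy

def extract_mfcc(wav_path: str, sr: int = 16000, n_mfcc: int = 13) -> torch.Tensor:
    """Load a recording and return its MFCC sequence as (n_frames, n_mfcc)."""
    signal, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)
    return torch.from_numpy(mfcc.T).float()

class BiLSTMClassifier(nn.Module):
    def __init__(self, n_mfcc: int = 13, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, NUM_CLASSES)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_frames, n_mfcc). Concatenate the final hidden states of
        # the forward and backward directions, then classify.
        _, (h_n, _) = self.lstm(x)
        h = torch.cat([h_n[0], h_n[1]], dim=-1)
        return self.head(h)  # unnormalized scores over the four classes

# Usage: features = extract_mfcc("sample.wav").unsqueeze(0)  # add batch dim
#        logits = BiLSTMClassifier()(features)
#        prediction = logits.argmax(dim=-1)
```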
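The abstract reports unweighted average recall (UAR) alongside accuracy. UAR is the mean of per-class recalls, so each of the four disorders counts equally regardless of its sample count, which is why it is a natural companion metric for the imbalanced clinical data mentioned above. A small sketch with made-up labels:

```python
# Unweighted average recall (UAR): the mean of per-class recalls, so each
# class counts equally regardless of how many samples it has. The labels
# below are illustrative only.
import numpy as np

def unweighted_average_recall(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))

y_true = np.array([0, 0, 1, 2, 3, 3])
y_pred = np.array([0, 1, 1, 2, 3, 0])
print(unweighted_average_recall(y_true, y_pred))  # (0.5 + 1 + 1 + 0.5) / 4 = 0.75
```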
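The conclusions also mention visualizing model capabilities via principal component analysis. A common way to do this, sketched below with randomly generated stand-in data, is to project utterance-level embeddings (e.g., the BiLSTM's concatenated final hidden states) onto the first two principal components and inspect class separation; the embedding dimension and the scikit-learn usage here are assumptions, not the paper's setup.

```python
# Illustrative sketch: project utterance-level embeddings to two principal
# components for visual inspection of class separation. The embeddings and
# labels are randomly generated stand-ins.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 128))   # stand-in for learned features
labels = rng.integers(0, 4, size=200)      # four disorder classes

pca = PCA(n_components=2)
projected = pca.fit_transform(embeddings)  # (200, 2) points to scatter-plot
print(pca.explained_variance_ratio_)       # variance captured per component
```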
Related papers
- Developing vocal system impaired patient-aimed voice quality assessment approach using ASR representation-included multiple features [0.4681310436826459]
This article showcases the utilization of automatic speech recognition and self-supervised learning representations, pre-trained on extensive datasets of normal speech.
Experiments involve checks on the PVQD dataset, covering various causes of vocal system damage in English, and a Japanese dataset focusing on patients with Parkinson's disease.
The results on PVQD reveal a notable correlation (Pearson correlation coefficient > 0.8) and a low error (mean squared error of 0.5) in predicting the Grade, Breathy, and Asthenic indicators.
arXiv Detail & Related papers (2024-08-22T10:22:53Z) - Show from Tell: Audio-Visual Modelling in Clinical Settings [58.88175583465277]
We consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations without human expert annotation.
A simple yet effective multi-modal self-supervised learning framework is proposed for this purpose.
The proposed approach is able to localise anatomical regions of interest during ultrasound imaging, with only speech audio as a reference.
arXiv Detail & Related papers (2023-10-25T08:55:48Z) - Detecting Speech Abnormalities with a Perceiver-based Sequence
Classifier that Leverages a Universal Speech Model [4.503292461488901]
We propose a Perceiver-based sequence classifier to detect abnormalities in speech reflective of several neurological disorders.
We combine this classifier with a Universal Speech Model (USM) that is trained (unsupervised) on 12 million hours of diverse audio recordings.
Our model outperforms standard transformer (80.9%) and perceiver (81.8%) models and achieves an average accuracy of 83.1%.
arXiv Detail & Related papers (2023-10-16T21:07:12Z) - Automatically measuring speech fluency in people with aphasia: first
achievements using read-speech data [55.84746218227712]
This study aims at assessing the relevance of a signal-processing algorithm, initially developed in the field of language acquisition, for the automatic measurement of speech fluency.
arXiv Detail & Related papers (2023-08-09T07:51:40Z) - Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging
Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training.
Experiments conducted on three tasks suggested that systems incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z) - Audio-visual multi-channel speech separation, dereverberation and
recognition [70.34433820322323]
This paper proposes an audio-visual multi-channel speech separation, dereverberation and recognition approach.
The advantage of the additional visual modality over using audio only is demonstrated on two neural dereverberation approaches.
Experiments conducted on the LRS2 dataset suggest that the proposed audio-visual multi-channel speech separation, dereverberation and recognition system outperforms the baseline.
arXiv Detail & Related papers (2022-04-05T04:16:03Z) - Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For
Disordered Speech Recognition [57.15942628305797]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems for normal speech.
This paper presents a cross-domain acoustic-to-articulatory (A2A) inversion approach that utilizes the parallel acoustic-articulatory data of the 15-hour TORGO corpus in model training.
The A2A model is then cross-domain adapted to the 102.7-hour UASpeech corpus to produce articulatory features.
arXiv Detail & Related papers (2022-03-19T08:47:18Z) - Investigation of Data Augmentation Techniques for Disordered Speech
Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation produced up to a 2.92% absolute word error rate (WER) reduction.
arXiv Detail & Related papers (2022-01-14T17:09:22Z) - Assessing clinical utility of Machine Learning and Artificial
Intelligence approaches to analyze speech recordings in Multiple Sclerosis: A
Pilot Study [1.6582693134062305]
The aim of this study was to determine the potential clinical utility of machine learning and deep learning/AI approaches for the aiding of diagnosis, biomarker extraction and progression monitoring of multiple sclerosis using speech recordings.
The Random Forest model performed best, achieving an Accuracy of 0.82 on the validation dataset and an area-under-curve of 0.76 across 5 k-fold cycles on the training dataset.
arXiv Detail & Related papers (2021-09-20T21:02:37Z) - Multi-Modal Detection of Alzheimer's Disease from Speech and Text [3.702631194466718]
We propose a deep learning method that utilizes speech and the corresponding transcript simultaneously to detect Alzheimer's disease (AD).
The proposed method achieves 85.3% 10-fold cross-validation accuracy when trained and evaluated on the Dementiabank Pitt corpus.
arXiv Detail & Related papers (2020-11-30T21:18:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.