Domain Adversarial Neural Networks for Dysarthric Speech Recognition
- URL: http://arxiv.org/abs/2010.03623v1
- Date: Wed, 7 Oct 2020 19:51:41 GMT
- Title: Domain Adversarial Neural Networks for Dysarthric Speech Recognition
- Authors: Dominika Woszczyk, Stavros Petridis, David Millard
- Abstract summary: This work explores domain adversarial neural networks (DANN) for speaker-independent speech recognition.
The classification task on 10 spoken digits is performed using an end-to-end CNN taking raw audio as input.
Experiments conducted in this paper show that DANN achieves an absolute recognition rate of 74.91% and outperforms the baseline by 12.18%.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech recognition systems have improved dramatically over the last few
years; however, their performance degrades significantly on accented or
impaired speech. This work explores domain adversarial neural
networks (DANN) for speaker-independent speech recognition on the UAS dataset
of dysarthric speech. The classification task on 10 spoken digits is performed
using an end-to-end CNN taking raw audio as input. The results are compared to
a speaker-adaptive (SA) model as well as speaker-dependent (SD) and multi-task
learning models (MTL). The experiments conducted in this paper show that DANN
achieves an absolute recognition rate of 74.91% and outperforms the baseline by
12.18%. Additionally, the DANN model achieves comparable results to the SA
model's recognition rate of 77.65%. We also observe that when labelled
dysarthric speech data is available, DANN and MTL perform similarly, but when
it is not, DANN outperforms MTL.
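The core of the DANN approach described above is a gradient reversal layer placed between the feature extractor and a domain classifier: it is the identity in the forward pass and negates (and scales) gradients in the backward pass, pushing the features to become domain-invariant. A minimal NumPy sketch of that mechanism, with illustrative names and shapes that are not taken from the paper's implementation:

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; scales gradients by -lambda in backward.

    This is a toy sketch of the gradient reversal layer used in DANN;
    a real implementation would hook into an autograd framework.
    """

    def __init__(self, lam=1.0):
        self.lam = lam  # reversal strength, often annealed during training

    def forward(self, features):
        # Features flow unchanged to the domain classifier.
        return features

    def backward(self, grad_from_domain_head):
        # Reversing the gradient makes the feature extractor *maximize*
        # the domain-classification loss, encouraging domain-invariant
        # features useful across speakers.
        return -self.lam * grad_from_domain_head

grl = GradientReversal(lam=0.5)
x = np.array([1.0, -2.0, 3.0])
g = np.array([0.2, 0.4, -0.6])
print(grl.forward(x))   # features pass through unchanged
print(grl.backward(g))  # gradient is negated and scaled by lambda
```

In a full DANN setup this layer sits only on the domain-classifier branch, so the label-prediction branch trains normally while the domain branch adversarially shapes the shared features.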
Related papers
- Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition [71.87998918300806]
This paper explores approaches to integrate domain fine-tuned SSL pre-trained models and their features into TDNN and Conformer ASR systems.
TDNN systems constructed by integrating domain-adapted HuBERT, wav2vec2-conformer or multi-lingual XLSR models consistently outperform standalone fine-tuned SSL pre-trained models.
Consistent improvements in Alzheimer's Disease detection accuracy are also obtained using the DementiaBank Pitt elderly speech recognition outputs.
arXiv Detail & Related papers (2024-07-03T08:33:39Z)
- Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer [59.57249127943914]
We present a multilingual Audio-Visual Speech Recognition model incorporating several enhancements to improve performance and audio noise robustness.
We increase the amount of audio-visual training data for six distinct languages, generating automatic transcriptions of unlabelled multilingual datasets.
Our proposed model achieves new state-of-the-art performance on the LRS3 dataset, reaching WER of 0.8%.
arXiv Detail & Related papers (2024-03-14T01:16:32Z)
- Detecting Speech Abnormalities with a Perceiver-based Sequence Classifier that Leverages a Universal Speech Model [4.503292461488901]
We propose a Perceiver-based sequence classifier to detect abnormalities in speech reflective of several neurological disorders.
We combine this classifier with a Universal Speech Model (USM) that is trained (unsupervised) on 12 million hours of diverse audio recordings.
Our model outperforms standard transformer (80.9%) and perceiver (81.8%) models and achieves an average accuracy of 83.1%.
arXiv Detail & Related papers (2023-10-16T21:07:12Z)
- Prediction of speech intelligibility with DNN-based performance measures [9.883633991083789]
This paper presents a speech intelligibility model based on automatic speech recognition (ASR)
It combines phoneme probabilities from deep neural networks (DNN) and a performance measure that estimates the word error rate from these probabilities.
The proposed model performs almost as well as the label-based model and produces more accurate predictions than the baseline models.
arXiv Detail & Related papers (2022-03-17T08:05:38Z)
- Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data intensive deep neural networks (DNNs) based automatic speech recognition technologies.
This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z)
- Self-Supervised Learning for speech recognition with Intermediate layer supervision [52.93758711230248]
We propose Intermediate Layer Supervision for Self-Supervised Learning (ILS-SSL)
ILS-SSL forces the model to concentrate on content information as much as possible by adding an additional SSL loss on the intermediate layers.
Experiments on LibriSpeech test-other set show that our method outperforms HuBERT significantly.
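The ILS-SSL idea summarized above, adding the SSL objective at intermediate layers rather than only at the top, can be sketched as a simple loss combination. The layer indices and the toy mean-squared loss below are illustrative assumptions, not the paper's actual masked-prediction objective:

```python
import numpy as np

def ssl_loss(pred, target):
    # Toy stand-in for a masked-prediction SSL loss (MSE here for brevity).
    return float(np.mean((pred - target) ** 2))

def ils_ssl_loss(layer_outputs, target, supervised_layers=(3, 7, 11)):
    """Sum the SSL loss over selected intermediate layers.

    Supervising intermediate layers pushes lower layers to carry content
    (rather than speaker) information, which is the motivation ILS-SSL gives.
    """
    return sum(ssl_loss(layer_outputs[i], target) for i in supervised_layers)

rng = np.random.default_rng(0)
layers = [rng.normal(size=4) for _ in range(12)]  # fake 12-layer outputs
target = np.zeros(4)                              # fake SSL target
print(ils_ssl_loss(layers, target))
```

In training, this intermediate-layer term would be added to (or replace) the usual top-layer SSL loss.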
arXiv Detail & Related papers (2021-12-16T10:45:05Z)
- Speech Emotion Recognition Using Quaternion Convolutional Neural Networks [1.776746672434207]
This paper proposes a quaternion convolutional neural network (QCNN) based speech emotion recognition model.
Mel-spectrogram features of speech signals are encoded in an RGB quaternion domain.
The model achieves an accuracy of 77.87%, 70.46%, and 88.78% for the RAVDESS, IEMOCAP, and EMO-DB datasets, respectively.
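The basic operation a quaternion convolution builds on is the Hamilton product, which mixes the four components of quaternion-valued inputs and weights. A self-contained sketch (a real QCNN would apply this per kernel position to the RGB-encoded mel-spectrogram channels; the tuple representation here is an illustrative choice):

```python
def hamilton_product(q, p):
    """Hamilton product of two quaternions given as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = p
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )

i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
print(hamilton_product(i, j))  # i * j = k -> (0, 0, 0, 1)
```

Because the product ties the four channels together through shared weights, a quaternion layer uses roughly a quarter of the parameters of a comparable real-valued layer while modeling inter-channel relations.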
arXiv Detail & Related papers (2021-10-31T04:06:07Z)
- Arabic Speech Recognition by End-to-End, Modular Systems and Human [56.96327247226586]
We perform a comprehensive benchmarking for end-to-end transformer ASR, modular HMM-DNN ASR, and human speech recognition.
For ASR, the end-to-end systems achieved 12.5%, 27.5%, and 23.8% WER, new performance milestones for the MGB2, MGB3, and MGB5 challenges, respectively.
Our results suggest that human performance in the Arabic language is still considerably better than the machine with an absolute WER gap of 3.6% on average.
arXiv Detail & Related papers (2021-01-21T05:55:29Z)
- Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement [102.48582597586233]
We present a U-Net based attention model, U-Net$_At$, to enhance adversarial speech signals.
We conduct experiments on the automatic speech recognition (ASR) task with adversarial audio attacks.
arXiv Detail & Related papers (2020-03-31T02:16:34Z)