EasyCall corpus: a dysarthric speech dataset
- URL: http://arxiv.org/abs/2104.02542v1
- Date: Tue, 6 Apr 2021 14:32:47 GMT
- Title: EasyCall corpus: a dysarthric speech dataset
- Authors: Rosanna Turrisi, Arianna Braccia, Marco Emanuele, Simone Giulietti,
Maura Pugliatti, Mariachiara Sensi, Luciano Fadiga, Leonardo Badino
- Abstract summary: This paper introduces a new dysarthric speech command dataset in Italian, called EasyCall corpus.
The dataset consists of 21386 audio recordings from 24 healthy and 31 dysarthric speakers, whose individual degree of speech impairment was assessed by neurologists.
The corpus aims at providing a resource for the development of ASR-based assistive technologies for patients with dysarthria.
- Score: 4.6760299097922715
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces a new dysarthric speech command dataset in Italian,
called EasyCall corpus. The dataset consists of 21386 audio recordings from 24
healthy and 31 dysarthric speakers, whose individual degree of speech
impairment was assessed by neurologists through the Therapy Outcome Measure.
The corpus aims at providing a resource for the development of ASR-based
assistive technologies for patients with dysarthria. In particular, it may be
exploited to develop a voice-controlled contact application for commercial
smartphones, aiming at improving dysarthric patients' ability to communicate
with their family and caregivers. Before recording the dataset, participants
were administered a survey to evaluate which commands are more likely to be
employed by dysarthric individuals in a voice-controlled contact application.
In addition, the dataset includes a list of non-commands (i.e., words
near/inside commands or phonetically close to commands) that can be leveraged
to build a more robust command recognition system. At present, commercial ASR
systems perform poorly on the EasyCall Corpus, as we report in this paper. This
result corroborates the need for dysarthric speech corpora for developing
effective assistive technologies. To the best of our knowledge, this database
represents the richest corpus of dysarthric speech to date.
Related papers
- Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design [58.50329724298128]
This paper addresses the wake-up word spotting (WWS) task for dysarthric individuals, aiming to integrate them into real-world applications.
We release the open-source Mandarin Dysarthria Speech Corpus (MDSC), a dataset designed for dysarthric individuals in home environments.
We also develop a customized dysarthria WWS system that showcases robustness in handling intelligibility and achieving exceptional performance.
arXiv Detail & Related papers (2024-06-14T03:06:55Z)
- Voice EHR: Introducing Multimodal Audio Data for Health [3.876405146656873]
This report introduces a novel data type and a corresponding collection system that captures health data through guided questions using only a mobile/web application.
This application ultimately results in an audio electronic health record (voice EHR) which may contain complex biomarkers of health from conventional voice/respiratory features, speech patterns, and language with semantic meaning.
arXiv Detail & Related papers (2024-04-02T04:07:22Z)
- UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization [60.43992089087448]
Dysarthric speech reconstruction systems aim to automatically convert dysarthric speech into normal-sounding speech.
We propose a Unit-DSR system, which harnesses the powerful domain-adaptation capacity of HuBERT for training efficiency improvement.
Compared with NED approaches, the Unit-DSR system only consists of a speech unit normalizer and a Unit HiFi-GAN vocoder, which is considerably simpler without cascaded sub-modules or auxiliary tasks.
arXiv Detail & Related papers (2024-01-26T06:08:47Z)
- PoCaP Corpus: A Multimodal Dataset for Smart Operating Room Speech Assistant using Interventional Radiology Workflow Analysis [7.189635716814341]
This paper presents a new multimodal interventional radiology dataset, called PoCaP (Port Catheter Placement) Corpus.
This corpus consists of speech and audio signals in German, X-ray images, and system commands collected from 31 PoCaP interventions by six surgeons.
arXiv Detail & Related papers (2022-06-24T14:39:11Z)
- Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection [62.23830810096617]
Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care to delay further progression.
This paper presents the development of a state-of-the-art Conformer based speech recognition system built on the DementiaBank Pitt corpus for automatic AD detection.
arXiv Detail & Related papers (2022-06-23T12:50:55Z)
- Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual acoustic-to-articulatory (A2A) inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training.
Experiments conducted on three tasks suggested incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
- Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition [30.885165674448352]
This paper presents a novel set of speaker-dependent generative adversarial network (GAN) based data augmentation approaches for elderly and dysarthric speech recognition.
GAN based data augmentation approaches consistently outperform the baseline speed perturbation method by up to 0.91% and 3.0% absolute.
Consistent performance improvements are retained after applying LHUC based speaker adaptation.
arXiv Detail & Related papers (2022-05-13T04:29:49Z)
- PriMock57: A Dataset Of Primary Care Mock Consultations [66.29154510369372]
We detail the development of a public access, high quality dataset comprising 57 mocked primary care consultations.
Our work illustrates how the dataset can be used as a benchmark for conversational medical ASR as well as consultation note generation from transcripts.
arXiv Detail & Related papers (2022-04-01T10:18:28Z)
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to a 2.92% absolute word error rate (WER) reduction.
arXiv Detail & Related papers (2022-01-14T17:09:22Z)