NeuroVoz: a Castillian Spanish corpus of parkinsonian speech
- URL: http://arxiv.org/abs/2403.02371v2
- Date: Wed, 6 Mar 2024 11:08:02 GMT
- Title: NeuroVoz: a Castillian Spanish corpus of parkinsonian speech
- Authors: Janaína Mendes-Laureano, Jorge A. Gómez-García, Alejandro Guerrero-López, Elisa Luque-Buzo, Julián D. Arias-Londoño, Francisco J. Grandas-Pérez, Juan I. Godino-Llorente
- Abstract summary: NeuroVoz is composed of 2,903 audio recordings averaging $26.88 \pm 3.35$ recordings per participant.
This dataset has already underpinned several studies, achieving a benchmark accuracy of 89% in PD speech pattern identification.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The advancement of Parkinson's Disease (PD) diagnosis through speech analysis
is hindered by a notable lack of publicly available, diverse language datasets,
limiting the reproducibility and further exploration of existing research.
In response to this gap, we introduce a comprehensive corpus from 108 native
Castilian Spanish speakers, comprising 55 healthy controls and 53 individuals
diagnosed with PD, all of whom were under pharmacological treatment and
recorded in their medication-optimized state. This unique dataset features a
wide array of speech tasks, including sustained phonation of the five Spanish
vowels, diadochokinetic tests, 16 listen-and-repeat utterances, and free
monologues. The dataset emphasizes accuracy and reliability through specialist
manual transcriptions of the listen-and-repeat tasks and utilizes Whisper for
automated monologue transcriptions, making it the most complete public corpus
of Parkinsonian speech, and the first in Castilian Spanish.
NeuroVoz is composed of 2,903 audio recordings averaging $26.88 \pm 3.35$
recordings per participant, offering a substantial resource for the scientific
exploration of PD's impact on speech. This dataset has already underpinned
several studies, achieving a benchmark accuracy of 89% in PD speech pattern
identification, indicating marked speech alterations attributable to PD.
Despite these advances, the broader challenge of conducting a
language-agnostic, cross-corpora analysis of Parkinsonian speech patterns
remains an open area for future research. This contribution not only fills a
critical void in PD speech analysis resources but also sets a new standard for
the global research community in leveraging speech as a diagnostic tool for
neurodegenerative diseases.
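The per-participant average in the abstract follows directly from the corpus totals. The sketch below is a quick consistency check. Only the three counts (2,903 recordings, 55 healthy controls, 53 PD speakers) come from the abstract, and the variable names are illustrative. The standard deviation of 3.35 cannot be reproduced because per-speaker counts are not listed here. The sketch also computes the accuracy of always predicting the larger class. This is a rough reference point for the 89% figure, under the assumption (not stated in the abstract) that evaluation is per speaker over all 108 participants.

```python
# Consistency check of the NeuroVoz totals quoted in the abstract.
# Only the three counts come from the abstract; the names are illustrative.
TOTAL_RECORDINGS = 2903
HEALTHY_CONTROLS = 55
PD_PATIENTS = 53

n_participants = HEALTHY_CONTROLS + PD_PATIENTS          # number of speakers
mean_per_participant = TOTAL_RECORDINGS / n_participants  # mean recordings per speaker

# Accuracy of always predicting the larger class (healthy controls),
# assuming a speaker-level evaluation over all participants.
majority_baseline = max(HEALTHY_CONTROLS, PD_PATIENTS) / n_participants

print(n_participants)                   # 108
print(f"{mean_per_participant:.2f}")    # 26.88
print(f"{majority_baseline:.2%}")       # 50.93%
```

Since 2,903 / 108 ≈ 26.8796, the summary figures are internally consistent. A speaker-level majority-class predictor would score about 50.9%.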
Related papers
- Language-Agnostic Analysis of Speech Depression Detection (2024-09-23)
This work analyzes automatic speech-based depression detection across two languages, English and Malayalam.
A CNN model is trained to identify acoustic features associated with depression in speech in both languages.
Our findings and collected data could contribute to the development of language-agnostic speech-based depression detection systems.
- Self-supervised Speech Models for Word-Level Stuttered Speech Detection (2024-09-16)
We introduce a word-level stuttering detection model that leverages self-supervised speech models.
Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttering detection.
- Exploring Speech Pattern Disorders in Autism using Machine Learning (2024-05-03)
This study presents a comprehensive approach to identifying distinctive speech patterns through the analysis of examiner-patient dialogues.
We extracted 40 speech-related features, categorized into frequency, zero-crossing rate, energy, spectral characteristics, Mel-Frequency Cepstral Coefficients (MFCCs), and balance.
The classification model, which aimed to differentiate between autism spectrum disorder (ASD) and non-ASD cases, achieved an accuracy of 87.75%.
- Decoding speech perception from non-invasive brain recordings (2022-08-25)
We introduce a model trained with contrastive learning to decode self-supervised representations of perceived speech from non-invasive recordings.
From 3 seconds of magnetoencephalography (MEG) signals, our model can identify the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
- Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition (2022-06-15)
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems.
This paper presents a cross-domain and cross-lingual acoustic-to-articulatory (A2A) inversion approach that uses the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus for A2A model pre-training.
Experiments on three tasks suggest that systems incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
- Parkinson's disease diagnostics using AI and natural language knowledge transfer (2022-04-26)
A deep learning approach for classifying raw speech recordings of patients diagnosed with PD was proposed.
The method was tested on a group of 38 PD patients and 10 healthy persons over the age of 50.
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition (2022-01-14)
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system, built on the UASpeech corpus with the best augmentation approach (based on speed perturbation), produced up to a 2.92% absolute word error rate (WER) reduction.
- The Phonetic Footprint of Parkinson's Disease (2021-12-21)
Parkinson's disease (PD) has a significant impact on the fine motor skills of patients.
Characteristic patterns such as vowel instability, slurred pronunciation, and slow speech can often be observed in affected individuals.
We used a phonetic recognizer trained exclusively on healthy speech data to investigate how PD affected the phonetic footprint of patients.
- Silent Speech Interfaces for Speech Restoration: A Review (2020-09-04)
Silent speech interface (SSI) research aims to provide alternative and augmentative communication methods for persons with severe speech disorders.
SSIs rely on non-acoustic biosignals generated by the human body during speech production to enable communication.
Most present-day SSIs have only been validated in laboratory settings for healthy users.
- Detecting Parkinson's Disease From an Online Speech-task (2020-09-02)
In this paper, we envision a web-based framework that can help anyone, anywhere in the world, record a short speech task and analyze the recorded data to screen for Parkinson's disease (PD).
We collected data from 726 unique participants (262 PD, 38% female; 464 non-PD, 65% female; average age: 61) from across the US and beyond.
We extracted standard acoustic features (MFCCs), jitter and shimmer variants, and deep-learning-based features from the speech data.
Our model performed equally well on data collected in a controlled lab environment and 'in the wild'.
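The online speech-task study above extracts "jitter and shimmer variants". These are cycle-to-cycle perturbation measures that are typically computed on sustained vowels like those recorded in NeuroVoz. The exact variants used in that study are not stated here, so the following is only a minimal sketch of the widely used "local" definitions, as found in Praat. Local jitter is the mean absolute difference between consecutive glottal periods, divided by the mean period. Local shimmer applies the same formula to per-cycle peak amplitudes. The sketch assumes these sequences were already extracted by a pitch tracker, which is not shown. The function names and sample values are hypothetical.

```python
from typing import Sequence


def _mean_abs_successive_diff(values: Sequence[float]) -> float:
    """Average of |x_i - x_{i+1}| over consecutive pairs."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    diffs = [abs(a - b) for a, b in zip(values[:-1], values[1:])]
    return sum(diffs) / len(diffs)


def _positive_mean(values: Sequence[float]) -> float:
    """Arithmetic mean, rejecting non-positive results (periods/amplitudes are > 0)."""
    m = sum(values) / len(values)
    if m <= 0:
        raise ValueError("values must have a positive mean")
    return m


def local_jitter(periods: Sequence[float]) -> float:
    """Relative local jitter: mean |T_i - T_{i+1}| divided by the mean period."""
    return _mean_abs_successive_diff(periods) / _positive_mean(periods)


def local_shimmer(amplitudes: Sequence[float]) -> float:
    """Relative local shimmer: mean |A_i - A_{i+1}| divided by the mean amplitude."""
    return _mean_abs_successive_diff(amplitudes) / _positive_mean(amplitudes)


if __name__ == "__main__":
    # Hypothetical glottal periods (seconds) and peak amplitudes from a sustained vowel.
    periods = [0.0080, 0.0082, 0.0079, 0.0081]
    amplitudes = [0.50, 0.52, 0.49, 0.51]
    print(f"jitter (local):  {local_jitter(periods):.2%}")
    print(f"shimmer (local): {local_shimmer(amplitudes):.2%}")
```

Both measures are dimensionless ratios, so a perfectly periodic signal yields 0, and higher values indicate less stable phonation.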