VOTE400 (Voice Of The Elderly 400 Hours): A Speech Dataset to Study Voice
Interface for Elderly-Care
- URL: http://arxiv.org/abs/2101.11469v1
- Date: Wed, 20 Jan 2021 05:28:05 GMT
- Title: VOTE400 (Voice Of The Elderly 400 Hours): A Speech Dataset to Study Voice
Interface for Elderly-Care
- Authors: Minsu Jang, Sangwon Seo, Dohyung Kim, Jaeyeon Lee, Jaehong Kim,
Jun-Hwan Ahn
- Abstract summary: The dataset includes about 300 hours of continuous dialog speech and 100 hours of read speech, both recorded by elderly people aged 65 or over.
A preliminary experiment showed that a speech recognition system trained with VOTE400 can outperform conventional systems at recognizing elderly people's voices.
- Score: 11.87467300760354
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces a large-scale Korean speech dataset, called VOTE400,
that can be used for analyzing and recognizing the voices of elderly people.
The dataset includes about 300 hours of continuous dialog speech and 100 hours
of read speech, both recorded by elderly people aged 65 or over. A
preliminary experiment showed that a speech recognition system trained with
VOTE400 can outperform conventional systems at recognizing elderly
people's voices. This work is a multi-organizational effort led by ETRI and
MINDs Lab Inc. to advance the speech recognition performance
of elderly-care robots.
Related papers
- A Review of Challenges in Speech-based Conversational AI for Elderly Care [3.257656198821199]
Speech-controlled assistants may support the elderly and enable remote health monitoring.
The bottleneck for efficacy is how well these devices work in practice and how the elderly experience them.
We review elderly use of voice-controlled AI and highlight various user- and technology-centered issues.
arXiv Detail & Related papers (2024-12-10T10:32:22Z)
- Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
In this article, we tackle the challenge of developing ASR systems without paired speech and text corpora.
We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.
This innovative model surpasses the performance of previous unsupervised ASR models under the lexicon-free setting.
arXiv Detail & Related papers (2024-06-12T16:30:58Z)
- AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection [46.855958156126164]
This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset.
arXiv Detail & Related papers (2024-06-11T13:35:50Z)
- EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation [83.29199726650899]
The EARS dataset comprises 107 speakers from diverse backgrounds, totaling 100 hours of clean, anechoic speech data.
The dataset covers a large range of different speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech.
We benchmark various methods for speech enhancement and dereverberation on the dataset and evaluate their performance through a set of instrumental metrics.
arXiv Detail & Related papers (2024-06-10T11:28:29Z)
- Latent Phrase Matching for Dysarthric Speech [23.23672790496787]
Many consumer speech recognition systems are not tuned for people with speech disabilities.
We propose a query-by-example-based personalized phrase recognition system that is trained using small amounts of speech.
Performance degrades as the number of phrases increases, but the system consistently outperforms ASR systems when trained with 50 unique phrases.
arXiv Detail & Related papers (2023-06-08T17:28:28Z)
- Speaker Identification using Speech Recognition [0.0]
This research provides a mechanism for identifying a speaker in an audio file based on biometric features of the human voice, such as pitch, amplitude, and frequency.
We propose an unsupervised learning model that can learn speech representations from a limited dataset.
arXiv Detail & Related papers (2022-05-29T13:03:42Z)
- Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition [60.84668086976436]
An unsupervised text-to-speech synthesis (TTS) system learns to generate the speech waveform corresponding to any written sentence in a language.
This paper proposes an unsupervised TTS system that leverages recent advances in unsupervised automatic speech recognition (ASR).
Our unsupervised system can achieve comparable performance to the supervised system in seven languages with about 10-20 hours of speech each.
arXiv Detail & Related papers (2022-03-29T17:57:53Z)
- ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion [49.617722668505834]
We show that our approach permits the application of speech synthesis and voice conversion to improve ASR systems using only one target-language speaker during model training.
It is possible to obtain promising ASR training results with our data augmentation method using only a single real speaker in a target language.
arXiv Detail & Related papers (2022-03-29T11:55:30Z)
- Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models [55.82292352607321]
Code-switching (CS) is common in daily conversations where more than one language is used within a sentence.
This paper uses recently successful self-supervised learning (SSL) methods to leverage large amounts of unlabeled speech data without CS.
arXiv Detail & Related papers (2021-10-07T14:43:35Z)
- JukeBox: A Multilingual Singer Recognition Dataset [17.33151600403503]
JukeBox is a speaker recognition dataset with multilingual singing voice audio annotated with singer identity, gender, and language labels.
We use the current state-of-the-art methods to demonstrate the difficulty of performing speaker recognition on singing voice using models trained on spoken voice alone.
arXiv Detail & Related papers (2020-08-08T12:22:51Z)