ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic
Speech Recognition of Contact Centers
- URL: http://arxiv.org/abs/2004.09367v2
- Date: Sun, 17 May 2020 06:53:34 GMT
- Title: ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic
Speech Recognition of Contact Centers
- Authors: Jung-Woo Ha, Kihyun Nam, Jingu Kang, Sang-Woo Lee, Sohee Yang,
Hyunhoon Jung, Eunmi Kim, Hyeji Kim, Soojin Kim, Hyun Ah Kim, Kyoungtae Doh,
Chan Kyu Lee, Nako Sung, Sunghun Kim
- Abstract summary: We introduce a new large-scale Korean call-based speech corpus under a goal-oriented dialog scenario from more than 11,000 people.
ClovaCall includes approximately 60,000 pairs of a short sentence and its corresponding spoken utterance in a restaurant reservation domain.
We validate the effectiveness of our dataset with intensive experiments using two standard ASR models.
- Score: 23.076908473357577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic speech recognition (ASR) via call is essential for various
applications, including AI for contact center (AICC) services. Despite the
advancement of ASR, however, most publicly available call-based speech corpora
such as Switchboard are old-fashioned. Also, most existing call corpora are in
English and mainly focus on open domain dialog or general scenarios such as
audiobooks. Here we introduce a new large-scale Korean call-based speech corpus
under a goal-oriented dialog scenario from more than 11,000 people, i.e.,
ClovaCall corpus. ClovaCall includes approximately 60,000 pairs of a short
sentence and its corresponding spoken utterance in a restaurant reservation
domain. We validate the effectiveness of our dataset with intensive experiments
using two standard ASR models. Furthermore, we release our ClovaCall dataset
and baseline source codes to be available via
https://github.com/ClovaAI/ClovaCall.
Related papers
- Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages [76.14451035425229]
We introduce Omnilingual ASR, a large-scale automatic speech recognition system.
It scales self-supervised pre-training to 7B parameters to learn robust speech representations.
It expands coverage to over 1,600 languages, including over 500 never before served by ASR.
arXiv Detail & Related papers (2025-11-12T19:48:09Z) - POWSM: A Phonetic Open Whisper-Style Speech Foundation Model [50.73202227472358]
POWSM is the first unified framework capable of jointly performing multiple phone-related tasks.
Our training data, code and models are released to foster open science.
arXiv Detail & Related papers (2025-10-28T21:43:45Z) - Hello Afrika: Speech Commands in Kinyarwanda [0.0]
There is a dearth of speech command models for African languages.
Hello Afrika aims to address this issue and its first iteration is focused on the Kinyarwanda language.
The model was built off a custom speech command corpus made up of general directives, numbers, and a wake word.
arXiv Detail & Related papers (2025-06-16T16:30:19Z) - Connecting Voices: LoReSpeech as a Low-Resource Speech Parallel Corpus [0.0]
This paper introduces a methodology for constructing LoReSpeech, a low-resource speech-to-speech translation corpus.
LoReSpeech delivers both intra- and inter-language alignments, enabling advancements in multilingual ASR systems.
arXiv Detail & Related papers (2025-02-25T14:00:15Z) - FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs [63.8261207950923]
FunAudioLLM is a model family designed to enhance natural voice interactions between humans and large language models (LLMs).
At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, speaking style, and speaker identity.
The models related to SenseVoice and CosyVoice have been open-sourced on Modelscope and Huggingface, along with the corresponding training, inference, and fine-tuning codes released on GitHub.
arXiv Detail & Related papers (2024-07-04T16:49:02Z) - Towards Zero-Shot Text-To-Speech for Arabic Dialects [16.10882912169842]
Zero-shot multi-speaker text-to-speech (ZS-TTS) systems have advanced for English; for other languages, however, they still lag behind due to insufficient resources.
We address this gap for Arabic by first adapting an existing dataset to suit the needs of speech synthesis.
We employ a set of Arabic dialect identification models to explore the impact of pre-defined dialect labels on improving the ZS-TTS model in a multi-dialect setting.
arXiv Detail & Related papers (2024-06-24T15:58:15Z) - Code-Switched Urdu ASR for Noisy Telephonic Environment using Data
Centric Approach with Hybrid HMM and CNN-TDNN [0.0]
Urdu is the 10th most widely spoken language in the world, with 231,295,440 speakers worldwide, yet it remains a resource-constrained language in ASR.
This paper describes an implementation framework for a resource-efficient automatic speech recognition (speech-to-text) system in a noisy call-center environment.
arXiv Detail & Related papers (2023-07-24T13:04:21Z) - AudioPaLM: A Large Language Model That Can Speak and Listen [79.44757696533709]
We introduce AudioPaLM, a large language model for speech understanding and generation.
AudioPaLM fuses text-based and speech-based language models.
It can process and generate text and speech with applications including speech recognition and speech-to-speech translation.
arXiv Detail & Related papers (2023-06-22T14:37:54Z) - OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality
Alignment [57.15449072423539]
We propose a training system, Open-modality Speech Recognition (OpenSR).
OpenSR enables modality transfer from one to any in three different settings.
It achieves highly competitive zero-shot performance compared to the existing few-shot and full-shot lip-reading methods.
arXiv Detail & Related papers (2023-06-10T11:04:10Z) - PolyVoice: Language Models for Speech to Speech Translation [50.31000706309143]
PolyVoice is a language model-based framework for speech-to-speech translation (S2ST).
We use discretized speech units, which are generated in a fully unsupervised way.
For the speech synthesis part, we adopt the existing VALL-E X approach and build a unit-based audio language model.
arXiv Detail & Related papers (2023-06-05T15:53:15Z) - SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented
Dialogue Agents [72.42049370297849]
SpokenWOZ is a large-scale speech-text dataset for spoken TOD.
Cross-turn slot and reasoning slot detection are new challenges for SpokenWOZ.
arXiv Detail & Related papers (2023-05-22T13:47:51Z) - QASR: QCRI Aljazeera Speech Resource -- A Large Scale Annotated Arabic
Speech Corpus [11.113497373432411]
We introduce the largest transcribed Arabic speech corpus, QASR, collected from the broadcast domain.
This multi-dialect speech dataset contains 2,000 hours of speech sampled at 16 kHz, crawled from the Aljazeera news channel.
arXiv Detail & Related papers (2021-06-24T13:20:40Z) - KoSpeech: Open-Source Toolkit for End-to-End Korean Speech Recognition [1.7955614278088239]
KoSpeech is an end-to-end Korean automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch.
We propose preprocessing methods for the KsponSpeech corpus and a baseline model for benchmarks.
Our baseline model achieved a 10.31% character error rate (CER) on the KsponSpeech corpus using only the acoustic model.
arXiv Detail & Related papers (2020-09-07T13:25:36Z) - A Large-Scale Chinese Short-Text Conversation Dataset [77.55813366932313]
We present a large-scale cleaned Chinese conversation dataset, LCCC, which contains a base version (6.8 million dialogues) and a large version (12.0 million dialogues).
The quality of our dataset is ensured by a rigorous data cleaning pipeline, which is built based on a set of rules.
We also release pre-training dialogue models which are trained on LCCC-base and LCCC-large respectively.
arXiv Detail & Related papers (2020-08-10T08:12:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.