Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car Commands
- URL: http://arxiv.org/abs/2207.02663v1
- Date: Wed, 6 Jul 2022 13:31:56 GMT
- Title: Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car Commands
- Authors: Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J Barezi, Pascale Fung
- Abstract summary: In-car smart assistants should be able to process general as well as car-related commands.
Most datasets are in major languages, such as English and Chinese.
We propose Cantonese Audio-Visual Speech Recognition for In-car Commands.
- Score: 48.155806720847394
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rise of deep learning and intelligent vehicles, the smart assistant
has become an essential in-car component to facilitate driving and provide
extra functionalities. In-car smart assistants should be able to process
general as well as car-related commands and perform corresponding actions,
which eases driving and improves safety. However, in this research field, most
datasets are in major languages, such as English and Chinese. There is a huge
data scarcity issue for low-resource languages, hindering the development of
research and applications for broader communities. Therefore, it is crucial to
have more benchmarks to raise awareness and motivate the research in
low-resource languages. To mitigate this problem, we collect a new dataset,
namely Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR), for in-car
speech recognition in the Cantonese language with video and audio data.
Together with it, we propose Cantonese Audio-Visual Speech Recognition for
In-car Commands as a new challenge for the community to tackle low-resource
speech recognition under in-car scenarios.
Related papers
- Automatic Speech Recognition for Hindi [0.6292138336765964]
The research involved developing a web application and designing a web interface for speech recognition.
The web application manages large volumes of audio files and their transcriptions, facilitating human correction of ASR transcripts.
The web interface for speech recognition records 16 kHz mono audio from any device running the web app, performs voice activity detection (VAD), and sends the audio to the recognition engine.
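A minimal sketch of that front end, assuming the py-webrtcvad package and a raw stream of 16-bit little-endian PCM at 16 kHz (the hand-off to the recognition engine is left abstract):

```python
import webrtcvad

SAMPLE_RATE = 16000                                # 16 kHz mono, as described above
FRAME_MS = 30                                      # webrtcvad accepts 10, 20, or 30 ms frames
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2   # 16-bit (2-byte) samples

def speech_frames(pcm: bytes, aggressiveness: int = 2):
    """Yield only the frames WebRTC VAD classifies as speech."""
    vad = webrtcvad.Vad(aggressiveness)            # 0 (lenient) .. 3 (strict)
    for i in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
        frame = pcm[i:i + FRAME_BYTES]
        if vad.is_speech(frame, SAMPLE_RATE):
            yield frame                            # forward to the recognizer
```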
arXiv Detail & Related papers (2024-06-26T07:39:20Z)
- Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition [71.49308685090324]
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
arXiv Detail & Related papers (2022-01-26T22:12:55Z)
- CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition [91.33781557979819]
We introduce a new dataset, Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR)
It consists of 4,984 samples (8.3 hours) of 200 in-car commands recorded by 30 native Cantonese speakers.
We provide detailed statistics of both the clean and the augmented versions of our dataset.
arXiv Detail & Related papers (2022-01-11T06:32:12Z)
- Automatic Speech Recognition Datasets in Cantonese Language: A Survey and a New Dataset [85.52036362232688]
Our dataset consists of 73.6 hours of clean read speech paired with transcripts, collected from Hong Kong Cantonese audiobooks.
It spans the philosophy, politics, education, culture, lifestyle, and family domains, covering a wide range of topics.
We create a powerful and robust Cantonese ASR model by applying multi-dataset learning on MDCC and Common Voice zh-HK.
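In its simplest form, multi-dataset learning just trains on the pooled corpora. A toy PyTorch sketch of that pooling (the wrapper class and dummy examples are illustrative, not the paper's pipeline):

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset

class PairDataset(Dataset):
    """Illustrative wrapper: a corpus is a list of (waveform, transcript) pairs."""
    def __init__(self, pairs):
        self.pairs = pairs
    def __len__(self):
        return len(self.pairs)
    def __getitem__(self, idx):
        return self.pairs[idx]

# Dummy stand-ins for MDCC and Common Voice zh-HK; real loaders would read audio files.
mdcc = PairDataset([(torch.zeros(16000), "早晨")])
cv_zh_hk = PairDataset([(torch.zeros(16000), "你好")])

# Multi-dataset learning at its simplest: shuffle batches from the pooled corpus.
pooled = ConcatDataset([mdcc, cv_zh_hk])
loader = DataLoader(pooled, batch_size=2, shuffle=True,
                    collate_fn=list)  # pad/stack tensors in a real pipeline
```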
arXiv Detail & Related papers (2022-01-07T12:09:15Z)
- Voice Conversion Can Improve ASR in Very Low-Resource Settings [32.170748231414365]
We study whether a voice conversion (VC) system can be used cross-lingually to improve low-resource speech recognition.
We combine several recent techniques to design and train a practical VC system in English.
We find that when using a sensible amount of augmented data, speech recognition performance is improved in all four low-resource languages considered.
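A schematic of that augmentation loop; `convert` is a placeholder for the trained VC system (the paper trains its system in English and applies it cross-lingually), not an API from the paper:

```python
def augment_with_vc(corpus, convert, target_speakers):
    """Grow a low-resource corpus with voice-converted copies.

    corpus: iterable of (waveform, transcript) pairs.
    convert: hypothetical VC function mapping (waveform, speaker) -> waveform.
    """
    augmented = list(corpus)
    for wav, text in corpus:
        for spk in target_speakers:
            # The transcript is unchanged: VC alters the voice, not the words.
            augmented.append((convert(wav, spk), text))
    return augmented
```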
arXiv Detail & Related papers (2021-11-04T07:57:00Z)
- Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity [81.51206991542242]
Cross-lingual transfer offers a compelling way to help bridge the digital divide between high- and low-resource languages.
Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks over some low-resource languages.
We propose a language similarity approach that can efficiently identify acoustic cross-lingual transfer pairs across hundreds of languages.
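One plausible instantiation of such a similarity score, shown below: cosine similarity between per-language mean acoustic embeddings, used to rank candidate source languages. The embedding choice is an assumption, not the paper's exact measure:

```python
import numpy as np

def language_similarity(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine similarity between the mean acoustic embeddings of two languages."""
    a, b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_transfer_sources(embeddings: dict, target: str, k: int = 5):
    """Rank candidate source languages by acoustic similarity to the target.

    embeddings: {language: (n_utterances, dim) array of acoustic features}.
    """
    scores = {lang: language_similarity(emb, embeddings[target])
              for lang, emb in embeddings.items() if lang != target}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```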
arXiv Detail & Related papers (2021-11-02T01:55:17Z)
- Meta-Transfer Learning for Code-Switched Speech Recognition [72.84247387728999]
We propose a new learning method, meta-transfer learning, to transfer learn on a code-switched speech recognition system in a low-resource setting.
Our model learns to recognize individual languages and transfers that knowledge to better recognize mixed-language speech by conditioning the optimization on the code-switching data.
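A first-order sketch of the idea: adapt a copy of the model on monolingual data, then update the shared parameters with the gradient of the code-switching loss. This is a generic FOMAML-style step under stated assumptions, not the paper's exact algorithm:

```python
import copy
import torch

def meta_transfer_step(model, meta_opt, mono_batch, cs_batch, loss_fn, inner_lr=1e-3):
    """One first-order meta-update: inner step on monolingual speech,
    outer gradient taken on code-switched speech."""
    adapted = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)

    # Inner loop: adapt to an individual language.
    inner_opt.zero_grad()
    loss_fn(adapted(mono_batch["x"]), mono_batch["y"]).backward()
    inner_opt.step()

    # Outer loop: evaluate the adapted model on code-switched speech ...
    adapted.zero_grad()
    loss_fn(adapted(cs_batch["x"]), cs_batch["y"]).backward()

    # ... and apply that gradient to the original parameters (first-order trick).
    for p, p_adapted in zip(model.parameters(), adapted.parameters()):
        p.grad = p_adapted.grad.detach().clone()
    meta_opt.step()
    meta_opt.zero_grad()
```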
arXiv Detail & Related papers (2020-04-29T14:27:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.