Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages
- URL: http://arxiv.org/abs/2306.04428v2
- Date: Tue, 13 Jun 2023 20:48:02 GMT
- Title: Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages
- Authors: Claytone Sikasote, Kalinda Siaminwe, Stanly Mwape, Bangiwe Zulu, Mofya
Phiri, Martin Phiri, David Zulu, Mayumbo Nyirenda, Antonios Anastasopoulos
- Abstract summary: Zambezi Voice is an open-source multilingual speech resource for Zambian languages.
To our knowledge, this is the first multilingual speech dataset created for Zambian languages.
- Score: 20.25236081418051
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This work introduces Zambezi Voice, an open-source multilingual speech
resource for Zambian languages. It contains two collections of datasets:
unlabelled audio recordings of radio news and talk show programs (160 hours)
and labelled data (over 80 hours) consisting of read speech recorded from text
sourced from publicly available literature books. The dataset is created for
speech recognition but can be extended to multilingual speech processing
research for both supervised and unsupervised learning approaches. To our
knowledge, this is the first multilingual speech dataset created for Zambian
languages. We exploit pretraining and cross-lingual transfer learning by
fine-tuning the Wav2Vec2.0 large-scale multilingual pre-trained model to build
end-to-end (E2E) baseline speech recognition models. The dataset
is released publicly under a Creative Commons BY-NC-ND 4.0 license and can be
accessed via https://github.com/unza-speech-lab/zambezi-voice .
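As a quick illustration of the baseline recipe above (fine-tuning a multilingual Wav2Vec2.0 checkpoint with a CTC head for end-to-end ASR), here is a minimal sketch using the Hugging Face transformers API. The checkpoint name, character vocabulary, and sample transcript are illustrative assumptions; the paper's exact setup may differ.

```python
import json
import torch
from transformers import (Wav2Vec2CTCTokenizer, Wav2Vec2FeatureExtractor,
                          Wav2Vec2ForCTC, Wav2Vec2Processor)

# Character vocabulary; in practice this is built from the target-language
# transcripts in the labelled split (illustrative here).
vocab = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz'")}
vocab["|"] = len(vocab)        # word delimiter standing in for spaces
vocab["[UNK]"] = len(vocab)
vocab["[PAD]"] = len(vocab)
with open("vocab.json", "w") as f:
    json.dump(vocab, f)

tokenizer = Wav2Vec2CTCTokenizer("vocab.json", unk_token="[UNK]",
                                 pad_token="[PAD]", word_delimiter_token="|")
feature_extractor = Wav2Vec2FeatureExtractor(feature_size=1, sampling_rate=16000,
                                             padding_value=0.0, do_normalize=True,
                                             return_attention_mask=True)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor,
                              tokenizer=tokenizer)

# Load a multilingual pretrained encoder and attach a fresh CTC head sized to
# the vocabulary. The checkpoint below is an assumption, not named by the paper.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    ctc_loss_reduction="mean",
    pad_token_id=tokenizer.pad_token_id,
    vocab_size=len(tokenizer),
)
model.freeze_feature_encoder()  # common practice when fine-tuning wav2vec 2.0

# One illustrative training step on a single (waveform, transcript) pair.
waveform = torch.randn(16000).numpy()  # stand-in for 1 s of 16 kHz read speech
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
labels = tokenizer("mwapoleni mukwai", return_tensors="pt").input_ids  # hypothetical transcript
loss = model(input_values=inputs.input_values, labels=labels).loss
loss.backward()  # a real run wraps this in an optimizer loop over the corpus
```

A full recipe would add batching with padding, label masking, a learning-rate schedule, and word-error-rate evaluation, but the core pattern (pretrained multilingual encoder plus task-specific CTC head) is the cross-lingual transfer setup the abstract describes.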
Related papers
- Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models [13.855545744177586]
This paper examines the performance of existing audio language models in an underserved language, using Thai as a case study.
Despite being built on multilingual backbones, audio language models do not exhibit cross-lingual emergent abilities.
This paper integrates audio comprehension and speech instruction-following capabilities into a single unified model.
arXiv Detail & Related papers (2024-09-17T09:04:03Z) - Lip Reading for Low-resource Languages by Learning and Combining General
Speech Knowledge and Language-specific Knowledge [57.38948190611797]
This paper proposes a novel lip reading framework, especially for low-resource languages.
Because low-resource languages lack enough video-text paired data to train a model, developing lip reading models for them is considered challenging.
arXiv Detail & Related papers (2023-08-18T05:19:03Z) - AudioPaLM: A Large Language Model That Can Speak and Listen [79.44757696533709]
We introduce AudioPaLM, a large language model for speech understanding and generation.
AudioPaLM fuses text-based and speech-based language models.
It can process and generate text and speech with applications including speech recognition and speech-to-speech translation.
arXiv Detail & Related papers (2023-06-22T14:37:54Z) - PolyVoice: Language Models for Speech to Speech Translation [50.31000706309143]
PolyVoice is a language model-based framework for speech-to-speech translation (S2ST).
We use discretized speech units, which are generated in a fully unsupervised way.
For the speech synthesis part, we adopt the existing VALL-E X approach and build a unit-based audio language model.
arXiv Detail & Related papers (2023-06-05T15:53:15Z) - Scaling Speech Technology to 1,000+ Languages [66.31120979098483]
The Massively Multilingual Speech (MMS) project increases the number of supported languages by 10-40x, depending on the task.
The main ingredients are a new dataset based on readings of publicly available religious texts and self-supervised learning.
We built pre-trained wav2vec 2.0 models covering 1,406 languages, a single multilingual automatic speech recognition model for 1,107 languages, speech synthesis models for the same number of languages, and a language identification model for 4,017 languages; a usage sketch for the released ASR model appears after this list.
arXiv Detail & Related papers (2023-05-22T22:09:41Z) - ASR2K: Speech Recognition for Around 2000 Languages without Audio [100.41158814934802]
We present a speech recognition pipeline that does not require any audio for the target language.
Our pipeline consists of three components: acoustic, pronunciation, and language models.
We build speech recognition for 1909 languages by combining it with Crubadan, a large n-gram database of endangered languages.
arXiv Detail & Related papers (2022-09-06T22:48:29Z) - Large vocabulary speech recognition for languages of Africa:
multilingual modeling and self-supervised learning [11.408563104045285]
Almost none of the 2,000+ languages spoken in Africa have widely available automatic speech recognition systems.
We have experimented with two techniques which may provide pathways to large vocabulary speech recognition for African languages.
arXiv Detail & Related papers (2022-08-05T09:54:19Z) - Textless Speech-to-Speech Translation on Real Data [49.134208897722246]
We present a textless speech-to-speech translation (S2ST) system that can translate speech directly from one language into another.
We tackle the challenge in modeling multi-speaker target speech and train the systems with real-world S2ST data.
arXiv Detail & Related papers (2021-12-15T18:56:35Z) - Cross-lingual Multispeaker Text-to-Speech under Limited-Data Scenario [10.779568857641928]
This paper presents an extension of Tacotron2 to achieve bilingual multispeaker speech synthesis.
We achieve cross-lingual synthesis, including code-switching cases, between English and Mandarin for monolingual speakers.
arXiv Detail & Related papers (2020-05-21T03:03:34Z)
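Several of the artifacts above are publicly released. As a rough illustration of how the MMS multilingual ASR model mentioned earlier in this list can be applied to a Zambian language, here is a minimal inference sketch via Hugging Face transformers; the checkpoint name facebook/mms-1b-all and the Bemba language code "bem" are assumptions chosen to match this page's setting.

```python
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "facebook/mms-1b-all"  # assumed public MMS ASR checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# MMS ships per-language adapter weights; point tokenizer and model at Bemba.
processor.tokenizer.set_target_lang("bem")
model.load_adapter("bem")

waveform = torch.randn(16000).numpy()  # stand-in for 1 s of 16 kHz audio
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
pred_ids = torch.argmax(logits, dim=-1)[0]
print(processor.decode(pred_ids))  # greedy CTC decoding of the transcription
```

The adapter mechanism keeps one shared encoder while swapping in a small set of language-specific weights, which is what makes serving over a thousand ASR languages from a single checkpoint practical.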