MyVoice: Arabic Speech Resource Collaboration Platform
- URL: http://arxiv.org/abs/2308.02503v1
- Date: Sun, 23 Jul 2023 07:13:30 GMT
- Title: MyVoice: Arabic Speech Resource Collaboration Platform
- Authors: Yousseif Elshahawy, Yassine El Kheir, Shammur Absar Chowdhury, and
Ahmed Ali
- Abstract summary: MyVoice is a crowdsourcing platform designed to collect Arabic speech.
MyVoice allows contributors to select a fine-grained, city- or country-level dialect.
Users can switch roles between contributors and annotators.
- Score: 8.098700090427721
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce MyVoice, a crowdsourcing platform designed to collect Arabic
speech to enhance dialectal speech technologies. The platform makes it possible
to design large dialectal speech datasets and to make them publicly available.
MyVoice allows contributors to select a fine-grained, city- or country-level
dialect and record the displayed utterances. Users can switch roles between
contributor and annotator. The platform incorporates a quality assurance system
that filters out low-quality and spurious recordings before sending them for
validation. During the validation phase, contributors can assess the quality of
recordings, annotate them, and provide feedback, which is then reviewed by
administrators. Furthermore, the platform gives admin roles the flexibility to
add new data or tasks beyond dialectal speech and word collection, which are
then displayed to contributors, enabling collaborative efforts to gather
diverse and large Arabic speech data.
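The abstract describes a pipeline in which recordings pass an automatic quality gate before reaching human validators. A minimal sketch of such a gate is below; the class, function names, and thresholds are illustrative assumptions, not taken from the MyVoice implementation.

```python
from dataclasses import dataclass

@dataclass
class Recording:
    samples: list[float]  # normalized amplitudes in [-1.0, 1.0]
    sample_rate: int      # e.g. 16000 Hz
    dialect: str          # hypothetical fine-grained label, e.g. "ar-EG-Cairo"

def passes_quality_gate(rec: Recording,
                        min_seconds: float = 1.0,
                        silence_threshold: float = 0.01,
                        max_clipping_ratio: float = 0.05) -> bool:
    """Return True if the recording should be queued for human validation."""
    duration = len(rec.samples) / rec.sample_rate
    if duration < min_seconds:
        return False  # too short to contain the displayed utterance
    peak = max((abs(s) for s in rec.samples), default=0.0)
    if peak < silence_threshold:
        return False  # effectively silent
    clipped = sum(1 for s in rec.samples if abs(s) >= 0.999)
    if clipped / len(rec.samples) > max_clipping_ratio:
        return False  # heavily clipped, likely distorted
    return True
```

Recordings that pass the gate would then enter the validation phase, where contributors acting as annotators assess and annotate them.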
Related papers
- The ParlaSpeech Collection of Automatically Generated Speech and Text Datasets from Parliamentary Proceedings [0.0]
We present our approach to building large and open speech-and-text-aligned datasets of less-resourced languages.
We focus on three Slavic languages, namely Croatian, Polish, and Serbian.
The results of this pilot run are three high-quality datasets that span more than 5,000 hours of speech and accompanying text transcripts.
arXiv Detail & Related papers (2024-09-23T10:12:18Z)
- Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models [83.7506131809624]
We introduce an approach to identifying speaker names in dialogue transcripts, a crucial task for enhancing content accessibility and searchability in digital media archives.
We present a novel, large-scale dataset derived from the MediaSum corpus, encompassing transcripts from a wide range of media sources.
We propose novel transformer-based models tailored for SpeakerID, leveraging contextual cues within dialogues to accurately attribute speaker names.
arXiv Detail & Related papers (2024-07-16T18:03:58Z)
- Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation [65.13824257448564]
This paper proposes a textless training method for many-to-many multilingual speech-to-speech translation.
By treating the speech units as pseudo-text, we can focus on the linguistic content of the speech.
We demonstrate that the proposed UTUT model can be effectively utilized not only for Speech-to-Speech Translation (S2ST) but also for multilingual Text-to-Speech Synthesis (T2S) and Text-to-Speech Translation (T2ST).
arXiv Detail & Related papers (2023-08-03T15:47:04Z)
- AudioPaLM: A Large Language Model That Can Speak and Listen [79.44757696533709]
We introduce AudioPaLM, a large language model for speech understanding and generation.
AudioPaLM fuses text-based and speech-based language models.
It can process and generate text and speech with applications including speech recognition and speech-to-speech translation.
arXiv Detail & Related papers (2023-06-22T14:37:54Z)
- PolyVoice: Language Models for Speech to Speech Translation [50.31000706309143]
PolyVoice is a language model-based framework for speech-to-speech translation (S2ST).
We use discretized speech units, which are generated in a fully unsupervised way.
For the speech synthesis part, we adopt the existing VALL-E X approach and build a unit-based audio language model.
arXiv Detail & Related papers (2023-06-05T15:53:15Z)
- QVoice: Arabic Speech Pronunciation Learning Application [11.913011065023758]
The application is designed to support non-native Arabic speakers in enhancing their pronunciation skills.
QVoice employs various learning cues to aid learners in comprehending meaning.
The learning cues featured in QVoice encompass a wide range of meaningful information.
arXiv Detail & Related papers (2023-05-09T07:21:46Z)
- Building African Voices [125.92214914982753]
This paper focuses on speech synthesis for low-resourced African languages.
We create a set of general-purpose instructions on building speech synthesis systems with minimum technological resources.
We release the speech data, code, and trained voices for 12 African languages to support researchers and developers.
arXiv Detail & Related papers (2022-07-01T23:28:16Z)
- Self-Supervised Representations Improve End-to-End Speech Translation [57.641761472372814]
We show that self-supervised pre-trained features can consistently improve the translation performance.
Cross-lingual transfer makes it possible to extend the approach to a variety of languages with little or no tuning.
arXiv Detail & Related papers (2020-06-22T10:28:38Z)
- Cross-lingual Multispeaker Text-to-Speech under Limited-Data Scenario [10.779568857641928]
This paper presents an extension on Tacotron2 to achieve bilingual multispeaker speech synthesis.
We achieve cross-lingual synthesis, including code-switching cases, between English and Mandarin for monolingual speakers.
arXiv Detail & Related papers (2020-05-21T03:03:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.