QVoice: Arabic Speech Pronunciation Learning Application
- URL: http://arxiv.org/abs/2305.07445v1
- Date: Tue, 9 May 2023 07:21:46 GMT
- Title: QVoice: Arabic Speech Pronunciation Learning Application
- Authors: Yassine El Kheir, Fouad Khnaisser, Shammur Absar Chowdhury, Hamdy Mubarak, Shazia Afzal, Ahmed Ali
- Abstract summary: The application is designed to support non-native Arabic speakers in enhancing their pronunciation skills.
QVoice employs various learning cues to aid learners in comprehending meaning.
The learning cues featured in QVoice encompass a wide range of meaningful information.
- Score: 11.913011065023758
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper introduces QVoice, a novel Arabic pronunciation learning application powered by an end-to-end mispronunciation detection and feedback generation module. The application is designed to support non-native Arabic speakers in enhancing their pronunciation skills, while also helping native speakers mitigate any potential influence of regional dialects on their Modern Standard Arabic (MSA) pronunciation. QVoice employs various learning cues to help learners comprehend meaning and draw connections with their existing knowledge of English, and it offers detailed feedback for pronunciation correction along with contextual examples showcasing word usage.
The learning cues featured in QVoice encompass a wide range of meaningful
information, such as visualizations of phrases/words and their translations, as
well as phonetic transcriptions and transliterations. QVoice provides
pronunciation feedback at the character level and assesses performance at the
word level.
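The paper does not describe its scoring implementation, but the character-to-word relationship it mentions can be illustrated with a minimal sketch. Everything below is a hypothetical assumption for illustration: the per-character scores, the threshold, and the averaging rule are not QVoice's actual method.

```python
# Hypothetical sketch of character-level feedback rolled up into a
# word-level assessment (not QVoice's actual scoring method).

def assess_word(word: str, char_scores: list[float], threshold: float = 0.8):
    """Flag characters scored below threshold; average scores for the word."""
    assert len(word) == len(char_scores), "one score per character"
    flagged = [c for c, s in zip(word, char_scores) if s < threshold]
    return {
        "word": word,
        "word_score": sum(char_scores) / len(char_scores),
        "mispronounced_chars": flagged,
    }

# Example with made-up scores for a transliterated word:
print(assess_word("salam", [0.95, 0.40, 0.90, 0.85, 0.92]))
# word_score ~= 0.80, with the second character flagged for feedback
```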
Related papers
- Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS [52.89324095217975]
Previous approaches to accent conversion (AC) mainly aimed at making non-native speech sound more native.
We develop a new AC approach that not only performs accent conversion but also improves the pronunciation of non-native accented speakers.
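As a rough illustration of the distillation idea, the student conversion model can be trained to match synthetic "ground-truth" speech produced by a native TTS teacher; the loss form and tensor names below are illustrative assumptions, not taken from the paper.

```python
# Illustrative distillation objective: match the student AC model's output
# to a native-TTS teacher's synthetic ground truth (names are hypothetical).
import torch
import torch.nn.functional as F

def distillation_loss(student_mel: torch.Tensor,
                      teacher_mel: torch.Tensor) -> torch.Tensor:
    """L1 distance between student output and the synthetic native target."""
    return F.l1_loss(student_mel, teacher_mel)
```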
arXiv Detail & Related papers (2024-10-19T06:12:31Z)
- FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs [63.8261207950923]
FunAudioLLM is a model family designed to enhance natural voice interactions between humans and large language models (LLMs)
At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, speaking style, and speaker identity.
The models related to SenseVoice and CosyVoice have been open-sourced on Modelscope and Huggingface, along with the corresponding training, inference, and fine-tuning codes released on GitHub.
arXiv Detail & Related papers (2024-07-04T16:49:02Z)
- Exploiting Dialect Identification in Automatic Dialectal Text Normalization [9.320305816520422]
We aim to normalize Dialectal Arabic into the Conventional Orthography for Dialectal Arabic (CODA).
We benchmark newly developed sequence-to-sequence models on the task of CODAfication.
We show that using dialect identification information improves the performance across all dialects.
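A common way to inject dialect identification into a sequence-to-sequence normalizer is to prepend a dialect tag to the source text; the sketch below assumes that scheme, and the tag inventory and helper names are illustrative, not the paper's.

```python
# Sketch: conditioning a seq2seq CODAfication model on dialect identity by
# prepending a dialect tag token (tag inventory is illustrative).

DIALECT_TAGS = {"egy": "<EGY>", "glf": "<GLF>", "lev": "<LEV>", "mgr": "<MGR>"}

def build_source(raw_text: str, dialect: str) -> str:
    """Prefix a dialectal sentence with its (predicted) dialect tag."""
    return f"{DIALECT_TAGS[dialect]} {raw_text}"

# A model trained with the same tags then decodes the CODA form, e.g.
#   coda_text = model.generate(build_source(sentence, predicted_dialect))
```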
arXiv Detail & Related papers (2024-07-03T11:30:03Z)
- A New Benchmark for Evaluating Automatic Speech Recognition in the Arabic Call Domain [0.0]
This work introduces a comprehensive benchmark for Arabic speech recognition, specifically tailored to address the challenges of telephone conversations in the Arabic language.
Our work aims to establish a robust benchmark that not only encompasses the broad spectrum of Arabic dialects but also emulates the real-world conditions of call-based communications.
arXiv Detail & Related papers (2024-03-07T07:24:32Z)
- Can Language Models Learn to Listen? [96.01685069483025]
We present a framework for generating appropriate facial responses from a listener in dyadic social interactions based on the speaker's words.
Our approach autoregressively predicts the listener's response: a sequence of listener facial gestures, quantized using a VQ-VAE.
We show that our generated listener motion is fluent and reflective of language semantics through quantitative metrics and a qualitative user study.
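As background on the quantization step, a VQ-VAE discretizes each continuous gesture frame by snapping it to the nearest vector in a learned codebook; the sketch below uses random placeholder data, and the dimensions and codebook size are assumptions, not the paper's.

```python
# Sketch of VQ-VAE-style quantization of gesture frames: each frame is
# mapped to the index of its nearest codebook vector. (Placeholder data;
# dimensions and codebook size are illustrative, not from the paper.)
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 64))      # 256 learned codes, 64-dim frames

def quantize(frames: np.ndarray) -> np.ndarray:
    """Map (T, 64) continuous frames to (T,) discrete code indices."""
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

tokens = quantize(rng.normal(size=(30, 64)))  # 30 frames -> 30 token ids
# An autoregressive model then predicts such token sequences from speech.
```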
arXiv Detail & Related papers (2023-08-21T17:59:02Z)
- MyVoice: Arabic Speech Resource Collaboration Platform [8.098700090427721]
MyVoice is a crowdsourcing platform designed to collect Arabic speech.
MyVoice allows contributors to select city/country-level fine-grained dialect.
Users can switch roles between contributors and annotators.
arXiv Detail & Related papers (2023-07-23T07:13:30Z)
- AudioPaLM: A Large Language Model That Can Speak and Listen [79.44757696533709]
We introduce AudioPaLM, a large language model for speech understanding and generation.
AudioPaLM fuses text-based and speech-based language models.
It can process and generate text and speech with applications including speech recognition and speech-to-speech translation.
arXiv Detail & Related papers (2023-06-22T14:37:54Z)
- Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling [92.55131711064935]
We propose a cross-lingual neural language model, VALL-E X, for cross-lingual speech synthesis.
VALL-E X inherits strong in-context learning capabilities and can be applied for zero-shot cross-lingual text-to-speech synthesis and zero-shot speech-to-speech translation tasks.
It can generate high-quality speech in the target language via just one speech utterance in the source language as a prompt while preserving the unseen speaker's voice, emotion, and acoustic environment.
arXiv Detail & Related papers (2023-03-07T14:31:55Z)
- DDSupport: Language Learning Support System that Displays Differences and Distances from Model Speech [16.82591185507251]
We propose a new language learning support system that calculates speech scores and detects mispronunciations by beginners.
The proposed system uses deep learning-based speech processing to display the pronunciation score of the learner's speech and the difference/distance between the learner's pronunciation and that of a group of model speakers.
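One simple way to realize such a distance display is to compare a learner embedding against a set of model-speaker embeddings of the same phrase; the embedding source and the 0-100 mapping below are illustrative assumptions, not the paper's method.

```python
# Sketch: score a learner utterance by cosine similarity to the nearest
# model-speaker embedding of the same phrase (embeddings assumed to come
# from some pretrained speech encoder; the 0-100 scaling is illustrative).
import numpy as np

def pronunciation_score(learner_emb: np.ndarray,
                        model_embs: np.ndarray) -> float:
    """Cosine similarity to the closest model embedding, scaled to 0-100."""
    sims = (model_embs @ learner_emb) / (
        np.linalg.norm(model_embs, axis=1) * np.linalg.norm(learner_emb))
    return float((sims.max() + 1.0) / 2.0 * 100.0)
```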
arXiv Detail & Related papers (2022-12-08T05:49:15Z)
- Differentiable Allophone Graphs for Language-Universal Speech Recognition [77.2981317283029]
Building language-universal speech recognition systems entails producing phonological units of spoken sound that can be shared across languages.
We present a general framework to derive phone-level supervision from only phonemic transcriptions and phone-to-phoneme mappings.
We build a universal phone-based speech recognition model with interpretable probabilistic phone-to-phoneme mappings for each language.
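To make the mapping idea concrete, a phone-to-phoneme mapping for one language can be written as a row-stochastic matrix that converts universal phone posteriors into language-specific phoneme posteriors; the toy inventories and numbers below are assumptions, not the paper's actual graphs.

```python
# Toy sketch of a probabilistic phone-to-phoneme mapping for one language
# (inventories and probabilities are illustrative, not the paper's).
import numpy as np

phones = ["p", "ph", "b"]        # universal phones ("ph" = aspirated p)
phonemes = ["/p/", "/b/"]        # this language's phoneme inventory

# M[i, j] = P(phoneme j | phone i); each row sums to 1. Here the language
# does not contrast p and ph, so both phones map to /p/.
M = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])

phone_posterior = np.array([0.6, 0.3, 0.1])   # from the universal model
phoneme_posterior = phone_posterior @ M        # -> [0.9, 0.1]
```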
arXiv Detail & Related papers (2021-07-24T15:09:32Z)
- QASR: QCRI Aljazeera Speech Resource -- A Large Scale Annotated Arabic Speech Corpus [11.113497373432411]
We introduce the largest transcribed Arabic speech corpus, QASR, collected from the broadcast domain.
This multi-dialect speech dataset contains 2,000 hours of speech sampled at 16 kHz, crawled from the Aljazeera news channel.
arXiv Detail & Related papers (2021-06-24T13:20:40Z)