An ensemble-based framework for mispronunciation detection of Arabic
phonemes
- URL: http://arxiv.org/abs/2301.01378v1
- Date: Tue, 3 Jan 2023 22:17:08 GMT
- Title: An ensemble-based framework for mispronunciation detection of Arabic
phonemes
- Authors: Sukru Selim Calik, Ayhan Kucukmanisa, Zeynep Hilal Kilimci
- Abstract summary: This work introduces an ensemble model that detects mispronunciations of Arabic phonemes.
Experimental results demonstrate that a voting ensemble combined with the Mel spectrogram feature extraction technique achieves a remarkable classification accuracy of 95.9%.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Computer-assisted language learning (CALL) systems detect
mispronunciations and provide feedback to users. In this work, we introduce an
ensemble model that detects mispronunciations of Arabic phonemes and thereby
effectively assists the learning of Arabic. To the best of our knowledge, this
is the first comprehensive attempt to detect mispronunciations of Arabic
phonemes using ensemble learning techniques together with conventional machine
learning models. To observe the effect of feature extraction techniques,
mel-frequency cepstral coefficients (MFCC) and Mel spectrograms are combined
with each learning algorithm. To demonstrate the success of the proposed model,
the 29 letters of the Arabic phonemes, 8 of which are hafiz, are voiced by a
total of 11 different speakers. The data set is augmented by adding noise, time
shifting, time stretching, and pitch shifting. Extensive experimental results
demonstrate that a voting classifier as the ensemble algorithm, combined with
the Mel spectrogram feature extraction technique, achieves a remarkable
classification accuracy of 95.9%.
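The pipeline described in the abstract (per-phoneme feature vectors fed to a voting ensemble) can be sketched as follows. This is a minimal illustration, not the authors' code: the random feature matrix stands in for Mel-spectrogram vectors (which would typically be computed from the recordings with an audio library such as librosa), and the three base learners are arbitrary stand-ins for the paper's conventional machine learning models.

```python
import numpy as np
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical stand-in for Mel-spectrogram feature vectors: 3 phoneme
# classes, 60 recordings each, 40-dimensional features per recording.
n_per_class, n_feats, n_classes = 60, 40, 3
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(n_per_class, n_feats))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Hard-voting ensemble: each base learner casts one vote per sample,
# and the majority label wins.
vote = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(random_state=0)),
                ("knn", KNeighborsClassifier())],
    voting="hard")
vote.fit(X_tr, y_tr)
acc = vote.score(X_te, y_te)
print(f"held-out accuracy: {acc:.3f}")
```

With real data, the synthetic `X` would be replaced by flattened Mel-spectrogram (or MFCC) features, and the augmented copies (noise, time shift, time stretch, pitch shift) would be added to the training split only, so the test set stays untouched.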
Related papers
- Do Audio-Language Models Understand Linguistic Variations? [42.17718387132912]
Open-vocabulary audio language models (ALMs) represent a promising new paradigm for audio-text retrieval using natural language queries.
We propose RobustCLAP, a novel and compute-efficient technique to learn audio-language representations robust to linguistic variations.
arXiv Detail & Related papers (2024-10-21T20:55:33Z) - CLAIR-A: Leveraging Large Language Models to Judge Audio Captions [73.51087998971418]
Evaluating machine-generated audio captions is a complex task that requires considering diverse factors.
We propose CLAIR-A, a simple and flexible method that leverages the zero-shot capabilities of large language models.
In our evaluations, CLAIR-A better predicts human judgements of quality compared to traditional metrics.
arXiv Detail & Related papers (2024-09-19T17:59:52Z) - Strategies for Arabic Readability Modeling [9.976720880041688]
Automatic readability assessment is relevant to building NLP applications for education, content analysis, and accessibility.
We present a set of experimental results on Arabic readability assessment using a diverse range of approaches.
arXiv Detail & Related papers (2024-07-03T11:54:11Z) - Speaker Embedding-aware Neural Diarization for Flexible Number of
Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts power-set encoded labels.
Our method achieves a lower diarization error rate than target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z) - Efficient Measuring of Readability to Improve Documents Accessibility
for Arabic Language Learners [0.0]
The approach is based on machine learning classification methods to discriminate between different levels of difficulty in reading and understanding a text.
Several models were trained on a large corpus mined from online Arabic websites and manually annotated.
Best results were achieved using TF-IDF vectors built from a combination of word-based unigrams and bigrams, with an overall accuracy of 87.14% over four classes of complexity.
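The TF-IDF setup mentioned in that summary (word unigrams plus bigrams feeding a classifier) can be illustrated with a small sketch. The toy English corpus, two stand-in classes, and the logistic-regression classifier here are all invented for illustration; the paper used a large Arabic web corpus with four difficulty levels.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus: 0 = easy, 1 = hard (stand-in labels).
texts = ["the cat sat on the mat",
         "quantum entanglement defies classical intuition",
         "the dog ran fast",
         "stochastic gradient descent converges under convexity"]
labels = [0, 1, 0, 1]

# ngram_range=(1, 2) builds features from word unigrams and bigrams,
# matching the feature combination the summary describes.
clf = make_pipeline(
    TfidfVectorizer(analyzer="word", ngram_range=(1, 2)),
    LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["the cat sat on the mat"])[0])
```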
arXiv Detail & Related papers (2021-09-09T10:05:38Z) - Leveraging Acoustic and Linguistic Embeddings from Pretrained speech and
language Models for Intent Classification [81.80311855996584]
We propose a novel intent classification framework that employs acoustic features extracted from a pretrained speech recognition system and linguistic features learned from a pretrained language model.
We achieve 90.86% and 99.07% accuracy on ATIS and Fluent speech corpus, respectively.
arXiv Detail & Related papers (2021-02-15T07:20:06Z) - UniSpeech: Unified Speech Representation Learning with Labeled and
Unlabeled Data [54.733889961024445]
We propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data.
We evaluate the effectiveness of UniSpeech for cross-lingual representation learning on public CommonVoice corpus.
arXiv Detail & Related papers (2021-01-19T12:53:43Z) - Multitask Training with Text Data for End-to-End Speech Recognition [45.35605825009208]
We propose a multitask training method for attention-based end-to-end speech recognition models.
We regularize the decoder in a listen, attend, and spell model by multitask training it on both audio-text and text-only data.
arXiv Detail & Related papers (2020-10-27T14:29:28Z) - Arabic Offensive Language Detection Using Machine Learning and Ensemble
Machine Learning Approaches [0.0]
The study shows a significant improvement from applying an ensemble machine learning approach over a single-learner machine learning approach.
Among the trained ensemble machine learning classifiers, bagging performs the best in offensive language detection with F1 score of 88%.
arXiv Detail & Related papers (2020-05-16T06:40:36Z) - Towards Zero-shot Learning for Automatic Phonemic Transcription [82.9910512414173]
A more challenging problem is to build phonemic transcribers for languages with zero training data.
Our model is able to recognize unseen phonemes in the target language without any training data.
It achieves 7.7% better phoneme error rate on average over a standard multilingual model.
arXiv Detail & Related papers (2020-02-26T20:38:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.