Design of a novel Korean learning application for efficient
pronunciation correction
- URL: http://arxiv.org/abs/2205.02001v1
- Date: Wed, 4 May 2022 11:19:29 GMT
- Title: Design of a novel Korean learning application for efficient
pronunciation correction
- Authors: Minjong Cheon, Minseon Kim, Hanseon Joo
- Abstract summary: Speech recognition, speech-to-text, and speech-to-waveform are the three key systems in the proposed application.
The software will then display the user's phrase and answer, with mispronounced elements highlighted in red.
- Score: 2.008880264104061
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Korean wave, which denotes the global popularity of South Korea's
cultural economy, contributes to the increasing demand for the Korean language.
However, as there is no existing application for foreigners to learn Korean,
this paper proposes the design of a novel Korean learning application.
Speech recognition, speech-to-text, and speech-to-waveform are the three key
systems in the proposed application. The Google API will transform the user's
voice into a sentence, and the librosa library will convert it into MFCC
features. The software will then display the user's phrase and the answer, with
mispronounced elements highlighted in red, allowing users to more easily
recognize the incorrect parts of their pronunciation. Furthermore, the Siamese
network may use those transformed spectrograms to produce a similarity score,
which can subsequently be used to offer feedback to the user. Although we were
unable to collect sufficient data from foreign learners for this research, it
is notable that we present a novel Korean pronunciation correction method for
foreigners.
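
As a rough illustration of the pipeline described above, the following Python
sketch transcribes a recording with the Google Web Speech API (through the
SpeechRecognition package), extracts MFCC features with librosa, and marks
characters of the recognized phrase that differ from the answer in red using
ANSI codes. The paper names only the Google API and librosa; the package
choice, the character-level comparison, and all file names and parameters are
assumptions for illustration, not the authors' implementation.

# Minimal sketch of the proposed pipeline (assumed libraries: SpeechRecognition, librosa).
# The paper only names the Google API and librosa; everything else here is illustrative.
import speech_recognition as sr
import librosa

RED, RESET = "\033[91m", "\033[0m"

def transcribe(wav_path: str) -> str:
    """Speech-to-text via the Google Web Speech API (SpeechRecognition backend)."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio, language="ko-KR")

def extract_mfcc(wav_path: str, n_mfcc: int = 20):
    """Speech-to-waveform step: load the audio and compute MFCC features with librosa."""
    waveform, sample_rate = librosa.load(wav_path, sr=None)
    return librosa.feature.mfcc(y=waveform, sr=sample_rate, n_mfcc=n_mfcc)

def highlight_mismatches(user_text: str, answer_text: str) -> str:
    """Mark characters of the user's phrase that differ from the answer in red."""
    marked = []
    for i, ch in enumerate(user_text):
        correct = i < len(answer_text) and answer_text[i] == ch
        marked.append(ch if correct else f"{RED}{ch}{RESET}")
    return "".join(marked)

if __name__ == "__main__":
    user_wav = "user_recording.wav"   # hypothetical input file
    answer_sentence = "안녕하세요"      # target sentence shown to the learner
    user_sentence = transcribe(user_wav)
    print("Answer:  ", answer_sentence)
    print("You said:", highlight_mismatches(user_sentence, answer_sentence))
    print("MFCC shape:", extract_mfcc(user_wav).shape)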
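
The paper does not specify the Siamese network's configuration, so the PyTorch
sketch below is only an illustrative assumption: a shared convolutional encoder
maps the learner's and the reference MFCC matrices to embeddings, and a cosine
similarity rescaled to [0, 1] stands in for the feedback score.

# Illustrative Siamese scorer over MFCC features (assumed framework: PyTorch).
# The paper proposes a Siamese network for a similarity score; this particular
# architecture is an assumption, not the authors' configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseMFCCScorer(nn.Module):
    def __init__(self, n_mfcc: int = 20, embed_dim: int = 64):
        super().__init__()
        # Shared (twin) encoder applied to both inputs.
        self.encoder = nn.Sequential(
            nn.Conv1d(n_mfcc, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # pool over time so clip lengths may differ
            nn.Flatten(),
            nn.Linear(32, embed_dim),
        )

    def forward(self, mfcc_a: torch.Tensor, mfcc_b: torch.Tensor) -> torch.Tensor:
        # Inputs: (batch, n_mfcc, time). Both pass through the same weights.
        emb_a = self.encoder(mfcc_a)
        emb_b = self.encoder(mfcc_b)
        # Cosine similarity in [-1, 1], rescaled to a [0, 1] feedback score.
        return (F.cosine_similarity(emb_a, emb_b) + 1.0) / 2.0

# Usage with dummy tensors standing in for the learner's and reference MFCCs.
scorer = SiameseMFCCScorer()
user_mfcc = torch.randn(1, 20, 120)
reference_mfcc = torch.randn(1, 20, 140)
print("similarity:", scorer(user_mfcc, reference_mfcc).item())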
Related papers
- RedWhale: An Adapted Korean LLM Through Efficient Continual Pretraining [0.0]
We present RedWhale, a model specifically tailored for Korean language processing.
RedWhale is developed using an efficient continual pretraining approach that includes a comprehensive Korean corpus preprocessing pipeline.
Experimental results demonstrate that RedWhale outperforms other leading models on Korean NLP benchmarks.
arXiv Detail & Related papers (2024-08-21T02:49:41Z) - Does Incomplete Syntax Influence Korean Language Model? Focusing on Word Order and Case Markers [7.275938266030414]
Syntactic elements, such as word order and case markers, are fundamental in natural language processing.
This study explores whether Korean language models can accurately capture this flexibility.
arXiv Detail & Related papers (2024-07-12T11:33:41Z) - Lip Reading for Low-resource Languages by Learning and Combining General
Speech Knowledge and Language-specific Knowledge [57.38948190611797]
This paper proposes a novel lip reading framework, especially for low-resource languages.
Since low-resource languages do not have enough video-text paired data to train a model, developing lip reading models for them is regarded as challenging.
arXiv Detail & Related papers (2023-08-18T05:19:03Z) - Language-agnostic Code-Switching in Sequence-To-Sequence Speech
Recognition [62.997667081978825]
Code-Switching (CS) refers to the phenomenon of alternately using words and phrases from different languages.
We propose a simple yet effective data augmentation in which audio and corresponding labels of different source languages are transcribed.
We show that this augmentation can even improve the model's performance on inter-sentential language switches not seen during training by 5.03% WER.
arXiv Detail & Related papers (2022-10-17T12:15:57Z) - Korean Tokenization for Beam Search Rescoring in Speech Recognition [13.718396242036818]
We propose a Korean tokenization method for the neural network-based LMs used in Korean ASR.
The method inserts a special token, SkipTC, when a Korean syllable has no trailing consonant (see the sketch after this list).
Experiments show that the proposed approach achieves a lower word error rate compared to the same LM without SkipTC.
arXiv Detail & Related papers (2022-02-22T11:25:01Z) - Learning How to Translate North Korean through South Korean [24.38451366384134]
South and North Korea both use the Korean language.
Existing NLP systems for the Korean language cannot handle North Korean inputs.
We create data for North Korean NMT models using a comparable corpus.
We verify that a model trained on North Korean bilingual data without human annotation can significantly boost North Korean translation accuracy.
arXiv Detail & Related papers (2022-01-27T01:21:29Z) - K-Wav2vec 2.0: Automatic Speech Recognition based on Joint Decoding of
Graphemes and Syllables [2.0813318162800707]
K-Wav2Vec 2.0 is a modified version of Wav2vec 2.0 designed for Korean automatic speech recognition.
In fine-tuning, we propose a multi-task hierarchical architecture to reflect the Korean writing structure.
In pre-training, we attempted the cross-lingual transfer of the pre-trained model by further pre-training the English Wav2vec 2.0 on a Korean dataset.
arXiv Detail & Related papers (2021-10-11T11:53:12Z) - Non-autoregressive Mandarin-English Code-switching Speech Recognition
with Pinyin Mask-CTC and Word Embedding Regularization [61.749126838659315]
Mandarin-English code-switching (CS) is frequently used among East and Southeast Asian people.
Recent successful non-autoregressive (NAR) ASR models remove the need for left-to-right beam decoding in autoregressive (AR) models.
We propose changing the Mandarin output target of the encoder to Pinyin for faster encoder training, and introduce a Pinyin-to-Mandarin decoder to learn contextualized information.
arXiv Detail & Related papers (2021-04-06T03:01:09Z) - That Sounds Familiar: an Analysis of Phonetic Representations Transfer
Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z) - Synchronous Bidirectional Learning for Multilingual Lip Reading [99.14744013265594]
Lip movements of all languages share similar patterns due to the common structures of human organs.
Phonemes are more closely related with the lip movements than the alphabet letters.
A novel SBL block is proposed to learn the rules for each language in a fill-in-the-blank way.
arXiv Detail & Related papers (2020-05-08T04:19:57Z) - Rnn-transducer with language bias for end-to-end Mandarin-English
code-switching speech recognition [58.105818353866354]
We propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate the code-switching problem.
We use the language identities to bias the model to predict the CS points.
This promotes the model to learn the language identity information directly from transcription, and no additional LID model is needed.
arXiv Detail & Related papers (2020-02-19T12:01:33Z)
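
For the SkipTC tokenization entry above, the trailing-consonant (jongseong)
test on precomposed Hangul syllables can be illustrated with the Python sketch
below. Only the Unicode decomposition of precomposed Hangul is standard; the
token's surface form and the toy insertion loop are assumptions, and the
paper's actual tokenizer is not reproduced here.

# Illustrative check for the SkipTC rule: insert a placeholder token when a
# syllable has no trailing consonant (jongseong). The token name and the
# insertion loop are assumptions for illustration only.
SKIP_TC = "<SkipTC>"  # hypothetical surface form of the special token

def has_trailing_consonant(syllable: str) -> bool:
    """True if a precomposed Hangul syllable (U+AC00..U+D7A3) ends in a jongseong."""
    code = ord(syllable) - 0xAC00
    if not 0 <= code <= 11171:
        return False  # not a precomposed Hangul syllable
    return code % 28 != 0  # jongseong index 0 means no trailing consonant

def tokenize_with_skiptc(text: str) -> list:
    tokens = []
    for ch in text:
        tokens.append(ch)
        if 0 <= ord(ch) - 0xAC00 <= 11171 and not has_trailing_consonant(ch):
            tokens.append(SKIP_TC)
    return tokens

print(tokenize_with_skiptc("하늘"))  # '하' has no jongseong, '늘' ends in ㄹ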