Korean Tokenization for Beam Search Rescoring in Speech Recognition
- URL: http://arxiv.org/abs/2203.03583v1
- Date: Tue, 22 Feb 2022 11:25:01 GMT
- Title: Korean Tokenization for Beam Search Rescoring in Speech Recognition
- Authors: Kyuhong Shim, Hyewon Bae, Wonyong Sung
- Abstract summary: We propose a Korean tokenization method for neural network-based LM used for Korean ASR.
The method inserts a special token, SkipTC, when a Korean syllable has no trailing consonant.
Experiments show that the proposed approach achieves a lower word error rate than the same LM without SkipTC.
- Score: 13.718396242036818
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The performance of automatic speech recognition (ASR) models can be greatly
improved by proper beam-search decoding with external language model (LM).
There has been increasing interest in Korean speech recognition, but few
studies have focused on the decoding procedure. In this paper, we propose a
Korean tokenization method for the neural network-based LM used in Korean ASR.
Although the common approach is to use the same tokenization method for the
external LM as for the ASR model, we show that this may not be the best choice
for Korean. We propose a new tokenization method that inserts a special token,
SkipTC, when there is no trailing consonant in a Korean syllable. By utilizing
the proposed SkipTC token, the input sequence for LM becomes very regularly
patterned so that the LM can better learn the linguistic characteristics. Our
experiments show that the proposed approach achieves a lower word error rate
compared to the same LM model without SkipTC. In addition, we are the first to
report the ASR performance for the recently introduced large-scale 7,600h
Korean speech dataset.
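The SkipTC idea can be illustrated with the standard Hangul code-point arithmetic: a composed syllable is `0xAC00 + (onset * 21 + vowel) * 28 + coda`, and a coda index of 0 means the syllable has no trailing consonant. The sketch below is illustrative only, not the authors' implementation; the token string `<SkipTC>` and the function name are assumptions.

```python
# Minimal sketch of SkipTC-style tokenization for Korean text.
# Assumption: the special token is written "<SkipTC>"; the paper only
# specifies that some special token is inserted when a syllable lacks
# a trailing consonant (jongseong).
SKIP_TC = "<SkipTC>"

ONSETS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")          # 19 initial consonants
VOWELS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")        # 21 medial vowels
CODAS = [None] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals (index 0 = none)

def tokenize_with_skiptc(text):
    tokens = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:                 # composed Hangul syllable block
            onset, rest = divmod(code, 21 * 28)
            vowel, coda = divmod(rest, 28)
            tokens.append(ONSETS[onset])
            tokens.append(VOWELS[vowel])
            # Every syllable yields exactly three tokens (onset, vowel,
            # coda-or-SkipTC), so the LM input is regularly patterned.
            tokens.append(CODAS[coda] if coda else SKIP_TC)
        else:
            tokens.append(ch)                 # non-Hangul characters pass through
    return tokens
```

For example, 하 (no trailing consonant) becomes `["ㅎ", "ㅏ", "<SkipTC>"]`, while 한 becomes `["ㅎ", "ㅏ", "ㄴ"]`; both occupy three token positions, which is the regularity the paper attributes to SkipTC.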
Related papers
- RedWhale: An Adapted Korean LLM Through Efficient Continual Pretraining [0.0]
We present RedWhale, a model specifically tailored for Korean language processing.
RedWhale is developed using an efficient continual pretraining approach that includes a comprehensive Korean corpus preprocessing pipeline.
Experimental results demonstrate that RedWhale outperforms other leading models on Korean NLP benchmarks.
arXiv Detail & Related papers (2024-08-21T02:49:41Z)
- Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models [11.287933170894311]
We construct a specialized benchmark dataset aimed at error correction for Chinese ASR with 724K hypotheses-transcription pairs.
We propose a method of Pinyin regularization for prompts, which involves the transcription of Pinyin directly from text hypotheses.
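The prompt-regularization step described above can be sketched as follows. This is a toy illustration, not the paper's pipeline: the `PINYIN` lookup table, the prompt wording, and the function name are all invented for this example (a real system would use a proper Pinyin converter).

```python
# Hedged sketch: augment an error-correction prompt with the Pinyin
# transcription of each ASR hypothesis, so the LLM sees both the
# characters and their pronunciation.
# PINYIN is a tiny illustrative table, NOT a real Pinyin converter.
PINYIN = {"中": "zhong1", "文": "wen2", "语": "yu3", "音": "yin1"}

def pinyin_regularized_prompt(hypotheses):
    """Build a prompt line per hypothesis: '<text> (<pinyin>)'."""
    lines = []
    for hyp in hypotheses:
        py = " ".join(PINYIN.get(ch, ch) for ch in hyp)  # char-by-char lookup
        lines.append(f"{hyp} ({py})")
    return "Correct the ASR hypothesis:\n" + "\n".join(lines)
```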
arXiv Detail & Related papers (2024-07-02T03:16:47Z)
- From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition [50.93943755401025]
We propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition.
We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement.
Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses.
arXiv Detail & Related papers (2023-01-19T02:37:56Z)
- Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition [20.926163659469587]
We propose a new memory augmented lookup dictionary based Transformer architecture for LM.
The newly introduced lookup dictionary incorporates rich contextual information from the training set, which is vital for correctly predicting long-tail tokens.
The proposed method is shown to outperform the baseline Transformer LM by a large margin on both word/character error rate and tail-token error rate.
arXiv Detail & Related papers (2022-12-30T22:26:57Z)
- Multi-blank Transducers for Speech Recognition [49.6154259349501]
In our proposed method, we introduce additional blank symbols, which consume two or more input frames when emitted.
We refer to the added symbols as big blanks, and call the method multi-blank RNN-T.
With experiments on multiple languages and datasets, we show that multi-blank RNN-T methods bring relative speedups of over 90%/139%.
arXiv Detail & Related papers (2022-11-04T16:24:46Z)
- Design of a novel Korean learning application for efficient pronunciation correction [2.008880264104061]
Speech recognition, speech-to-text, and speech-to-waveform are the three key components of the proposed system.
The software will then display the user's phrase and answer, with mispronounced elements highlighted in red.
arXiv Detail & Related papers (2022-05-04T11:19:29Z)
- Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model [57.92200214957124]
External language models (LMs) are used to improve the recognition performance of end-to-end (E2E) automatic speech recognition (ASR) systems.
We propose a novel decoding algorithm where a word-level lattice is constructed on-the-fly to consider all possible word sequences.
Our method consistently outperforms subword-level LMs, including N-gram LM and neural network LM.
arXiv Detail & Related papers (2022-01-06T10:04:56Z)
- KLUE: Korean Language Understanding Evaluation [43.94952771238633]
We introduce Korean Language Understanding Evaluation (KLUE) benchmark.
KLUE is a collection of 8 Korean natural language understanding (NLU) tasks.
We build all of the tasks from scratch from diverse source corpora while respecting copyrights.
arXiv Detail & Related papers (2021-05-20T11:40:30Z)
- Non-autoregressive Mandarin-English Code-switching Speech Recognition with Pinyin Mask-CTC and Word Embedding Regularization [61.749126838659315]
Mandarin-English code-switching (CS) is frequently used among East and Southeast Asian people.
Recent successful non-autoregressive (NAR) ASR models remove the need for left-to-right beam decoding in autoregressive (AR) models.
We propose changing the Mandarin output target of the encoder to Pinyin for faster encoder training, and introduce a Pinyin-to-Mandarin decoder to learn contextualized information.
arXiv Detail & Related papers (2021-04-06T03:01:09Z)
- How Phonotactics Affect Multilingual and Zero-shot ASR Performance [74.70048598292583]
A Transformer encoder-decoder model has been shown to leverage multilingual data well when trained on IPA transcriptions of the languages presented during training.
We replace the encoder-decoder with a hybrid ASR system consisting of a separate AM and LM.
We show that the gain from modeling crosslingual phonotactics is limited, and that imposing too strong a model can hurt zero-shot transfer.
arXiv Detail & Related papers (2020-10-22T23:07:24Z)
- Rnn-transducer with language bias for end-to-end Mandarin-English code-switching speech recognition [58.105818353866354]
We propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate the problem.
We use the language identities to bias the model to predict the CS points.
This encourages the model to learn language identity information directly from the transcription, so no additional LID model is needed.
arXiv Detail & Related papers (2020-02-19T12:01:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.