Decoupling recognition and transcription in Mandarin ASR
- URL: http://arxiv.org/abs/2108.01129v1
- Date: Mon, 2 Aug 2021 19:09:41 GMT
- Title: Decoupling recognition and transcription in Mandarin ASR
- Authors: Jiahong Yuan, Xingyu Cai, Dongji Gao, Renjie Zheng, Liang Huang,
Kenneth Church
- Abstract summary: We propose factoring audio -> Hanzi into two sub-tasks: (1) audio -> Pinyin and (2) Pinyin -> Hanzi, where Pinyin is a system of phonetic transcription of standard Chinese.
Factoring the audio -> Hanzi task in this way achieves 3.9% CER (character error rate) on the Aishell-1 corpus, the best result reported on this dataset so far.
- Score: 21.36547395115413
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Much of the recent literature on automatic speech recognition (ASR) is taking
an end-to-end approach. Unlike English where the writing system is closely
related to sound, Chinese characters (Hanzi) represent meaning, not sound. We
propose factoring audio -> Hanzi into two sub-tasks: (1) audio -> Pinyin and
(2) Pinyin -> Hanzi, where Pinyin is a system of phonetic transcription of
standard Chinese. Factoring the audio -> Hanzi task in this way achieves 3.9%
CER (character error rate) on the Aishell-1 corpus, the best result reported on
this dataset so far.
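The CER figure quoted above is the character-level edit (Levenshtein) distance between the hypothesis and reference transcripts, normalized by the reference length. A minimal sketch of the computation (not the authors' code):

```python
def edit_distance(ref, hyp):
    # classic Levenshtein dynamic program over characters
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def cer(ref, hyp):
    # character error rate: edit distance normalized by reference length
    return edit_distance(ref, hyp) / len(ref)
```

For Mandarin, each Hanzi is treated as one token, so the error rate is computed per character rather than per word.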
Related papers
- Exploring the Usage of Chinese Pinyin in Pretraining [28.875174965608554]
Pinyin is essential in many scenarios, such as error correction and fault tolerance for errors introduced by ASR.
In this work, we explore various ways of using pinyin in pretraining models and propose a new pretraining method called PmBERT.
arXiv Detail & Related papers (2023-10-08T01:26:44Z)
- Enhancing Cross-lingual Transfer via Phonemic Transcription Integration [57.109031654219294]
PhoneXL is a framework incorporating phonemic transcriptions as an additional linguistic modality for cross-lingual transfer.
Our pilot study reveals phonemic transcription provides essential information beyond the orthography to enhance cross-lingual transfer.
arXiv Detail & Related papers (2023-07-10T06:17:33Z)
- Shuo Wen Jie Zi: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training [50.100992353488174]
We introduce CDBERT, a new learning paradigm that enhances the semantics understanding ability of the Chinese PLMs with dictionary knowledge and structure of Chinese characters.
We name the two core modules of CDBERT as Shuowen and Jiezi, where Shuowen refers to the process of retrieving the most appropriate meaning from Chinese dictionaries.
Our paradigm demonstrates consistent improvements on previous Chinese PLMs across all tasks.
arXiv Detail & Related papers (2023-05-30T05:48:36Z)
- Disentangled Phonetic Representation for Chinese Spelling Correction [25.674770525359236]
Chinese Spelling Correction aims to detect and correct erroneous characters in Chinese texts.
Efforts have been made to introduce phonetic information in this task, but they typically merge phonetic representations with character representations.
We propose to disentangle the two types of features to allow for direct interaction between textual and phonetic information.
arXiv Detail & Related papers (2023-05-24T06:39:12Z)
- READIN: A Chinese Multi-Task Benchmark with Realistic and Diverse Input Noises [87.70001456418504]
We construct READIN: a Chinese multi-task benchmark with REalistic And Diverse Input Noises.
READIN contains four diverse tasks and requests annotators to re-enter the original test data with two commonly used Chinese input methods: Pinyin input and speech input.
We experiment with a series of strong pretrained language models as well as robust training methods, and find that these models often suffer significant performance drops on READIN.
arXiv Detail & Related papers (2023-02-14T20:14:39Z)
- Exploring and Adapting Chinese GPT to Pinyin Input Method [48.15790080309427]
We make the first exploration to leverage Chinese GPT for pinyin input method.
A frozen GPT achieves state-of-the-art performance on perfect pinyin.
However, the performance drops dramatically when the input includes abbreviated pinyin.
arXiv Detail & Related papers (2022-03-01T06:05:07Z)
- Dual-Decoder Transformer For end-to-end Mandarin Chinese Speech Recognition with Pinyin and Character [15.999657143705045]
Pinyin and characters, the spelling and writing systems of Mandarin Chinese respectively, are mutually reinforcing.
We propose a novel Mandarin Chinese ASR model with dual-decoder Transformer according to the characteristics of pinyin transcripts and character transcripts.
On the test set of the AISHELL-1 dataset, the proposed Speech-Pinyin-Character-Interaction (SPCI) model achieves a 9.85% character error rate (CER) without a language model.
arXiv Detail & Related papers (2022-01-26T07:59:03Z)
- SHUOWEN-JIEZI: Linguistically Informed Tokenizers For Chinese Language Model Pretraining [48.880840711568425]
We study the influences of three main factors on the Chinese tokenization for pretrained language models.
We propose two kinds of tokenizers: 1) SHUOWEN (meaning Talk Word), the pronunciation-based tokenizers; 2) JIEZI (meaning Solve Character), the glyph-based tokenizers.
We find that SHUOWEN and JIEZI tokenizers can generally outperform conventional single-character tokenizers.
arXiv Detail & Related papers (2021-06-01T11:20:02Z)
- Phoneme Recognition through Fine Tuning of Phonetic Representations: a Case Study on Luhya Language Varieties [77.2347265289855]
We focus on phoneme recognition using Allosaurus, a method for multilingual recognition based on phonetic annotation.
To evaluate in a challenging real-world scenario, we curate phone recognition datasets for Bukusu and Saamia, two varieties of the Luhya language cluster of western Kenya and eastern Uganda.
We find that fine-tuning of Allosaurus, even with just 100 utterances, leads to significant improvements in phone error rates.
arXiv Detail & Related papers (2021-04-04T15:07:55Z)
- Learning to Pronounce Chinese Without a Pronunciation Dictionary [10.622817647136667]
We demonstrate a program that learns to pronounce Chinese text in Mandarin, without a pronunciation dictionary.
From non-parallel streams of Chinese characters and Chinese pinyin syllables, it establishes a many-to-many mapping between characters and pronunciations.
Its token-level character-to-syllable accuracy is 89%, which significantly exceeds the 22% accuracy of prior work.
arXiv Detail & Related papers (2020-10-09T18:03:49Z)
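The token-level character-to-syllable accuracy reported in the last entry is, in essence, the fraction of character tokens mapped to their correct Pinyin syllable. A minimal sketch (the function name and tone-number notation are illustrative, not from the paper):

```python
def token_accuracy(predicted, gold):
    # fraction of character tokens whose predicted Pinyin syllable
    # matches the gold syllable, assuming a token-by-token alignment
    if len(predicted) != len(gold):
        raise ValueError("prediction and reference must be aligned token-by-token")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)
```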
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.