Large Language Model Should Understand Pinyin for Chinese ASR Error Correction
- URL: http://arxiv.org/abs/2409.13262v1
- Date: Fri, 20 Sep 2024 06:50:56 GMT
- Title: Large Language Model Should Understand Pinyin for Chinese ASR Error Correction
- Authors: Yuang Li, Xiaosong Qiao, Xiaofeng Zhao, Huan Zhao, Wei Tang, Min Zhang, Hao Yang
- Abstract summary: We propose Pinyin-enhanced GEC to improve Chinese ASR error correction.
Our approach only utilizes synthetic errors for training and employs the one-best hypothesis during inference.
Experiments on the Aishell-1 and the Common Voice datasets demonstrate that our approach consistently outperforms GEC with text-only input.
- Score: 31.13523648668466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models can enhance automatic speech recognition systems through generative error correction. In this paper, we propose Pinyin-enhanced GEC, which leverages Pinyin, the phonetic representation of Mandarin Chinese, as supplementary information to improve Chinese ASR error correction. Our approach only utilizes synthetic errors for training and employs the one-best hypothesis during inference. Additionally, we introduce a multitask training approach involving conversion tasks between Pinyin and text to align their feature spaces. Experiments on the Aishell-1 and the Common Voice datasets demonstrate that our approach consistently outperforms GEC with text-only input. More importantly, we provide intuitive explanations for the effectiveness of PY-GEC and multitask training from two aspects: 1) increased attention weight on Pinyin features; and 2) aligned feature space between Pinyin and text hidden states.
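As a rough illustration of how Pinyin can supplement the one-best hypothesis as model input: the prompt format below is an assumption, not the paper's actual format, and the tiny hard-coded character-to-Pinyin table stands in for a real converter such as pypinyin.

```python
# Toy character -> tone-numbered Pinyin table (illustrative only).
# "背" (bei4) is a plausible ASR confusion for "北" (bei3): similar
# sound, different character -- exactly the error class Pinyin helps with.
PINYIN = {"我": "wo3", "爱": "ai4", "北": "bei3", "京": "jing1", "背": "bei4"}

def to_pinyin(text: str) -> str:
    """Convert each character to its Pinyin syllable; pass unknowns through."""
    return " ".join(PINYIN.get(ch, ch) for ch in text)

def build_gec_input(one_best: str) -> str:
    """Pair the one-best ASR hypothesis with its Pinyin as correction input."""
    return f"hypothesis: {one_best}\npinyin: {to_pinyin(one_best)}"

print(build_gec_input("我爱背京"))  # the model should correct "背" to "北"
```

Because the erroneous character shares its syllable (up to tone) with the intended one, the Pinyin line gives the corrector phonetic evidence that plain text input lacks.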
Related papers
- Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models [11.287933170894311]
We construct a specialized benchmark dataset aimed at error correction for Chinese ASR with 724K hypothesis-transcription pairs.
We propose a method of Pinyin regularization for prompts, which involves the transcription of Pinyin directly from text hypotheses.
arXiv Detail & Related papers (2024-07-02T03:16:47Z) - Exploring the Usage of Chinese Pinyin in Pretraining [28.875174965608554]
Pinyin is essential in many scenarios, such as error correction and fault tolerance for ASR-introduced errors.
In this work, we explore various ways of using pinyin in pretraining models and propose a new pretraining method called PmBERT.
arXiv Detail & Related papers (2023-10-08T01:26:44Z) - Enhancing Cross-lingual Transfer via Phonemic Transcription Integration [57.109031654219294]
PhoneXL is a framework incorporating phonemic transcriptions as an additional linguistic modality for cross-lingual transfer.
Our pilot study reveals that phonemic transcription provides essential information beyond orthography to enhance cross-lingual transfer.
arXiv Detail & Related papers (2023-07-10T06:17:33Z) - Disentangled Phonetic Representation for Chinese Spelling Correction [25.674770525359236]
Chinese Spelling Correction aims to detect and correct erroneous characters in Chinese texts.
Efforts have been made to introduce phonetic information in this task, but they typically merge phonetic representations with character representations.
We propose to disentangle the two types of features to allow for direct interaction between textual and phonetic information.
arXiv Detail & Related papers (2023-05-24T06:39:12Z) - READIN: A Chinese Multi-Task Benchmark with Realistic and Diverse Input Noises [87.70001456418504]
We construct READIN: a Chinese multi-task benchmark with REalistic And Diverse Input Noises.
READIN contains four diverse tasks and asks annotators to re-enter the original test data with two commonly used Chinese input methods: Pinyin input and speech input.
We experiment with a series of strong pretrained language models as well as robust training methods, and find that these models often suffer significant performance drops on READIN.
arXiv Detail & Related papers (2023-02-14T20:14:39Z) - Improving Chinese Spelling Check by Character Pronunciation Prediction: The Effects of Adaptivity and Granularity [76.20568599642799]
Chinese spelling check (CSC) is a fundamental NLP task that detects and corrects spelling errors in Chinese texts.
In this paper, we consider introducing an auxiliary task of Chinese pronunciation prediction (CPP) to improve CSC.
We propose SCOPE, which builds two parallel decoders on top of a shared encoder: one for the primary CSC task and the other for a fine-grained auxiliary CPP task.
arXiv Detail & Related papers (2022-10-20T03:42:35Z) - Exploring and Adapting Chinese GPT to Pinyin Input Method [48.15790080309427]
We make the first exploration to leverage Chinese GPT for pinyin input method.
A frozen GPT achieves state-of-the-art performance on perfect pinyin.
However, the performance drops dramatically when the input includes abbreviated pinyin.
arXiv Detail & Related papers (2022-03-01T06:05:07Z) - Dual-Decoder Transformer For end-to-end Mandarin Chinese Speech Recognition with Pinyin and Character [15.999657143705045]
Pinyin and characters, respectively the spelling and writing systems of Mandarin Chinese, mutually reinforce each other.
We propose a novel Mandarin Chinese ASR model with dual-decoder Transformer according to the characteristics of pinyin transcripts and character transcripts.
Results on the AISHELL-1 test set show that the proposed Speech-Pinyin-Character-Interaction (SPCI) model achieves a 9.85% character error rate (CER) without a language model.
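The CER figure above is the standard ASR metric: the Levenshtein (edit) distance between hypothesis and reference characters, divided by the reference length. A minimal self-contained sketch:

```python
def cer(ref: str, hyp: str) -> float:
    """Character error rate: Levenshtein distance(ref, hyp) / len(ref)."""
    # Single-row dynamic programming over edit distance.
    d = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i  # prev holds the diagonal cell d[i-1][j-1]
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            d[j] = min(d[j] + 1,          # deletion
                       d[j - 1] + 1,      # insertion
                       prev + (r != h))   # substitution (free if chars match)
            prev = cur
    return d[-1] / len(ref)

print(cer("北京", "背京"))  # one substitution over two characters -> 0.5
```

In practice, published CER numbers come from toolkit implementations (e.g. of the Kaldi/ESPnet scoring pipelines), but the underlying computation is the one shown.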
arXiv Detail & Related papers (2022-01-26T07:59:03Z) - SHUOWEN-JIEZI: Linguistically Informed Tokenizers For Chinese Language Model Pretraining [48.880840711568425]
We study the influences of three main factors on the Chinese tokenization for pretrained language models.
We propose two kinds of tokenizers: 1) SHUOWEN (meaning Talk Word), the pronunciation-based tokenizers; and 2) JIEZI (meaning Solve Character), the glyph-based tokenizers.
We find that SHUOWEN and JIEZI tokenizers can generally outperform conventional single-character tokenizers.
arXiv Detail & Related papers (2021-06-01T11:20:02Z) - Non-autoregressive Mandarin-English Code-switching Speech Recognition with Pinyin Mask-CTC and Word Embedding Regularization [61.749126838659315]
Mandarin-English code-switching (CS) is frequently used among East and Southeast Asian people.
Recent successful non-autoregressive (NAR) ASR models remove the need for left-to-right beam decoding in autoregressive (AR) models.
We propose changing the Mandarin output target of the encoder to Pinyin for faster encoder training, and introduce Pinyin-to-Mandarin decoder to learn contextualized information.
arXiv Detail & Related papers (2021-04-06T03:01:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.