Integrated Semantic and Phonetic Post-correction for Chinese Speech
Recognition
- URL: http://arxiv.org/abs/2111.08400v1
- Date: Tue, 16 Nov 2021 11:55:27 GMT
- Title: Integrated Semantic and Phonetic Post-correction for Chinese Speech
Recognition
- Authors: Yi-Chang Chen, Chun-Yen Cheng, Chien-An Chen, Ming-Chieh Sung and
Yi-Ren Yeh
- Abstract summary: We propose a novel approach that collectively exploits the contextualized representation and the phonetic information between an error and its replacement candidates to reduce the error rate of Chinese ASR.
Our experimental results on real-world speech recognition datasets show that the proposed method achieves an evidently lower CER than the baseline model.
- Score: 1.2914521751805657
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the recent advances of natural language processing, several works have
applied the pre-trained masked language model (MLM) of BERT to the
post-correction of speech recognition. However, existing pre-trained models
consider only semantic correction, while the phonetic features of words are
neglected. Semantic-only post-correction consequently degrades performance,
since homophonic errors are fairly common in Chinese ASR. In this paper, we
propose a novel approach that collectively exploits the contextualized
representation and the phonetic information between an error and its
replacement candidates to reduce the error rate of Chinese ASR. Our
experimental results on real-world speech recognition datasets show that the
proposed method achieves an evidently lower CER than the baseline model, which
uses a pre-trained BERT MLM as the corrector.
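The core idea can be illustrated with a minimal sketch: a BERT MLM proposes replacement candidates for a suspected error position, and each candidate is re-ranked by combining its MLM probability with its phonetic (Pinyin) similarity to the original character. The interpolation weight, the use of pypinyin and difflib for phonetic similarity, and the single-position correction below are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of semantic + phonetic post-correction (illustrative only).
# Assumptions: bert-base-chinese as the MLM, pypinyin for phonetics, and a
# simple linear interpolation of the two scores.
import difflib

import torch
from pypinyin import lazy_pinyin
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese").eval()


def phonetic_similarity(a: str, b: str) -> float:
    # Crude phonetic proxy: string similarity between Pinyin transcriptions.
    pa, pb = "".join(lazy_pinyin(a)), "".join(lazy_pinyin(b))
    return difflib.SequenceMatcher(None, pa, pb).ratio()


def correct_position(sentence: str, pos: int, alpha: float = 0.7, top_k: int = 10) -> str:
    # Mask the suspected error, then pick the candidate maximizing
    # alpha * MLM probability + (1 - alpha) * phonetic similarity to the original.
    original = sentence[pos]
    masked = sentence[:pos] + tokenizer.mask_token + sentence[pos + 1:]
    inputs = tokenizer(masked, return_tensors="pt")
    mask_idx = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_idx]
    probs = torch.softmax(logits, dim=-1)
    top_probs, top_ids = torch.topk(probs, top_k)

    best_char, best_score = original, float("-inf")
    for prob, token_id in zip(top_probs.tolist(), top_ids.tolist()):
        cand = tokenizer.convert_ids_to_tokens(token_id)
        if len(cand) != 1:  # skip subwords and special tokens
            continue
        score = alpha * prob + (1 - alpha) * phonetic_similarity(cand, original)
        if score > best_score:
            best_char, best_score = cand, score
    return sentence[:pos] + best_char + sentence[pos + 1:]


# Hypothetical example: correct_position("我想去北经旅游", 4) may return "我想去北京旅游".
```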
Related papers
- Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models [11.287933170894311]
We construct a specialized benchmark dataset aimed at error correction for Chinese ASR with 724K hypotheses-transcription pairs.
We propose a method of Pinyin regularization for prompts, which involves the transcription of Pinyin directly from text hypotheses (a rough sketch follows this entry).
arXiv Detail & Related papers (2024-07-02T03:16:47Z)
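To make the Pinyin-regularization idea concrete, the sketch below pairs each text hypothesis with its Pinyin transcription inside a correction prompt. The prompt wording and structure are assumptions for illustration, not the paper's actual template.

```python
# Illustrative sketch of Pinyin regularization in a correction prompt.
from pypinyin import lazy_pinyin


def build_pinyin_prompt(hypotheses):
    # Pair each ASR hypothesis with its Pinyin so the LLM can reason about
    # homophone confusions; the prompt wording is an illustrative assumption.
    lines = [
        "Below are ASR hypotheses and their Pinyin transcriptions.",
        "Output the corrected transcription.",
    ]
    for i, hyp in enumerate(hypotheses, 1):
        lines.append(f"Hypothesis {i}: {hyp}")
        lines.append(f"Pinyin {i}: {' '.join(lazy_pinyin(hyp))}")
    lines.append("Corrected transcription:")
    return "\n".join(lines)


print(build_pinyin_prompt(["我想去北经旅游", "我想去背景旅游"]))
```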
- Large Language Models are Efficient Learners of Noise-Robust Speech Recognition [65.95847272465124]
Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR).
In this work, we extend the benchmark to noisy conditions and investigate if we can teach LLMs to perform denoising for GER.
Experiments on various recent LLMs demonstrate that our approach achieves a new breakthrough, with up to 53.9% correction improvement in terms of word error rate.
arXiv Detail & Related papers (2024-01-19T01:29:27Z)
- Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors [0.4681661603096333]
In a spoken dialogue system, an NLU model is preceded by a speech recognition system that can degrade the performance of natural language understanding.
This paper proposes a method for investigating the impact of speech recognition errors on the performance of natural language understanding models (a sketch of the protocol follows this entry).
arXiv Detail & Related papers (2023-10-25T13:07:07Z)
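A minimal sketch of the back-transcription protocol as summarized above: run reference text through TTS and then ASR, and compare NLU quality on the original versus the back-transcribed text. The callables tts, asr, nlu, and metric are placeholders to be supplied by the user; this is one reading of the evaluation loop, not the paper's code.

```python
# Sketch of a back-transcription robustness check (assumed protocol).
# `tts`, `asr`, `nlu`, and `metric` are user-supplied callables; none of the
# names below refer to a specific library.
from typing import Callable, Iterable, Tuple


def back_transcription_eval(
    examples: Iterable[Tuple[str, object]],
    tts: Callable[[str], bytes],
    asr: Callable[[bytes], str],
    nlu: Callable[[str], object],
    metric: Callable[[object, object], float],
) -> Tuple[float, float]:
    # Return (clean NLU score, back-transcribed NLU score), averaged over examples.
    clean, noisy = [], []
    for text, gold in examples:
        clean.append(metric(nlu(text), gold))        # NLU on the reference text
        back_text = asr(tts(text))                   # synthesize, then recognize
        noisy.append(metric(nlu(back_text), gold))   # NLU on the noisy round trip
    n = max(len(clean), 1)
    return sum(clean) / n, sum(noisy) / n
```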
- Generative error correction for code-switching speech recognition using large language models [49.06203730433107]
Code-switching (CS) speech refers to the phenomenon of mixing two or more languages within the same sentence.
We propose to leverage large language models (LLMs) and lists of hypotheses generated by an ASR to address the CS problem.
arXiv Detail & Related papers (2023-10-17T14:49:48Z)
- Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition [10.62060432965311]
We introduce a new cross-modal fusion technique designed for generative error correction in automatic speech recognition (ASR).
Our methodology leverages both acoustic information and external linguistic representations to generate accurate speech transcription contexts.
arXiv Detail & Related papers (2023-10-10T09:04:33Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
LLMs with a reasonable prompt and their generative capability can even correct tokens that are missing from the N-best list (see the sketch after this entry).
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
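To make the N-best idea concrete, here is a small sketch that formats an N-best list into a single correction prompt for an LLM. The field names and prompt wording are assumptions for illustration and do not reflect the actual HyPoradise data schema.

```python
# Illustrative N-best correction prompt (format and wording are assumptions,
# not the actual HyPoradise schema).
def nbest_correction_prompt(nbest):
    # Format an N-best list of (hypothesis, ASR score) pairs into one prompt.
    lines = [
        "The following are N-best hypotheses from a speech recognizer.",
        "Write the most likely true transcription, even if some words are",
        "missing from every hypothesis.",
    ]
    for i, (hyp, score) in enumerate(nbest, 1):
        lines.append(f"{i}. (score {score:.2f}) {hyp}")
    lines.append("True transcription:")
    return "\n".join(lines)


print(nbest_correction_prompt([("i want to go to beijing", -1.2),
                               ("i want to goat to begin", -2.8)]))
```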
- Chinese Spelling Correction as Rephrasing Language Model [63.65217759957206]
We study Chinese Spelling Correction (CSC), which aims to detect and correct the potential spelling errors in a given sentence.
Current state-of-the-art methods regard CSC as a sequence tagging task and fine-tune BERT-based models on sentence pairs.
We propose Rephrasing Language Model (ReLM), where the model is trained to rephrase the entire sentence by infilling additional slots, instead of character-to-character tagging (see the sketch after this entry).
arXiv Detail & Related papers (2023-08-17T06:04:28Z)
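As a rough illustration of rephrasing versus character-to-character tagging, one plausible way to build a training pair appends one mask slot per target character after the source sentence and asks the model to infill the corrected sentence. The slot construction below is an assumption for illustration, not the paper's exact recipe.

```python
# Sketch of a rephrasing-style (ReLM-like) training pair; the slot construction
# is an assumption, not the paper's exact recipe.
def build_rephrasing_pair(source, target, mask_token="[MASK]"):
    # Input: the (possibly misspelled) source followed by one mask slot per
    # target character; the model learns to infill the corrected sentence.
    model_input = source + mask_token * len(target)
    labels = target  # supervision for the infilled slots
    return model_input, labels


# Hypothetical homophone error: 经 -> 京
print(build_rephrasing_pair("我想去北经", "我想去北京"))
```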
- Boosting Chinese ASR Error Correction with Dynamic Error Scaling Mechanism [27.09416337926635]
Current mainstream models often struggle with effectively utilizing word-level features and phonetic information.
This paper introduces a novel approach that incorporates a dynamic error scaling mechanism to detect and correct phonetically erroneous text.
arXiv Detail & Related papers (2023-08-07T09:19:59Z)
- Improving Pre-trained Language Models with Syntactic Dependency Prediction Task for Chinese Semantic Error Recognition [52.55136323341319]
Existing Chinese text error detection mainly focuses on spelling and simple grammatical errors.
Chinese semantic errors are understudied and so complex that humans cannot easily recognize them.
arXiv Detail & Related papers (2022-04-15T13:55:32Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.