Disentangled Phonetic Representation for Chinese Spelling Correction
- URL: http://arxiv.org/abs/2305.14783v1
- Date: Wed, 24 May 2023 06:39:12 GMT
- Title: Disentangled Phonetic Representation for Chinese Spelling Correction
- Authors: Zihong Liang, Xiaojun Quan, Qifan Wang
- Abstract summary: Chinese Spelling Correction aims to detect and correct erroneous characters in Chinese texts.
Efforts have been made to introduce phonetic information in this task, but they typically merge phonetic representations with character representations.
We propose to disentangle the two types of features to allow for direct interaction between textual and phonetic information.
- Score: 25.674770525359236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chinese Spelling Correction (CSC) aims to detect and correct erroneous
characters in Chinese texts. Although efforts have been made to introduce
phonetic information (Hanyu Pinyin) in this task, they typically merge phonetic
representations with character representations, which tends to weaken the
representation effect of normal texts. In this work, we propose to disentangle
the two types of features to allow for direct interaction between textual and
phonetic information. To learn useful phonetic representations, we introduce a
pinyin-to-character objective to ask the model to predict the correct
characters based solely on phonetic information, where a separation mask is
imposed to disable attention from phonetic input to text. To avoid overfitting
the phonetics, we further design a self-distillation module to ensure that
semantic information plays a major role in the prediction. Extensive
experiments on three CSC benchmarks demonstrate the superiority of our method
in using phonetic information.
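The separation mask described in the abstract can be sketched as a boolean attention mask over the concatenated text-plus-pinyin input. This is a minimal, framework-agnostic illustration, not the authors' implementation; the function name and mask layout are assumptions:

```python
def separation_mask(n_text: int, n_pinyin: int) -> list[list[bool]]:
    """Attention mask for a concatenated [text; pinyin] sequence.

    True means attention is allowed. Text queries may attend to every
    position, while pinyin queries are blocked from text keys, so the
    pinyin-to-character objective must predict the correct characters
    from phonetic information alone.
    """
    n = n_text + n_pinyin
    mask = [[True] * n for _ in range(n)]
    for q in range(n_text, n):   # pinyin query positions (rows)
        for k in range(n_text):  # text key positions (columns)
            mask[q][k] = False
    return mask

# 3 text tokens followed by 2 pinyin tokens: text rows see all 5
# positions, pinyin rows see only the 2 pinyin positions.
mask = separation_mask(n_text=3, n_pinyin=2)
```

In practice such a mask would be passed to a Transformer's attention layer (e.g. as an additive mask of 0 / -inf values); the list-of-lists form above only makes the blocked positions explicit.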
Related papers
- Grammar Induction from Visual, Speech and Text [91.98797120799227]
This work introduces a novel visual-audio-text grammar induction task (VAT-GI).
Inspired by the fact that language grammar exists beyond the texts, we argue that text need not be the predominant modality in grammar induction.
We propose a visual-audio-text inside-outside autoencoder (VaTiora) framework, which leverages rich modality-specific and complementary features for effective grammar parsing.
arXiv Detail & Related papers (2024-10-01T02:24:18Z)
- Large Language Model Should Understand Pinyin for Chinese ASR Error Correction [31.13523648668466]
We propose Pinyin-enhanced GEC to improve Chinese ASR error correction.
Our approach only utilizes synthetic errors for training and employs the one-best hypothesis during inference.
Experiments on the Aishell-1 and the Common Voice datasets demonstrate that our approach consistently outperforms GEC with text-only input.
arXiv Detail & Related papers (2024-09-20T06:50:56Z)
- Identifying Speakers and Addressees of Quotations in Novels with Prompt Learning [5.691280935924612]
We propose prompt learning-based methods for speaker and addressee identification based on fine-tuned pre-trained models.
Experiments on both Chinese and English datasets show the effectiveness of the proposed methods.
arXiv Detail & Related papers (2024-08-18T12:19:18Z)
- Orientation-Independent Chinese Text Recognition in Scene Images [61.34060587461462]
We make the first attempt to extract orientation-independent visual features by disentangling the content and orientation information of text images.
Specifically, we introduce a Character Image Reconstruction Network (CIRN) to recover corresponding printed character images with disentangled content and orientation information.
arXiv Detail & Related papers (2023-09-03T05:30:21Z)
- Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information [68.89000132126536]
This work proposes to use inter-utterance linguistic information to improve the performance of prosodic structure prediction (PSP).
Our method achieves better F1 scores in predicting prosodic words (PW), prosodic phrases (PPH), and intonational phrases (IPH).
arXiv Detail & Related papers (2023-08-31T09:19:15Z)
- Enhancing Cross-lingual Transfer via Phonemic Transcription Integration [57.109031654219294]
PhoneXL is a framework incorporating phonemic transcriptions as an additional linguistic modality for cross-lingual transfer.
Our pilot study reveals phonemic transcription provides essential information beyond the orthography to enhance cross-lingual transfer.
arXiv Detail & Related papers (2023-07-10T06:17:33Z)
- Language-Guided Audio-Visual Source Separation via Trimodal Consistency [64.0580750128049]
A key challenge in this task is learning to associate the linguistic description of a sound-emitting object to its visual features and the corresponding components of the audio waveform.
We adapt off-the-shelf vision-language foundation models to provide pseudo-target supervision via two novel loss functions.
We demonstrate the effectiveness of our self-supervised approach on three audio-visual separation datasets.
arXiv Detail & Related papers (2023-03-28T22:45:40Z)
- Good Neighbors Are All You Need for Chinese Grapheme-to-Phoneme Conversion [1.5020330976600735]
Most Chinese Grapheme-to-Phoneme (G2P) systems employ a three-stage framework that first transforms input sequences into character embeddings, obtains linguistic information using language models, and then predicts the phonemes based on global context.
We propose the Reinforcer that provides strong inductive bias for language models by emphasizing the phonological information between neighboring characters to help disambiguate pronunciations.
arXiv Detail & Related papers (2023-03-14T09:15:51Z)
- Text-Aware End-to-end Mispronunciation Detection and Diagnosis [17.286013739453796]
Mispronunciation detection and diagnosis (MDD) technology is a key component of computer-assisted pronunciation training (CAPT) systems.
In this paper, we present a gating strategy that assigns more importance to the relevant audio features while suppressing irrelevant text information.
arXiv Detail & Related papers (2022-06-15T04:08:10Z)
- SHUOWEN-JIEZI: Linguistically Informed Tokenizers For Chinese Language Model Pretraining [48.880840711568425]
We study the influences of three main factors on the Chinese tokenization for pretrained language models.
We propose SHUOWEN (meaning Talk Word), pronunciation-based tokenizers, and JIEZI (meaning Solve Character), glyph-based tokenizers.
We find that SHUOWEN and JIEZI tokenizers can generally outperform conventional single-character tokenizers.
arXiv Detail & Related papers (2021-06-01T11:20:02Z)
- Read, Listen, and See: Leveraging Multimodal Information Helps Chinese Spell Checking [20.74049189959078]
We propose a Chinese spell checker called ReaLiSe, by directly leveraging the multimodal information of the Chinese characters.
ReaLiSe tackles the CSC task by (1) capturing the semantic, phonetic, and graphic information of the input characters, and (2) mixing the information from these modalities to predict the correct output.
Experiments on the SIGHAN benchmarks show that the proposed model outperforms strong baselines by a large margin.
arXiv Detail & Related papers (2021-05-26T02:38:11Z)
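The ReaLiSe recipe above (capture per-modality features, then mix them) can be illustrated with a toy softmax gate. The function names and the plain softmax weighting are assumptions for illustration only; ReaLiSe's actual selective fusion is learned, not hand-set:

```python
import math

def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax over a list of gate logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def mix_modalities(semantic: list[float], phonetic: list[float],
                   graphic: list[float], gate_logits: list[float]) -> list[float]:
    """Weighted mix of per-character modality vectors (hypothetical gate).

    One gate logit per modality; the three feature vectors are combined
    into a single vector used to predict the corrected character.
    """
    w = softmax(gate_logits)
    return [w[0] * s + w[1] * p + w[2] * g
            for s, p, g in zip(semantic, phonetic, graphic)]

# Equal logits give equal weights (1/3 each), so each output component
# is the average of the three modality components.
mixed = mix_modalities([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0, 0.0])
```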
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.