Related papers: Disentangled Phonetic Representation for Chinese Spelling Correction

URL: http://arxiv.org/abs/2305.14783v1
Date: Wed, 24 May 2023 06:39:12 GMT
Title: Disentangled Phonetic Representation for Chinese Spelling Correction
Authors: Zihong Liang, Xiaojun Quan, Qifan Wang
Abstract summary: Chinese Spelling Correction aims to detect and correct erroneous characters in Chinese texts. Efforts have been made to introduce phonetic information in this task, but they typically merge phonetic representations with character representations. We propose to disentangle the two types of features to allow for direct interaction between textual and phonetic information.
Score: 25.674770525359236
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Chinese Spelling Correction (CSC) aims to detect and correct erroneous characters in Chinese texts. Although efforts have been made to introduce phonetic information (Hanyu Pinyin) in this task, they typically merge phonetic representations with character representations, which tends to weaken the representation effect of normal texts. In this work, we propose to disentangle the two types of features to allow for direct interaction between textual and phonetic information. To learn useful phonetic representations, we introduce a pinyin-to-character objective to ask the model to predict the correct characters based solely on phonetic information, where a separation mask is imposed to disable attention from phonetic input to text. To avoid overfitting the phonetics, we further design a self-distillation module to ensure that semantic information plays a major role in the prediction. Extensive experiments on three CSC benchmarks demonstrate the superiority of our method in using phonetic information.
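The separation mask mentioned in the abstract can be pictured with a small sketch. This is only one reading of the sentence "disable attention from phonetic input to text", not the authors' implementation: the function name `separation_mask`, the assumed layout (text positions first, pinyin positions after), and the boolean convention (True means attention is allowed) are all assumptions made for illustration.

```python
from typing import List


def separation_mask(n_text: int, n_phon: int) -> List[List[bool]]:
    """Return mask[q][k], True when query position q may attend to key k.

    Assumed layout (hypothetical): positions 0..n_text-1 hold text tokens,
    positions n_text..n_text+n_phon-1 hold pinyin tokens.
    """
    total = n_text + n_phon
    mask = []
    for q in range(total):
        row = []
        for k in range(total):
            q_is_phon = q >= n_text
            k_is_text = k < n_text
            # Block only the phonetic -> text direction; text queries
            # can still attend to both text and pinyin positions.
            row.append(not (q_is_phon and k_is_text))
        mask.append(row)
    return mask


if __name__ == "__main__":
    for row in separation_mask(n_text=3, n_phon=3):
        print("".join("1" if allowed else "0" for allowed in row))
    # prints three rows of 111111 followed by three rows of 000111
```

Under this reading, the pinyin positions see only other pinyin positions, so a pinyin-to-character prediction made from them cannot rely on the text input, while the text positions keep access to the phonetic features.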

Related papers

Languages in Multilingual Speech Foundation Models Align Both Phonetically and Semantically [58.019484208091534]
Cross-lingual alignment in pretrained language models (LMs) has enabled efficient transfer in text-based LMs. It remains an open question whether findings and methods from text-based cross-lingual alignment apply to speech.
arXiv Detail & Related papers (2025-05-26T07:21:20Z)
Enhancing Spoken Discourse Modeling in Language Models Using Gestural Cues [56.36041287155606]
We investigate whether the joint modeling of gestures using human motion sequences and language can improve spoken discourse modeling. To integrate gestures into language models, we first encode 3D human motion sequences into discrete gesture tokens using a VQ-VAE. Results show that incorporating gestures enhances marker prediction accuracy across the three tasks.
arXiv Detail & Related papers (2025-03-05T13:10:07Z)
Grammar Induction from Visual, Speech and Text [91.98797120799227]
This work introduces a novel visual-audio-text grammar induction task (VAT-GI). Inspired by the fact that language grammar exists beyond text, we argue that text need not be the predominant modality in grammar induction. We propose a visual-audio-text inside-outside autoencoder (VaTiora) framework, which leverages rich modal-specific and complementary features for effective grammar parsing.
arXiv Detail & Related papers (2024-10-01T02:24:18Z)
Large Language Model Should Understand Pinyin for Chinese ASR Error Correction [31.13523648668466]
We propose Pinyin-enhanced GEC to improve Chinese ASR error correction. Our approach only utilizes synthetic errors for training and employs the one-best hypothesis during inference. Experiments on the Aishell-1 and the Common Voice datasets demonstrate that our approach consistently outperforms GEC with text-only input.
arXiv Detail & Related papers (2024-09-20T06:50:56Z)
Identifying Speakers and Addressees of Quotations in Novels with Prompt Learning [5.691280935924612]
We propose prompt learning-based methods for speaker and addressee identification based on fine-tuned pre-trained models. Experiments on both Chinese and English datasets show the effectiveness of the proposed methods.
arXiv Detail & Related papers (2024-08-18T12:19:18Z)
Orientation-Independent Chinese Text Recognition in Scene Images [61.34060587461462]
We make the first attempt to extract orientation-independent visual features by disentangling the content and orientation information of text images. Specifically, we introduce a Character Image Reconstruction Network (CIRN) to recover the corresponding printed character images from the disentangled content and orientation information.
arXiv Detail & Related papers (2023-09-03T05:30:21Z)
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information [68.89000132126536]
This work proposes to use inter-utterance linguistic information to improve the performance of prosodic structure prediction (PSP). Our method achieves better F1 scores in predicting prosodic word (PW), prosodic phrase (PPH) and intonational phrase (IPH).
arXiv Detail & Related papers (2023-08-31T09:19:15Z)
Enhancing Cross-lingual Transfer via Phonemic Transcription Integration [57.109031654219294]
PhoneXL is a framework incorporating phonemic transcriptions as an additional linguistic modality for cross-lingual transfer. Our pilot study reveals phonemic transcription provides essential information beyond the orthography to enhance cross-lingual transfer.
arXiv Detail & Related papers (2023-07-10T06:17:33Z)
Language-Guided Audio-Visual Source Separation via Trimodal Consistency [64.0580750128049]
A key challenge in this task is learning to associate the linguistic description of a sound-emitting object to its visual features and the corresponding components of the audio waveform. We adapt off-the-shelf vision-language foundation models to provide pseudo-target supervision via two novel loss functions. We demonstrate the effectiveness of our self-supervised approach on three audio-visual separation datasets.
arXiv Detail & Related papers (2023-03-28T22:45:40Z)
Good Neighbors Are All You Need for Chinese Grapheme-to-Phoneme Conversion [1.5020330976600735]
Most Chinese Grapheme-to-Phoneme (G2P) systems employ a three-stage framework that first transforms input sequences into character embeddings, obtains linguistic information using language models, and then predicts the phonemes based on global context. We propose the Reinforcer that provides strong inductive bias for language models by emphasizing the phonological information between neighboring characters to help disambiguate pronunciations.
arXiv Detail & Related papers (2023-03-14T09:15:51Z)
Text-Aware End-to-end Mispronunciation Detection and Diagnosis [17.286013739453796]
Mispronunciation detection and diagnosis (MDD) technology is a key component of computer-assisted pronunciation training (CAPT) systems. In this paper, we present a gating strategy that assigns more importance to the relevant audio features while suppressing irrelevant text information.
arXiv Detail & Related papers (2022-06-15T04:08:10Z)
SHUOWEN-JIEZI: Linguistically Informed Tokenizers For Chinese Language Model Pretraining [48.880840711568425]
We study the influences of three main factors on Chinese tokenization for pretrained language models. We propose three kinds of tokenizers: 1) SHUOWEN (meaning Talk Word), the pronunciation-based tokenizers; 2) JIEZI (meaning Solve Character), the glyph-based tokenizers. We find that SHUOWEN and JIEZI tokenizers can generally outperform conventional single-character tokenizers.
arXiv Detail & Related papers (2021-06-01T11:20:02Z)
Read, Listen, and See: Leveraging Multimodal Information Helps Chinese Spell Checking [20.74049189959078]
We propose a Chinese spell checker called ReaLiSe, which directly leverages the multimodal information of Chinese characters. ReaLiSe models the CSC task by (1) capturing the semantic, phonetic and graphic information of the input characters, and (2) mixing the information in these modalities to predict the correct output. Experiments on the SIGHAN benchmarks show that the proposed model outperforms strong baselines by a large margin.
arXiv Detail & Related papers (2021-05-26T02:38:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.