Improving Pre-trained Language Models with Syntactic Dependency
Prediction Task for Chinese Semantic Error Recognition
- URL: http://arxiv.org/abs/2204.07464v1
- Date: Fri, 15 Apr 2022 13:55:32 GMT
- Title: Improving Pre-trained Language Models with Syntactic Dependency
Prediction Task for Chinese Semantic Error Recognition
- Authors: Bo Sun, Baoxin Wang, Wanxiang Che, Dayong Wu, Zhigang Chen, Ting Liu
- Abstract summary: Existing Chinese text error detection mainly focuses on spelling and simple grammatical errors.
Chinese semantic errors are understudied and so complex that humans cannot easily recognize them.
- Score: 52.55136323341319
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing Chinese text error detection mainly focuses on spelling and simple
grammatical errors. These errors have been studied extensively and are
relatively easy for humans to spot. By contrast, Chinese semantic errors are
understudied and so complex that even humans cannot easily recognize them. The task of
this paper is Chinese Semantic Error Recognition (CSER), a binary
classification task that determines whether a sentence contains semantic errors.
Current research offers no effective method for this task. In this paper,
we inherit the model structure of BERT and design several syntax-related
pre-training tasks so that the model can learn syntactic knowledge. Our
pre-training tasks consider both the directionality of the dependency structure
and the diversity of the dependency relations. Because no dataset for CSER has
been published, we build the first high-quality dataset for the task, named the
Corpus of Chinese Linguistic Semantic Acceptability (CoCLSA).
Experimental results on CoCLSA show that our methods outperform both
universal pre-trained models and syntax-infused models.
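As a rough illustration of how a syntax-related pre-training objective of this kind can sit on top of a BERT-style encoder, the Python sketch below scores token pairs for arc existence, arc direction, and relation label, mirroring the abstract's emphasis on directionality and relation diversity. The class name, hidden size, and relation inventory (num_relations=14) are illustrative assumptions, not the paper's actual task design.

```python
# Minimal sketch (PyTorch), assuming a BERT-style encoder; the head layout
# and all names here are illustrative, not the paper's exact design.
import torch
import torch.nn as nn

class DependencyPredictionHead(nn.Module):
    """For a (left, right) token pair, predicts whether a dependency arc
    exists, its direction, and its relation label -- covering both the
    directionality and the diversity of dependency relations."""

    def __init__(self, hidden_size: int = 768, num_relations: int = 14):
        super().__init__()
        pair_dim = hidden_size * 2
        self.arc_classifier = nn.Linear(pair_dim, 2)        # arc: yes / no
        self.direction_classifier = nn.Linear(pair_dim, 2)  # left->right or right->left
        self.relation_classifier = nn.Linear(pair_dim, num_relations)

    def forward(self, hidden_states, pair_indices):
        # hidden_states: (batch, seq_len, hidden_size) from the encoder
        # pair_indices:  (batch, num_pairs, 2) token positions to score
        b = torch.arange(hidden_states.size(0)).unsqueeze(-1)
        left = hidden_states[b, pair_indices[..., 0]]
        right = hidden_states[b, pair_indices[..., 1]]
        pair = torch.cat([left, right], dim=-1)
        return (self.arc_classifier(pair),
                self.direction_classifier(pair),
                self.relation_classifier(pair))

head = DependencyPredictionHead()
states = torch.randn(2, 10, 768)
pairs = torch.randint(0, 10, (2, 5, 2))
arc, direction, rel = head(states, pairs)
print(arc.shape, direction.shape, rel.shape)  # (2,5,2) (2,5,2) (2,5,14)
```

In pre-training, each head would contribute a cross-entropy loss alongside masked language modeling; at fine-tuning time such heads are dropped and CSER is handled by a standard binary classifier over the [CLS] representation.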
Related papers
- Chinese Spelling Correction as Rephrasing Language Model [63.65217759957206]
We study Chinese Spelling Correction (CSC), which aims to detect and correct the potential spelling errors in a given sentence.
Current state-of-the-art methods regard CSC as a sequence tagging task and fine-tune BERT-based models on sentence pairs.
We propose Rephrasing Language Model (ReLM), where the model is trained to rephrase the entire sentence by infilling additional slots, instead of character-to-character tagging.
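A minimal sketch of the rephrasing idea, assuming a BERT-style masked LM: the source sentence is concatenated with one mask slot per output character, and the model infills the corrected sentence rather than tagging characters. The helper name and template are hypothetical simplifications (real training would, for example, ignore the loss on source positions).

```python
# Illustrative ReLM-style input construction; the paper's exact template
# and label handling may differ.
def build_rephrasing_example(src: str, tgt: str, mask_token: str = "[MASK]"):
    """Concatenate the (possibly misspelled) source with one mask slot per
    target character, so the model rephrases the sentence by infilling."""
    model_input = src + mask_token * len(tgt)  # slots to be infilled
    labels = src + tgt                         # slots supervised with the correction
    return model_input, labels

inp, lab = build_rephrasing_example("我爱北惊", "我爱北京")
print(inp)  # 我爱北惊[MASK][MASK][MASK][MASK]
print(lab)  # 我爱北惊我爱北京
```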
arXiv Detail & Related papers (2023-08-17T06:04:28Z)
- CSED: A Chinese Semantic Error Diagnosis Corpus [52.92010408053424]
We study the complicated problem of Chinese Semantic Error Diagnosis (CSED), which lacks relevant datasets.
The study of semantic errors is important because they are very common and may lead to syntactic irregularities or even problems of comprehension.
This paper proposes syntax-aware models to specifically adapt to the CSED task.
arXiv Detail & Related papers (2023-05-09T05:33:31Z)
- Towards preserving word order importance through Forced Invalidation [80.33036864442182]
We show that pre-trained language models are insensitive to word order.
We propose Forced Invalidation to help preserve the importance of word order.
Our experiments demonstrate that Forced Invalidation significantly improves the sensitivity of the models to word order.
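A hedged sketch of what such an objective could look like in data terms: shuffled copies of training sentences are added under an explicit invalid label, so a model cannot score word-order-destroyed input highly. The function and label names are illustrative, not the paper's implementation.

```python
import random

def force_invalidation(examples, invalid_label="INVALID", seed=0):
    """Augment (sentence, label) pairs with shuffled variants that the
    classifier must reject, preserving sensitivity to word order."""
    rng = random.Random(seed)
    augmented = list(examples)
    for sentence, _ in examples:
        tokens = sentence.split()
        rng.shuffle(tokens)
        augmented.append((" ".join(tokens), invalid_label))
    return augmented

data = [("the cat sat on the mat", "ACCEPTABLE")]
print(force_invalidation(data))
```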
arXiv Detail & Related papers (2023-04-11T13:42:10Z)
- uChecker: Masked Pretrained Language Models as Unsupervised Chinese Spelling Checkers [23.343006562849126]
We propose a framework named uChecker to conduct unsupervised spelling error detection and correction.
Masked pre-trained language models such as BERT are introduced as the backbone model.
Benefiting from various flexible masking operations, we propose a Confusionset-guided masking strategy to further train the masked language model.
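A hedged sketch of a confusion-set-guided masking strategy: instead of always substituting [MASK], some characters are replaced with phonetically or visually similar ones drawn from a confusion set, so the masked LM learns to recover the correct character. The tiny confusion set and probabilities below are toy assumptions, not uChecker's actual configuration.

```python
import random

CONFUSION_SET = {"京": ["惊", "经"], "的": ["地", "得"]}  # toy example

def confusionset_mask(chars, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Corrupt a character sequence for MLM training, preferring
    confusion-set substitutes over plain [MASK] where available."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for ch in chars:
        if rng.random() < mask_prob:
            labels.append(ch)  # supervise with the original character
            if ch in CONFUSION_SET and rng.random() < 0.5:
                corrupted.append(rng.choice(CONFUSION_SET[ch]))
            else:
                corrupted.append(mask_token)
        else:
            labels.append(None)  # position not supervised
            corrupted.append(ch)
    return corrupted, labels

print(confusionset_mask(list("我爱北京")))
```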
arXiv Detail & Related papers (2022-09-15T05:57:12Z)
- The Past Mistake is the Future Wisdom: Error-driven Contrastive Probability Optimization for Chinese Spell Checking [32.8563506271794]
Chinese Spell Checking (CSC) aims to detect and correct Chinese spelling errors.
Pre-trained language models (PLMs) have driven progress on the CSC task.
We propose an Error-driven COntrastive Probability Optimization (ECOPO) framework for the CSC task.
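A hedged sketch of the error-driven contrastive idea: the gold character's probability is pushed above the probabilities of characters the model previously preferred in error. A margin (hinge) form is used purely for illustration; ECOPO's actual objective differs in detail.

```python
import torch
import torch.nn.functional as F

def contrastive_probability_loss(logits, gold_ids, negative_ids, margin=0.1):
    """Penalize whenever an error-prone character's probability comes within
    `margin` of the gold character's probability."""
    # logits: (batch, vocab); gold_ids: (batch,); negative_ids: (batch, k)
    probs = F.softmax(logits, dim=-1)
    gold_p = probs.gather(-1, gold_ids.unsqueeze(-1))  # (batch, 1)
    neg_p = probs.gather(-1, negative_ids)             # (batch, k)
    return F.relu(margin + neg_p - gold_p).mean()

logits = torch.randn(2, 100)
loss = contrastive_probability_loss(logits, torch.tensor([3, 7]),
                                    torch.tensor([[5, 9], [1, 2]]))
print(loss.item())
```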
arXiv Detail & Related papers (2022-03-02T09:58:56Z)
- A Syntax-Guided Grammatical Error Correction Model with Dependency Tree Correction [83.14159143179269]
Grammatical Error Correction (GEC) is the task of detecting and correcting grammatical errors in sentences.
We propose a syntax-guided GEC model (SG-GEC) which adopts a graph attention mechanism to utilize the syntactic knowledge of dependency trees.
We evaluate our model on public GEC benchmarks, where it achieves competitive results.
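A minimal sketch of the graph-attention intuition, assuming an undirected dependency adjacency with self-loops: attention scores are masked so each token attends only to its syntactic neighbours. SG-GEC's actual architecture (multi-head graph attention with dependency tree correction) is richer than this.

```python
import torch
import torch.nn.functional as F

def dependency_masked_attention(hidden, heads):
    """Single-head attention restricted to dependency arcs.
    hidden: (seq_len, d); heads[i] = index of token i's dependency head."""
    n = hidden.size(0)
    adj = torch.eye(n, dtype=torch.bool)  # self-loops
    for dep, head in enumerate(heads):
        adj[dep, head] = adj[head, dep] = True  # undirected arcs
    scores = hidden @ hidden.t() / hidden.size(-1) ** 0.5
    scores = scores.masked_fill(~adj, float("-inf"))
    return F.softmax(scores, dim=-1) @ hidden

h = torch.randn(4, 8)
print(dependency_masked_attention(h, heads=[1, 1, 1, 2]).shape)  # (4, 8)
```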
arXiv Detail & Related papers (2021-11-05T07:07:48Z)
- On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that all tested models are affected, though the degree of impact varies.
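A toy sketch of the simulation step: rule-based perturbations modeled on common learner errors (dropped articles, swapped prepositions) are applied to clean text. The rules below are invented placeholders, not the paper's collected error patterns or its attack procedure.

```python
import random

ERROR_RULES = [
    lambda toks: [t for t in toks if t not in ("a", "an", "the")],  # drop articles
    lambda toks: [("on" if t == "in" else t) for t in toks],        # preposition swap
]

def inject_errors(sentence, seed=0):
    """Apply one randomly chosen learner-error rule to a clean sentence."""
    rng = random.Random(seed)
    tokens = sentence.split()
    return " ".join(rng.choice(ERROR_RULES)(tokens))

print(inject_errors("the cat sat in the garden"))
```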
arXiv Detail & Related papers (2020-05-12T11:01:44Z)