On the Robustness of Language Encoders against Grammatical Errors
- URL: http://arxiv.org/abs/2005.05683v1
- Date: Tue, 12 May 2020 11:01:44 GMT
- Title: On the Robustness of Language Encoders against Grammatical Errors
- Authors: Fan Yin, Quanyu Long, Tao Meng, Kai-Wei Chang
- Abstract summary: We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected but the degree of impact varies.
- Score: 66.05648604987479
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We conduct a thorough study to diagnose the behaviors of pre-trained language
encoders (ELMo, BERT, and RoBERTa) when confronted with natural grammatical
errors. Specifically, we collect real grammatical errors from non-native
speakers and conduct adversarial attacks to simulate these errors on clean text
data. We use this approach to facilitate debugging models on downstream
applications. Results confirm that the performance of all tested models is
affected but the degree of impact varies. To interpret model behaviors, we
further design a linguistic acceptability task to reveal their abilities in
identifying ungrammatical sentences and the position of errors. We find that
fixed contextual encoders with a simple classifier trained on the prediction of
sentence correctness are able to locate error positions. We also design a cloze
test for BERT and discover that BERT captures the interaction between errors
and specific tokens in context. Our results shed light on understanding the
robustness and behaviors of language encoders against grammatical errors.
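As a concrete illustration of the error-simulation setup, the sketch below injects a few common learner-style errors (article drops, preposition confusions, crude agreement changes) into clean text. This is a minimal rule-based stand-in, not the paper's adversarial attack, which instead searches for the edits that most damage a target model; the rules, rates, and example sentence here are assumptions for illustration only.

```python
import random

# Hypothetical perturbation rules inspired by common non-native errors.
# NOT the paper's attack, which selects the most damaging edit positions.
ARTICLE_DROP = {"a", "an", "the"}
PREPOSITION_SWAP = {"in": "on", "on": "in", "at": "in", "to": "for"}

def perturb(tokens, error_rate=0.15, seed=0):
    """Inject learner-style grammatical errors into a token list."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        low = tok.lower()
        if rng.random() < error_rate:
            if low in ARTICLE_DROP:
                continue                           # drop an article
            if low in PREPOSITION_SWAP:
                out.append(PREPOSITION_SWAP[low])  # confuse a preposition
                continue
            if low.endswith("s") and len(low) > 3:
                out.append(tok[:-1])               # crude number/agreement error
                continue
        out.append(tok)
    return out

print(" ".join(perturb("She lives in the city and works at a bank".split())))
```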
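The cloze test can likewise be sketched with off-the-shelf tooling: mask a token whose correct form is governed by nearby context and compare the model's scores for the grammatical and the erroneous alternative. The model checkpoint and the word pair below are illustrative assumptions, not the paper's actual stimuli.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

# Minimal cloze probe: mask a verb whose form depends on subject number,
# then compare BERT's preference for the grammatical vs. the erroneous form.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = f"The dogs in the park {tokenizer.mask_token} loudly."
inputs = tokenizer(text, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]

for word in ["bark", "barks"]:  # grammatical vs. agreement error
    tok_id = tokenizer.convert_tokens_to_ids(word)
    print(word, logits[tok_id].item())
```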
Related papers
- A Comprehensive Approach to Misspelling Correction with BERT and Levenshtein Distance [1.7000578646860536]
Spelling mistakes are among the most prevalent writing errors and arise from a variety of factors.
This research aims to identify and rectify diverse spelling errors in text using neural networks.
arXiv Detail & Related papers (2024-07-24T16:07:11Z)
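One plausible way to combine a masked language model with Levenshtein distance, sketched under assumptions (the paper's actual pipeline may differ): let BERT propose contextual candidates for a suspect token, then keep the candidate closest in edit distance to the observed misspelling.

```python
from transformers import pipeline

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

# Hypothetical combination: the masked LM proposes contextual candidates,
# edit distance to the observed misspelling picks the correction.
fill = pipeline("fill-mask", model="bert-base-uncased")
sentence = "I recieved your letter yesterday."
misspelled = "recieved"
masked = sentence.replace(misspelled, fill.tokenizer.mask_token)

candidates = fill(masked, top_k=20)
best = min(candidates, key=lambda c: levenshtein(c["token_str"], misspelled))
print(best["token_str"])  # likely "received"
```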
- Understanding and Mitigating Classification Errors Through Interpretable Token Patterns [58.91023283103762]
Characterizing errors in easily interpretable terms gives insight into whether a classifier is prone to making systematic errors.
We propose to discover those patterns of tokens that distinguish correct and erroneous predictions.
We show that our method, Premise, performs well in practice.
arXiv Detail & Related papers (2023-11-18T00:24:26Z)
- Byte-Level Grammatical Error Correction Using Synthetic and Curated Corpora [0.0]
Grammatical error correction (GEC) is the task of correcting typos, spelling, punctuation and grammatical issues in text.
We show that a byte-level model enables higher correction quality than a subword approach.
arXiv Detail & Related papers (2023-05-29T06:35:40Z)
- Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors [80.22825549235556]
Existing approaches cannot consider error position and type simultaneously.
We build an FG-TED model to predict both addition and omission errors.
Experiments show that our model can identify both error type and position concurrently, and gives state-of-the-art results.
arXiv Detail & Related papers (2023-02-17T16:20:33Z)
- Probing for targeted syntactic knowledge through grammatical error detection [13.653209309144593]
We propose grammatical error detection as a diagnostic probe to evaluate pre-trained English language models.
We leverage public annotated training data from both English second language learners and Wikipedia edits.
We find that masked language models linearly encode information relevant to the detection of subject-verb agreement (SVA) errors, while autoregressive models perform on par with our baseline.
arXiv Detail & Related papers (2022-10-28T16:01:25Z)
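A linear probe of this kind can be sketched as follows: freeze the encoder, mean-pool its hidden states, and fit a logistic classifier to separate grammatical sentences from subject-verb agreement violations. The model choice and the four toy sentences are placeholders; the paper's probing data is of course far larger.

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

# Hypothetical linear probe on a frozen encoder; illustrative data only.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

def embed(sentence):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state
    return hidden.mean(dim=1).squeeze(0).numpy()  # mean-pooled features

sentences = ["The dog barks.", "The dogs bark.", "The dog bark.", "The dogs barks."]
labels = [1, 1, 0, 0]  # 1 = grammatical, 0 = agreement error

X = [embed(s) for s in sentences]
probe = LogisticRegression(max_iter=1000).fit(X, labels)
print(probe.predict(X))
```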
- uChecker: Masked Pretrained Language Models as Unsupervised Chinese Spelling Checkers [23.343006562849126]
We propose a framework named uChecker to conduct unsupervised spelling error detection and correction.
Masked pretrained language models such as BERT are introduced as the backbone model.
Benefiting from various flexible masking operations, we propose a confusion-set-guided masking strategy to fine-tune the masked language model.
arXiv Detail & Related papers (2022-09-15T05:57:12Z)
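The confusion-set-guided masking strategy can be sketched roughly as below: instead of always substituting [MASK], a corrupted character is sometimes drawn from a confusion set of similar characters, so the model learns to recover the correct character from plausible misspellings. The tiny confusion set and the probabilities are made-up placeholders, not the resource uChecker actually uses.

```python
import random

# Illustrative confusion-set-guided masking; the confusion set below is a
# made-up stand-in for a real resource of similar Chinese characters.
CONFUSION = {"的": ["地", "得"], "在": ["再"], "做": ["作"]}

def confusion_mask(chars, mask_token="[MASK]", p=0.15, seed=0):
    rng = random.Random(seed)
    corrupted, targets = [], []
    for i, ch in enumerate(chars):
        if rng.random() < p:
            targets.append((i, ch))  # remember the gold character
            if ch in CONFUSION and rng.random() < 0.5:
                corrupted.append(rng.choice(CONFUSION[ch]))  # confusable swap
            else:
                corrupted.append(mask_token)                 # standard mask
        else:
            corrupted.append(ch)
    return corrupted, targets

chars = list("我在家里做作业")
print(confusion_mask(chars))
```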
- Improving Pre-trained Language Models with Syntactic Dependency Prediction Task for Chinese Semantic Error Recognition [52.55136323341319]
Existing Chinese text error detection mainly focuses on spelling and simple grammatical errors.
Chinese semantic errors are understudied and so complex that even humans cannot easily recognize them.
arXiv Detail & Related papers (2022-04-15T13:55:32Z)
- A Syntax-Guided Grammatical Error Correction Model with Dependency Tree Correction [83.14159143179269]
Grammatical Error Correction (GEC) is a task of detecting and correcting grammatical errors in sentences.
We propose a syntax-guided GEC model (SG-GEC) which adopts the graph attention mechanism to utilize the syntactic knowledge of dependency trees.
We evaluate our model on public benchmarks of GEC task and it achieves competitive results.
arXiv Detail & Related papers (2021-11-05T07:07:48Z)
- Towards Minimal Supervision BERT-based Grammar Error Correction [81.90356787324481]
We try to incorporate contextual information from a pre-trained language model to better leverage annotations and benefit multilingual scenarios.
Results show the strong potential of Bidirectional Encoder Representations from Transformers (BERT) in the grammatical error correction task.
arXiv Detail & Related papers (2020-01-10T15:45:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.