CSynGEC: Incorporating Constituent-based Syntax for Grammatical Error
Correction with a Tailored GEC-Oriented Parser
- URL: http://arxiv.org/abs/2211.08158v1
- Date: Tue, 15 Nov 2022 14:11:39 GMT
- Title: CSynGEC: Incorporating Constituent-based Syntax for Grammatical Error
Correction with a Tailored GEC-Oriented Parser
- Authors: Yue Zhang, Zhenghua Li
- Abstract summary: This work considers another mainstream syntax formalism, i.e. constituent-based syntax.
We first propose an extended constituent-based syntax scheme to accommodate errors in ungrammatical sentences.
Then, we automatically obtain constituency trees of ungrammatical sentences to train a GEC-oriented constituency parser.
- Score: 22.942594068051488
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recently, Zhang et al. (2022) proposed a syntax-aware grammatical error
correction (GEC) approach, named SynGEC, showing that incorporating tailored
dependency-based syntax of the input sentence is quite beneficial to GEC. This
work considers another mainstream syntax formalism, i.e., constituent-based
syntax. By drawing on the successful experience of SynGEC, we first propose an
extended constituent-based syntax scheme to accommodate errors in ungrammatical
sentences. Then, we automatically obtain constituency trees of ungrammatical
sentences to train a GEC-oriented constituency parser by using parallel GEC
data as a pivot. For syntax encoding, we employ the graph convolutional network
(GCN). Experimental results show that our method, named CSynGEC, yields
substantial improvements over strong baselines. Moreover, we investigate the
integration of constituent-based and dependency-based syntax for GEC in two
ways: 1) intra-model combination, which means using separate GCNs to encode
both kinds of syntax for decoding in a single model; 2) inter-model combination,
which means gathering and selecting edits predicted by different models to
achieve final corrections. We find that the former method improves recall over
using one standalone syntax formalism while the latter improves precision, and
both lead to better F0.5 values.
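The precision/recall trade-off described above can be illustrated with a small sketch. It assumes edits are represented as hypothetical (start, end, replacement) tuples and uses a simple vote threshold to select edits across models; the abstract only says that edits are gathered and selected, so the voting rule and the toy data here are illustrative assumptions, not the paper's procedure. The F0.5 formula itself is the standard F-beta score with beta = 0.5, which weights precision more heavily than recall.

```python
from collections import Counter


def f_beta(tp, fp, fn, beta=0.5):
    """Standard F-beta score from edit-level counts (beta=0.5 favors precision)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def select_edits(edit_sets, min_votes):
    """Keep edits proposed by at least `min_votes` models (assumed selection rule)."""
    votes = Counter(edit for edits in edit_sets for edit in set(edits))
    return {edit for edit, n in votes.items() if n >= min_votes}


def score(predicted, gold):
    """F0.5 of a predicted edit set against a gold edit set."""
    tp = len(predicted & gold)
    return f_beta(tp, len(predicted - gold), len(gold - predicted))


# Toy data: each edit is (start, end, replacement); all values are made up.
gold = {(1, 2, "goes"), (4, 5, "the"), (7, 8, "")}
model_a = {(1, 2, "goes"), (3, 4, "in")}
model_b = {(1, 2, "goes"), (4, 5, "the"), (6, 7, "a")}

union = select_edits([model_a, model_b], min_votes=1)   # more edits, more false positives
agreed = select_edits([model_a, model_b], min_votes=2)  # only edits both models propose

print("union  F0.5:", round(score(union, gold), 4))
print("agreed F0.5:", round(score(agreed, gold), 4))
```

In this toy case, requiring agreement between models drops the false positives and raises F0.5 even though recall falls, which is consistent with the abstract's remark that inter-model combination mainly helps precision.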
Related papers
- LM-Combiner: A Contextual Rewriting Model for Chinese Grammatical Error Correction [49.0746090186582]
Over-correction is a critical problem in the Chinese grammatical error correction (CGEC) task.
Recent work using model ensemble methods can effectively mitigate over-correction and improve the precision of the GEC system.
We propose the LM-Combiner, a rewriting model that can directly modify the over-correction of GEC system outputs without a model ensemble.
arXiv Detail & Related papers (2024-03-26T06:12:21Z)
- GEC-DePenD: Non-Autoregressive Grammatical Error Correction with Decoupled Permutation and Decoding [52.14832976759585]
Grammatical error correction (GEC) is an important NLP task that is usually solved with autoregressive sequence-to-sequence models.
We propose a novel non-autoregressive approach to GEC that decouples the architecture into a permutation network and a decoder network.
We show that the resulting network improves over previously known non-autoregressive methods for GEC.
arXiv Detail & Related papers (2023-11-14T14:24:36Z)
- Improving Seq2Seq Grammatical Error Correction via Decoding Interventions [40.52259641181596]
We propose a unified decoding intervention framework that employs an external critic to assess the appropriateness of the token to be generated incrementally.
We discover and investigate two types of critics: a pre-trained left-to-right language model critic and an incremental target-side grammatical error detector critic.
Our framework consistently outperforms strong baselines and achieves results competitive with state-of-the-art methods.
arXiv Detail & Related papers (2023-10-23T03:36:37Z)
- SynGEC: Syntax-Enhanced Grammatical Error Correction with a Tailored GEC-Oriented Parser [28.337533657684563]
This work proposes a syntax-enhanced grammatical error correction (GEC) approach named SynGEC.
To confront this challenge, we propose to build a tailored GEC-oriented parser (GOPar) using parallel GEC training data as a pivot.
Experiments on mainstream English and Chinese GEC datasets show that our proposed SynGEC approach consistently and substantially outperforms strong baselines and achieves competitive performance.
arXiv Detail & Related papers (2022-10-22T15:54:29Z)
- A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model [100.67378875773495]
We propose a generic and language-independent strategy for multilingual Grammatical Error Correction.
Our approach creates diverse parallel GEC data without any language-specific operations.
It achieves state-of-the-art results on the NLPCC 2018 Task 2 dataset (Chinese) and obtains competitive performance on Falko-Merlin (German) and RULEC-GEC (Russian).
arXiv Detail & Related papers (2022-01-26T02:10:32Z)
- GN-Transformer: Fusing Sequence and Graph Representation for Improved Code Summarization [0.0]
We propose a novel method, GN-Transformer, to learn end-to-end on a fused sequence and graph modality.
The proposed methods achieve state-of-the-art performance in two code summarization datasets and across three automatic code summarization metrics.
arXiv Detail & Related papers (2021-11-17T02:51:37Z)
- A Syntax-Guided Grammatical Error Correction Model with Dependency Tree Correction [83.14159143179269]
Grammatical Error Correction (GEC) is a task of detecting and correcting grammatical errors in sentences.
We propose a syntax-guided GEC model (SG-GEC) which adopts the graph attention mechanism to utilize the syntactic knowledge of dependency trees.
We evaluate our model on public GEC benchmarks, where it achieves competitive results.
arXiv Detail & Related papers (2021-11-05T07:07:48Z)
- LM-Critic: Language Models for Unsupervised Grammatical Error Correction [128.9174409251852]
We show how to leverage a pretrained language model (LM) in defining an LM-Critic, which judges a sentence to be grammatical if the LM assigns it a higher probability than its local perturbations.
We apply this LM-Critic and BIFI along with a large set of unlabeled sentences to bootstrap realistic ungrammatical / grammatical pairs for training a corrector.
arXiv Detail & Related papers (2021-09-14T17:06:43Z)
- Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction [106.63733511672721]
We propose a novel language-independent approach to improve the efficiency of Grammatical Error Correction (GEC) by dividing the task into two subtasks: Erroneous Span Detection (ESD) and Erroneous Span Correction (ESC).
ESD identifies grammatically incorrect text spans with an efficient sequence tagging model. ESC leverages a seq2seq model to take the sentence with annotated erroneous spans as input and only outputs the corrected text for these spans.
Experiments show our approach performs comparably to conventional seq2seq approaches in both English and Chinese GEC benchmarks with less than 50% time cost for inference.
arXiv Detail & Related papers (2020-10-07T08:29:11Z)