SynGEC: Syntax-Enhanced Grammatical Error Correction with a Tailored
GEC-Oriented Parser
- URL: http://arxiv.org/abs/2210.12484v1
- Date: Sat, 22 Oct 2022 15:54:29 GMT
- Title: SynGEC: Syntax-Enhanced Grammatical Error Correction with a Tailored
GEC-Oriented Parser
- Authors: Yue Zhang and Bo Zhang and Zhenghua Li and Zuyi Bao and Chen Li and
Min Zhang
- Abstract summary: This work proposes a syntax-enhanced grammatical error correction (GEC) approach named SynGEC.
To confront this challenge, we propose to build a tailored GEC-oriented parser (GOPar) using parallel GEC training data as a pivot.
Experiments on mainstream English and Chinese GEC datasets show that our proposed SynGEC approach consistently and substantially outperforms strong baselines and achieves competitive performance.
- Score: 28.337533657684563
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This work proposes a syntax-enhanced grammatical error correction (GEC)
approach named SynGEC that effectively incorporates dependency syntactic
information into the encoder part of GEC models. The key challenge for this
idea is that off-the-shelf parsers are unreliable when processing ungrammatical
sentences. To confront this challenge, we propose to build a tailored
GEC-oriented parser (GOPar) using parallel GEC training data as a pivot. First,
we design an extended syntax representation scheme that allows us to represent
both grammatical errors and syntax in a unified tree structure. Then, we obtain
parse trees of the source incorrect sentences by projecting trees of the target
correct sentences. Finally, we train GOPar with such projected trees. For GEC,
we employ a graph convolutional network (GCN) to encode the source-side syntactic
information produced by GOPar, and fuse it with the outputs of the
Transformer encoder. Experiments on mainstream English and Chinese GEC datasets
show that our proposed SynGEC approach consistently and substantially
outperforms strong baselines and achieves competitive performance. Our code and
data are all publicly available at https://github.com/HillZhang1999/SynGEC.
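The GCN-plus-fusion step lends itself to a compact illustration. The following PyTorch sketch is a minimal, hypothetical rendering, not the authors' released implementation: the gated fusion, the undirected adjacency construction, and all layer sizes are assumptions made for illustration, and the dependency-label embeddings used alongside the arcs in the paper are omitted for brevity.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    # One graph-convolution layer over dependency arcs.
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h:   (batch, seq_len, dim) token states
        # adj: (batch, seq_len, seq_len) 0/1 arc matrix with self-loops
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)  # degree normalisation
        return torch.relu(self.linear(adj @ h / deg))

class SyntaxFusionEncoder(nn.Module):
    # Runs a small GCN stack over the Transformer encoder output and
    # gates the syntactic and sequential representations together.
    def __init__(self, dim: int, num_layers: int = 2):
        super().__init__()
        self.gcn = nn.ModuleList(GCNLayer(dim) for _ in range(num_layers))
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, enc_out: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        h = enc_out
        for layer in self.gcn:
            h = layer(h, adj)
        g = torch.sigmoid(self.gate(torch.cat([enc_out, h], dim=-1)))
        return g * enc_out + (1.0 - g) * h

# Toy usage: one 5-token sentence with arcs from a (hypothetical) GOPar parse.
B, L, D = 1, 5, 16
enc_out = torch.randn(B, L, D)              # stand-in for Transformer encoder output
adj = torch.eye(L).expand(B, L, L).clone()  # self-loops
for head, dep in [(1, 0), (1, 2), (3, 2), (3, 4)]:
    adj[0, head, dep] = adj[0, dep, head] = 1.0  # arcs treated as undirected
fused = SyntaxFusionEncoder(D)(enc_out, adj)     # shape (1, 5, 16)
```

Gating is only one plausible fusion choice; the repository linked above is the authoritative reference for the actual architecture.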
Related papers
- GEC-DePenD: Non-Autoregressive Grammatical Error Correction with
Decoupled Permutation and Decoding [52.14832976759585]
Grammatical error correction (GEC) is an important NLP task that is usually solved with autoregressive sequence-to-sequence models.
We propose a novel non-autoregressive approach to GEC that decouples the architecture into a permutation network and a decoding network.
We show that the resulting network improves over previously known non-autoregressive methods for GEC.
arXiv Detail & Related papers (2023-11-14T14:24:36Z)
- Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation [61.50286000143233]
ChainCoder is a program synthesis language model that generates Python code progressively.
A tailored transformer architecture is leveraged to jointly encode the natural language descriptions and syntactically aligned I/O data samples.
arXiv Detail & Related papers (2023-04-28T01:47:09Z)
- CSynGEC: Incorporating Constituent-based Syntax for Grammatical Error Correction with a Tailored GEC-Oriented Parser [22.942594068051488]
This work considers another mainstream syntax formalism, i.e., constituent-based syntax.
We first propose an extended constituent-based syntax scheme to accommodate errors in ungrammatical sentences.
Then, we automatically obtain constituency trees of ungrammatical sentences to train a GEC-oriented constituency parser.
arXiv Detail & Related papers (2022-11-15T14:11:39Z)
- A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model [100.67378875773495]
We propose a generic and language-independent strategy for multilingual Grammatical Error Correction.
Our approach creates diverse parallel GEC data without any language-specific operations.
It achieves state-of-the-art results on the NLPCC 2018 Task 2 dataset (Chinese) and obtains competitive performance on Falko-Merlin (German) and RULEC-GEC (Russian).
arXiv Detail & Related papers (2022-01-26T02:10:32Z)
- GN-Transformer: Fusing Sequence and Graph Representation for Improved Code Summarization [0.0]
We propose a novel method, GN-Transformer, to learn end-to-end on a fused sequence and graph modality.
The proposed method achieves state-of-the-art performance on two code summarization datasets and across three automatic code summarization metrics.
arXiv Detail & Related papers (2021-11-17T02:51:37Z)
- A Syntax-Guided Grammatical Error Correction Model with Dependency Tree Correction [83.14159143179269]
Grammatical Error Correction (GEC) is a task of detecting and correcting grammatical errors in sentences.
We propose a syntax-guided GEC model (SG-GEC) which adopts the graph attention mechanism to utilize the syntactic knowledge of dependency trees.
We evaluate our model on public benchmarks of the GEC task, and it achieves competitive results.
arXiv Detail & Related papers (2021-11-05T07:07:48Z)
- LM-Critic: Language Models for Unsupervised Grammatical Error Correction [128.9174409251852]
We show how to leverage a pretrained language model (LM) to define an LM-Critic, which judges a sentence to be grammatical when the LM assigns it a higher probability than its local perturbations.
We apply this LM-Critic and the break-it-fix-it (BIFI) algorithm, together with a large set of unlabeled sentences, to bootstrap realistic ungrammatical/grammatical pairs for training a corrector; a toy rendering of the critic follows this entry.
arXiv Detail & Related papers (2021-09-14T17:06:43Z)
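The critic reduces to a simple comparison that can be mocked up in a few lines of Python. This is an assumption-laden sketch, not the paper's implementation: the real LM-Critic relies on carefully designed local perturbation functions and probability handling, whereas the word-level neighbourhood below is invented for brevity.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence: str) -> float:
    # Approximate total log-probability of the sentence under the LM
    # (the model's loss is the mean per-token negative log-likelihood).
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return -loss.item() * ids.size(1)

def local_perturbations(sentence: str):
    # Toy neighbourhood: delete or duplicate one word at a time.
    words = sentence.split()
    for i in range(len(words)):
        if len(words) > 1:
            yield " ".join(words[:i] + words[i + 1:])
        yield " ".join(words[:i + 1] + words[i:])

def is_grammatical(sentence: str) -> bool:
    # LM-Critic idea: the sentence should outscore every local neighbour.
    base = sentence_logprob(sentence)
    return all(sentence_logprob(p) < base for p in local_perturbations(sentence))

print(is_grammatical("She goes to school every day."))
```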
- Recursive Tree Grammar Autoencoders [3.791857415239352]
We propose a novel autoencoder approach that encodes trees via a bottom-up grammar and decodes trees via a tree grammar.
We show experimentally that our proposed method improves the autoencoding error, training time, and optimization score on four benchmark datasets.
arXiv Detail & Related papers (2020-12-03T17:37:25Z)
- Deep Graph Matching and Searching for Semantic Code Retrieval [76.51445515611469]
We propose an end-to-end deep graph matching and searching model based on graph neural networks.
We first represent both natural language query texts and programming language code snippets with unified graph-structured data.
In particular, our model (DGMS) not only captures more structural information for individual query texts or code snippets but also learns the fine-grained similarity between them.
arXiv Detail & Related papers (2020-10-24T14:16:50Z)
- Stronger Baselines for Grammatical Error Correction Using Pretrained Encoder-Decoder Model [24.51571980021599]
We explore the utility of bidirectional and auto-regressive transformers (BART) as a generic pretrained encoder-decoder model for grammatical error correction (GEC).
We find that monolingual and multilingual BART models achieve high performance in GEC, with one of the results being comparable to the current strong results in English GEC; a minimal interface sketch follows this entry.
arXiv Detail & Related papers (2020-05-24T22:13:24Z)
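For readers unfamiliar with the setup, the snippet below sketches the generic BART encoder-decoder interface such baselines build on. It is illustrative only: facebook/bart-base is a plain pretrained checkpoint and will not correct grammar until fine-tuned on parallel GEC data, so a GEC-fine-tuned checkpoint of your own would be substituted in practice.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

# Substitute a BART checkpoint fine-tuned on GEC pairs for real corrections.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

src = "She go to school yesterday ."
batch = tokenizer(src, return_tensors="pt")
ids = model.generate(**batch, num_beams=5, max_length=64)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```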
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.