GEC-DePenD: Non-Autoregressive Grammatical Error Correction with
Decoupled Permutation and Decoding
- URL: http://arxiv.org/abs/2311.08191v1
- Date: Tue, 14 Nov 2023 14:24:36 GMT
- Title: GEC-DePenD: Non-Autoregressive Grammatical Error Correction with
Decoupled Permutation and Decoding
- Authors: Konstantin Yakovlev, Alexander Podolskiy, Andrey Bout, Sergey
Nikolenko, Irina Piontkovskaya
- Abstract summary: Grammatical error correction (GEC) is an important NLP task that is usually solved with autoregressive sequence-to-sequence models.
We propose a novel non-autoregressive approach to GEC that decouples the architecture into a permutation network and a decoder network.
We show that the resulting network improves over previously known non-autoregressive methods for GEC.
- Score: 52.14832976759585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Grammatical error correction (GEC) is an important NLP task that is currently
usually solved with autoregressive sequence-to-sequence models. However,
approaches of this class are inherently slow due to one-by-one token
generation, so non-autoregressive alternatives are needed. In this work, we
propose a novel non-autoregressive approach to GEC that decouples the
architecture into a permutation network that outputs a self-attention weight
matrix that can be used in beam search to find the best permutation of input
tokens (with auxiliary {ins} tokens) and a decoder network based on a
step-unrolled denoising autoencoder that fills in specific tokens. This allows
us to find the token permutation after only one forward pass of the permutation
network, avoiding autoregressive constructions. We show that the resulting
network improves over previously known non-autoregressive methods for GEC and
reaches the level of autoregressive methods that do not use language-specific
synthetic data generation methods. Our results are supported by a comprehensive
experimental validation on the CoNLL-2014 and Write&Improve+LOCNESS datasets
and an extensive ablation study that supports our architectural and algorithmic
choices.
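To make the decoupled inference above concrete, here is a minimal, illustrative sketch (not the authors' implementation): it assumes a pointer-style score matrix produced by one forward pass of a permutation network, runs a beam search over those pairwise scores to pick an ordering of the input tokens plus auxiliary {ins} slots, and then hands the reordered sequence to a placeholder-filling decoder. The function names, the toy score matrix, and the lambda standing in for the SUNDAE-style decoder are all hypothetical.

from typing import Callable, List, Sequence, Tuple

def beam_search_permutation(scores: Sequence[Sequence[float]],
                            beam_size: int = 4) -> List[int]:
    """Beam search over pairwise pointer scores: scores[i][j] is the score of
    placing position j immediately after position i (position 0 is a BOS slot).
    The scores come from a single forward pass, so no autoregressive
    re-encoding is needed while searching for the permutation."""
    n = len(scores)
    beams: List[Tuple[float, List[int]]] = [(0.0, [0])]
    for _ in range(n - 1):
        candidates = []
        for score, order in beams:
            used = set(order)
            last = order[-1]
            for nxt in range(1, n):
                if nxt not in used:
                    candidates.append((score + scores[last][nxt], order + [nxt]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_size]
    return beams[0][1][1:]  # best ordering of positions 1..n-1, BOS dropped

def decode(tokens: List[str], order: List[int],
           fill_step: Callable[[List[str]], List[str]]) -> List[str]:
    """Reorder the tokens according to the chosen permutation, then let a
    decoder fill every '{ins}' placeholder (here a toy callable; the paper
    uses a step-unrolled denoising decoder for this stage)."""
    permuted = [tokens[i - 1] for i in order]  # order indexes positions 1..n-1
    return fill_step(permuted)

# Toy example (hypothetical scores): insert a missing article into
# "he went to cinema" via an auxiliary {ins} slot.
tokens = ["he", "went", "to", "cinema", "{ins}"]
scores = [
    # BOS   he   went  to   cinema {ins}
    [0.0,  2.0,  0.1,  0.1,  0.1,  0.1],  # BOS    -> "he"
    [0.0,  0.0,  2.0,  0.1,  0.1,  0.1],  # he     -> "went"
    [0.0,  0.1,  0.0,  2.0,  0.1,  0.1],  # went   -> "to"
    [0.0,  0.1,  0.1,  0.0,  0.1,  2.0],  # to     -> "{ins}"
    [0.0,  0.1,  0.1,  0.1,  0.0,  0.1],  # cinema -> (sentence end)
    [0.0,  0.1,  0.1,  0.1,  2.0,  0.0],  # {ins}  -> "cinema"
]
order = beam_search_permutation(scores)
corrected = decode(tokens, order,
                   lambda seq: ["the" if t == "{ins}" else t for t in seq])
print(order)      # [1, 2, 3, 5, 4]
print(corrected)  # ['he', 'went', 'to', 'the', 'cinema']

Because the scores are produced once, the permutation search amounts to cheap table lookups rather than repeated decoder passes, which is where the speedup over autoregressive generation comes from.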
Related papers
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- Permutation-Invariant Set Autoencoders with Fixed-Size Embeddings for Multi-Agent Learning [7.22614468437919]
We introduce a Permutation-Invariant Set Autoencoder (PISA)
PISA produces encodings with significantly lower reconstruction error than existing baselines.
We demonstrate its usefulness in a multi-agent application.
arXiv Detail & Related papers (2023-02-24T18:59:13Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all n-grams in a passage as its possible identifiers (a minimal sketch of this identifier scheme follows the related-papers list below).
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
- Step-unrolled Denoising Autoencoders for Text Generation [17.015573262373742]
We propose a new generative model of text, Step-unrolled Denoising Autoencoder (SUNDAE)
SUNDAE is repeatedly applied on a sequence of tokens, starting from random inputs and improving them each time until convergence.
We present a simple new improvement operator that converges in fewer iterations than diffusion methods (a minimal sketch of the unrolled denoising loop follows the related-papers list below).
arXiv Detail & Related papers (2021-12-13T16:00:33Z)
- Discovering Non-monotonic Autoregressive Orderings with Variational Inference [67.27561153666211]
We develop an unsupervised parallelizable learner that discovers high-quality generation orders purely from training data.
We implement the encoder as a Transformer with non-causal attention that outputs permutations in one forward pass.
Empirical results in language modeling tasks demonstrate that our method is context-aware and discovers orderings that are competitive with or even better than fixed orders.
arXiv Detail & Related papers (2021-10-27T16:08:09Z)
- Highly Parallel Autoregressive Entity Linking with Discriminative Correction [51.947280241185]
We propose a very efficient approach that parallelizes autoregressive linking across all potential mentions.
Our model is >70 times faster and more accurate than the previous generative method.
arXiv Detail & Related papers (2021-09-08T17:28:26Z)
- Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation [109.46348908829697]
We propose a novel Edit-Invariant Sequence Loss (EISL), which computes the matching loss of a target n-gram with all n-grams in the generated sequence.
We conduct experiments on three tasks: machine translation with noisy target sequences, unsupervised text style transfer, and non-autoregressive machine translation.
arXiv Detail & Related papers (2021-06-29T03:59:21Z)
- SparseGAN: Sparse Generative Adversarial Network for Text Generation [8.634962333084724]
We propose a SparseGAN that generates semantic-interpretable, but sparse sentence representations as inputs to the discriminator.
With such semantic-rich representations, we not only reduce unnecessary noise for efficient adversarial training, but also make the entire training process fully differentiable.
arXiv Detail & Related papers (2021-03-22T04:44:43Z)
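For the Autoregressive Search Engines entry above, the following sketch illustrates the unstructured identifier scheme it describes: every n-gram occurring in a passage acts as a possible identifier for it, so a generated string can be looked up directly. This is a simplified, hypothetical reconstruction of the indexing idea only; the paper itself constrains an autoregressive decoder with a substring index rather than an explicit n-gram table, and all names below are assumptions.

from collections import defaultdict
from typing import Dict, List, Set, Tuple

def ngram_identifiers(tokens: List[str], max_n: int = 3) -> Set[Tuple[str, ...]]:
    """All n-grams (up to max_n) of a passage serve as its possible identifiers."""
    grams = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(tuple(tokens[i:i + n]))
    return grams

def build_index(passages: Dict[str, str],
                max_n: int = 3) -> Dict[Tuple[str, ...], Set[str]]:
    """Map every identifier n-gram to the set of passages that contain it."""
    index: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for pid, text in passages.items():
        for gram in ngram_identifiers(text.split(), max_n):
            index[gram].add(pid)
    return index

# Toy corpus: a generated n-gram acts directly as a retrieval key.
passages = {
    "p1": "grammatical error correction with sequence models",
    "p2": "non autoregressive decoding speeds up generation",
}
index = build_index(passages)
print(index[("error", "correction")])    # {'p1'}
print(index[("non", "autoregressive")])  # {'p2'}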
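For the Step-unrolled Denoising Autoencoders entry above (the model family the GEC-DePenD decoder builds on), here is a minimal, hypothetical sketch of the unrolled denoising loop: an improvement operator is applied to a token sequence repeatedly until the sequence stops changing. The toy improvement operator below is an assumption standing in for a learned network that denoises the whole sequence in parallel.

from typing import Callable, List

def unrolled_denoise(tokens: List[str],
                     improve: Callable[[List[str]], List[str]],
                     max_steps: int = 10) -> List[str]:
    """Repeatedly apply an improvement operator until the sequence converges
    (one full step no longer changes any token) or max_steps is reached."""
    current = list(tokens)
    for _ in range(max_steps):
        improved = improve(current)
        if improved == current:   # convergence: a fixed point of the operator
            return improved
        current = improved
    return current

# Toy improvement operator (an assumption, standing in for a learned denoiser):
# it fixes one kind of error per pass, so convergence takes a few iterations.
def toy_improve(seq: List[str]) -> List[str]:
    fixes = {"goed": "went", "hom": "home"}
    out = list(seq)
    for i, tok in enumerate(out):
        if tok in fixes:
            out[i] = fixes[tok]
            return out            # one correction per step
    return out

print(unrolled_denoise(["he", "goed", "hom"], toy_improve))
# ['he', 'went', 'home']  (reached after two improvement steps)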
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides (including the summaries above) and is not responsible for any consequences of its use.