Multi-head Sequence Tagging Model for Grammatical Error Correction
- URL: http://arxiv.org/abs/2410.16473v1
- Date: Mon, 21 Oct 2024 20:01:06 GMT
- Title: Multi-head Sequence Tagging Model for Grammatical Error Correction
- Authors: Kamal Al-Sabahi, Kang Yang, Wangwang Liu, Guanyu Jiang, Xian Li, Ming Yang
- Abstract summary: The Grammatical Error Correction (GEC) problem is a mapping between a source sequence and a target one.
Current sequence tagging approaches still struggle to handle a broad range of grammatical errors when focused on a single task.
We propose a novel multi-head and multi-task learning model to effectively utilize training data and harness the information from related tasks' training signals.
- Score: 31.538895931875565
- Abstract: Solving the Grammatical Error Correction (GEC) problem requires a mapping between a source sequence and a target one, where the two differ only in a few spans. For this reason, attention has shifted to non-autoregressive or sequence tagging models, in which GEC is simplified from Seq2Seq to labeling the input tokens with edit commands chosen from a large edit space. Due to this large number of classes and the limited available datasets, current sequence tagging approaches still struggle to handle a broad range of grammatical errors when focused on one single task. To this end, we simplify GEC further by dividing it into seven related subtasks: Insertion, Deletion, Merge, Substitution, Transformation, Detection, and Correction, with Correction being our primary focus. A distinct classification head is dedicated to each of these subtasks, and a novel multi-head and multi-task learning model is proposed to effectively utilize the training data and harness the information from related task training signals. To mitigate the limited number of available training samples, a new denoising autoencoder is used to generate a synthetic dataset for pretraining. Additionally, a new character-level transformation is proposed to enhance the sequence-to-edit function and improve the model's vocabulary coverage. Our single/ensemble model achieves an F0.5 of 74.4/77.0 on BEA-19 (test) and 68.6/69.1 on CoNLL-14 (test). Moreover, on the JFLEG test set, the GLEU scores are 61.6 and 61.7 for the single and ensemble models, respectively. These results outperform most recently published state-of-the-art systems by a considerable margin.
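As a rough, hedged illustration of the architecture the abstract describes, the sketch below shows a shared encoder with one per-token classification head per subtask and a summed multi-task loss. The encoder choice, hidden sizes, and label-space sizes are placeholders, not details taken from the paper.

```python
# Minimal sketch of a multi-head sequence tagging model for GEC (illustrative only;
# encoder, dimensions, and label inventories are assumptions, not the paper's code).
import torch
import torch.nn as nn

# Seven subtasks, each with its own label space. Sizes here are placeholders;
# the Correction head carries the full edit-tag vocabulary.
SUBTASKS = {
    "insertion": 2, "deletion": 2, "merge": 2, "substitution": 2,
    "transformation": 2, "detection": 2, "correction": 5000,
}

class MultiHeadTagger(nn.Module):
    def __init__(self, vocab_size=30000, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        # One classification head per subtask, all sharing the same encoder.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(d_model, n) for name, n in SUBTASKS.items()}
        )

    def forward(self, token_ids):
        hidden = self.encoder(self.embed(token_ids))          # (batch, seq, d_model)
        return {name: head(hidden) for name, head in self.heads.items()}

def multi_task_loss(logits, gold, weights=None):
    """Sum of (optionally weighted) per-token cross-entropy losses over all heads."""
    total = 0.0
    for name, logit in logits.items():
        w = 1.0 if weights is None else weights.get(name, 1.0)
        total = total + w * nn.functional.cross_entropy(
            logit.reshape(-1, logit.size(-1)), gold[name].reshape(-1)
        )
    return total
```

At inference time only the Correction head's edit tags would be applied to the input, with the remaining heads acting as auxiliary training signals; how the paper actually weights or combines the heads is not specified in the abstract.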
Related papers
- Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences.
We formulate each task as a sequence-to-sequence problem and perform multi-task training.
We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv Detail & Related papers (2023-11-20T14:50:12Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- Reducing Sequence Length by Predicting Edit Operations with Large Language Models [50.66922361766939]
This paper proposes predicting edit spans for the source text for local sequence transduction tasks.
We apply instruction tuning for Large Language Models on the supervision data of edit spans.
Experiments show that the proposed method achieves comparable performance to the baseline in four tasks.
arXiv Detail & Related papers (2023-05-19T17:51:05Z)
- Thutmose Tagger: Single-pass neural model for Inverse Text Normalization [76.87664008338317]
Inverse text normalization (ITN) is an essential post-processing step in automatic speech recognition.
We present a dataset preparation method based on the granular alignment of ITN examples.
One-to-one correspondence between tags and input words improves the interpretability of the model's predictions.
arXiv Detail & Related papers (2022-07-29T20:39:02Z)
- Sequence-to-Action: Grammatical Error Correction with Action Guided Sequence Generation [21.886973310718457]
We propose a novel Sequence-to-Action (S2A) module for Grammatical Error Correction.
The S2A module jointly takes the source and target sentences as input, and is able to automatically generate a token-level action sequence.
Our model consistently outperforms the seq2seq baselines, while being able to significantly alleviate the over-correction problem.
arXiv Detail & Related papers (2022-05-22T17:47:06Z)
- Few-Shot Learning with Siamese Networks and Label Tuning [5.006086647446482]
We show that with proper pre-training, Siamese Networks that embed texts and labels offer a competitive alternative.
We introduce label tuning, a simple and computationally efficient approach that adapts the models in a few-shot setup by changing only the label embeddings.
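Label tuning, as summarized above, adapts a frozen bi-encoder by training only the label embeddings. Below is a generic, hedged sketch of that idea; the stand-in encoder, dimensions, and data are assumptions, not taken from the paper.

```python
# Generic sketch of label tuning: the text encoder stays frozen and only the
# label embeddings are updated. Encoder, sizes, and data are placeholders.
import torch
import torch.nn as nn

d, num_labels = 128, 3

class LabelTuner(nn.Module):
    def __init__(self, frozen_encoder, init_label_embs):
        super().__init__()
        self.encoder = frozen_encoder
        for p in self.encoder.parameters():
            p.requires_grad = False                            # encoder is never updated
        self.label_embs = nn.Parameter(init_label_embs.clone())  # the only trainable part

    def forward(self, inputs):
        text_emb = self.encoder(inputs)                        # (batch, d)
        return text_emb @ self.label_embs.t()                  # similarity scores (batch, num_labels)

# Stand-in encoder and initial label embeddings (e.g., encoded label descriptions).
model = LabelTuner(nn.Linear(d, d), torch.randn(num_labels, d))
optimizer = torch.optim.Adam([model.label_embs], lr=1e-2)

x, y = torch.randn(16, d), torch.randint(0, num_labels, (16,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()                                               # only the label embeddings move
```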
arXiv Detail & Related papers (2022-03-28T11:16:46Z)
- Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese Grammatical Error Correction [49.25830718574892]
We present a new framework named Tail-to-Tail (TtT) non-autoregressive sequence prediction.
It builds on the observation that most tokens are correct and can be conveyed directly from source to target, while the error positions can be estimated and corrected.
Experimental results on standard datasets, especially on the variable-length datasets, demonstrate the effectiveness of TtT in terms of sentence-level Accuracy, Precision, Recall, and F1-Measure.
arXiv Detail & Related papers (2021-06-03T05:56:57Z)
- Transfer Learning for Sequence Generation: from Single-source to Multi-source [50.34044254589968]
We propose a two-stage finetuning method to alleviate the pretrain-finetune discrepancy and introduce a novel multi-source generation (MSG) model with a fine encoder to learn better representations in MSG tasks.
Our approach achieves new state-of-the-art results on the WMT17 APE task and multi-source translation task using the WMT14 test set.
arXiv Detail & Related papers (2021-05-31T09:12:38Z)
- Grammatical Error Correction as GAN-like Sequence Labeling [45.19453732703053]
We propose a GAN-like sequence labeling model, which consists of a grammatical error detector as a discriminator and a grammatical error labeler with Gumbel-Softmax sampling as a generator.
Our results on several evaluation benchmarks demonstrate that our proposed approach is effective and improves the previous state-of-the-art baseline.
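The Gumbel-Softmax sampling mentioned in the entry above is a general trick for letting a generator emit near-discrete labels while keeping gradients flowing back to it. A minimal, generic illustration using PyTorch's built-in gumbel_softmax follows; the surrounding GAN setup and tensor shapes are assumptions, not the paper's implementation.

```python
# Generic Gumbel-Softmax illustration (not the paper's code): discrete-looking
# label samples in the forward pass, straight-through gradients in the backward pass.
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10, 25, requires_grad=True)   # (batch, seq_len, num_labels) from a generator

# hard=True returns one-hot vectors in the forward pass while using the soft
# sample for the backward pass, so a discriminator can score discrete labels
# and the generator still receives gradients.
labels_onehot = F.gumbel_softmax(logits, tau=1.0, hard=True, dim=-1)
label_ids = labels_onehot.argmax(dim=-1)               # integer label ids, e.g. for decoding

fake_disc_score = labels_onehot.sum()                  # stand-in for a discriminator's output
fake_disc_score.backward()
print(logits.grad.shape)                               # gradients reach the generator's logits
```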
arXiv Detail & Related papers (2021-05-29T04:39:40Z)
- Meta-Regularization by Enforcing Mutual-Exclusiveness [0.8057006406834467]
We propose a regularization technique for meta-learning models that gives the model designer more control over the information flow during meta-training.
Our proposed regularization function shows an accuracy boost of approximately 36% on the Omniglot dataset.
arXiv Detail & Related papers (2021-01-24T22:57:19Z)