Encoder-Decoder Models Can Benefit from Pre-trained Masked Language
Models in Grammatical Error Correction
- URL: http://arxiv.org/abs/2005.00987v2
- Date: Sun, 31 May 2020 08:01:57 GMT
- Title: Encoder-Decoder Models Can Benefit from Pre-trained Masked Language
Models in Grammatical Error Correction
- Authors: Masahiro Kaneko, Masato Mita, Shun Kiyono, Jun Suzuki, Kentaro Inui
- Abstract summary: Previous methods have potential drawbacks when applied to an EncDec model.
Our proposed method first fine-tunes an MLM with a given GEC corpus and then uses the output of the fine-tuned MLM as additional features in the GEC model.
The best-performing model achieves state-of-the-art performance on the BEA-2019 and CoNLL-2014 benchmarks.
- Score: 54.569707226277735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates how to effectively incorporate a pre-trained masked
language model (MLM), such as BERT, into an encoder-decoder (EncDec) model for
grammatical error correction (GEC). The answer to this question is not as
straightforward as one might expect because the previous common methods for
incorporating an MLM into an EncDec model have potential drawbacks when applied
to GEC. For example, the distribution of the inputs to a GEC model can be
considerably different (erroneous, clumsy, etc.) from that of the corpora used
for pre-training MLMs; however, this issue is not addressed in the previous
methods. Our experiments show that our proposed method, where we first
fine-tune an MLM with a given GEC corpus and then use the output of the
fine-tuned MLM as additional features in the GEC model, maximizes the benefit
of the MLM. The best-performing model achieves state-of-the-art performances on
the BEA-2019 and CoNLL-2014 benchmarks. Our code is publicly available at:
https://github.com/kanekomasahiro/bert-gec.
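To make the approach concrete, below is a minimal PyTorch sketch (not the authors' implementation; see the bert-gec repository above for that) of how the hidden states of a GEC-fine-tuned BERT could be supplied to an EncDec encoder as additional features. The Transformer encoder, the frozen feature extraction, and the concatenate-and-project fusion are illustrative assumptions.

    import torch
    import torch.nn as nn
    from transformers import BertModel

    class BertFusedEncoder(nn.Module):
        """Sketch: fuse EncDec encoder states with features from a fine-tuned MLM."""
        def __init__(self, vocab_size, d_model=512, bert_name="bert-base-cased"):
            super().__init__()
            # Assumed to have been fine-tuned on the GEC corpus beforehand.
            self.bert = BertModel.from_pretrained(bert_name)
            self.embed = nn.Embedding(vocab_size, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=6)
            # Project [encoder state ; MLM state] back to d_model.
            self.fuse = nn.Linear(d_model + self.bert.config.hidden_size, d_model)

        def forward(self, src_ids, bert_ids, bert_mask):
            enc = self.encoder(self.embed(src_ids))          # (B, T, d_model)
            with torch.no_grad():                            # frozen feature extractor (an assumption)
                mlm = self.bert(bert_ids, attention_mask=bert_mask).last_hidden_state
            # Assumes the two tokenizations are aligned to the same length T.
            return self.fuse(torch.cat([enc, mlm], dim=-1))  # fed to the decoder

A standard Transformer decoder would then attend over the fused states; the actual feature-combination strategy used in the paper differs in detail.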
Related papers
- Aligning Large Language Models via Fine-grained Supervision [20.35000061196631]
Pre-trained large-scale language models (LLMs) excel at producing coherent articles, yet their outputs may be untruthful, toxic, or fail to align with user expectations.
Current approaches focus on using reinforcement learning with human feedback to improve model alignment.
We propose a method to enhance LLM alignment through fine-grained token-level supervision.
arXiv Detail & Related papers (2024-06-04T20:21:45Z) - Which Syntactic Capabilities Are Statistically Learned by Masked
Language Models for Code? [51.29970742152668]
We highlight that relying on accuracy-based measurements may lead to an overestimation of models' capabilities.
To address these issues, we introduce a technique called SyntaxEval for assessing the syntactic capabilities of masked language models.
arXiv Detail & Related papers (2024-01-03T02:44:02Z) - Are Pre-trained Language Models Useful for Model Ensemble in Chinese
Grammatical Error Correction? [10.302225525539003]
We explore several ensemble strategies based on strong PLMs with four sophisticated single models.
Surprisingly, performance does not improve and even degrades after the PLM-based ensemble.
arXiv Detail & Related papers (2023-05-24T14:18:52Z) - Representation Deficiency in Masked Language Modeling [107.39136254013042]
We propose MAE-LM, which pretrains the Masked Autoencoder architecture, where [MASK] tokens are excluded from the encoder.
We show that MAE-LM consistently outperforms pretrained models across different pretraining settings and model sizes when fine-tuned on the GLUE and SQuAD benchmarks.
arXiv Detail & Related papers (2023-02-04T01:54:17Z) - Frustratingly Simple Pretraining Alternatives to Masked Language
Modeling [10.732163031244651]
Masked language modeling (MLM) is widely used in natural language processing for learning text representations.
In this paper, we explore five simple pretraining objectives based on token-level classification tasks as replacements for MLM.
arXiv Detail & Related papers (2021-09-04T08:52:37Z) - Exposing the Implicit Energy Networks behind Masked Language Models via
Metropolis--Hastings [57.133639209759615]
We interpret MLMs as energy-based sequence models and propose two energy parametrizations derivable from the trained MLMs.
We develop a tractable sampling scheme based on the Metropolis-Hastings Monte Carlo algorithm.
We validate the effectiveness of the proposed parametrizations by exploring the quality of samples drawn from these energy-based models.
arXiv Detail & Related papers (2021-06-04T22:04:30Z) - Universal Sentence Representation Learning with Conditional Masked
Language Model [7.334766841801749]
We present Conditional Masked Language Modeling (CMLM) to effectively learn sentence representations.
Our English CMLM model achieves state-of-the-art performance on SentEval.
As a fully unsupervised learning method, CMLM can be conveniently extended to a broad range of languages and domains.
arXiv Detail & Related papers (2020-12-28T18:06:37Z) - MPNet: Masked and Permuted Pre-training for Language Understanding [158.63267478638647]
MPNet is a novel pre-training method that inherits the advantages of BERT and XLNet and avoids their limitations.
We pretrain MPNet on a large-scale dataset (over 160GB of text corpora) and fine-tune it on a variety of downstream tasks.
Results show that MPNet outperforms MLM and PLM by a large margin, and achieves better results on these tasks than previous state-of-the-art pre-trained methods.
arXiv Detail & Related papers (2020-04-20T13:54:12Z) - ELECTRA: Pre-training Text Encoders as Discriminators Rather Than
Generators [108.3381301768299]
Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens.
We propose a more sample-efficient pre-training task called replaced token detection.
arXiv Detail & Related papers (2020-03-23T21:17:42Z)
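As background for the ELECTRA entry above: replaced token detection trains a discriminator to decide, for every position, whether the token was replaced by a sample from a small generator, so all positions provide learning signal, whereas MLM only learns from the masked subset. Below is a minimal sketch contrasting the two objectives; model, generator, and discriminator stand in for arbitrary token-level networks, and the 15% corruption rate is an illustrative assumption.

    import torch
    import torch.nn.functional as F

    def mlm_loss(model, ids, mask_id, mask_prob=0.15):
        # MLM: corrupt the input by masking some tokens, reconstruct the originals.
        corrupt_pos = torch.rand(ids.shape) < mask_prob
        logits = model(ids.masked_fill(corrupt_pos, mask_id))   # (B, T, vocab)
        return F.cross_entropy(logits[corrupt_pos], ids[corrupt_pos])

    def rtd_loss(generator, discriminator, ids, mask_id, mask_prob=0.15):
        # Replaced token detection: a small generator proposes plausible tokens at
        # the corrupted positions; the discriminator labels every position as
        # original (0) or replaced (1).
        corrupt_pos = torch.rand(ids.shape) < mask_prob
        with torch.no_grad():
            gen_logits = generator(ids.masked_fill(corrupt_pos, mask_id))
            sampled = torch.distributions.Categorical(logits=gen_logits).sample()
        corrupted = torch.where(corrupt_pos, sampled, ids)
        labels = (corrupted != ids).float()      # generator may resample the original token
        scores = discriminator(corrupted)        # (B, T) per-token logits
        return F.binary_cross_entropy_with_logits(scores, labels)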