Encoder-Decoder Models Can Benefit from Pre-trained Masked Language
Models in Grammatical Error Correction
- URL: http://arxiv.org/abs/2005.00987v2
- Date: Sun, 31 May 2020 08:01:57 GMT
- Title: Encoder-Decoder Models Can Benefit from Pre-trained Masked Language
Models in Grammatical Error Correction
- Authors: Masahiro Kaneko, Masato Mita, Shun Kiyono, Jun Suzuki, Kentaro Inui
- Abstract summary: Previous methods have potential drawbacks when applied to an EncDec model.
Our proposed method first fine-tunes an MLM with a given GEC corpus and then uses the output of the fine-tuned MLM as additional features in the GEC model.
The best-performing model achieves state-of-the-art performance on the BEA-2019 and CoNLL-2014 benchmarks.
- Score: 54.569707226277735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates how to effectively incorporate a pre-trained masked
language model (MLM), such as BERT, into an encoder-decoder (EncDec) model for
grammatical error correction (GEC). The answer to this question is not as
straightforward as one might expect because the previous common methods for
incorporating an MLM into an EncDec model have potential drawbacks when applied
to GEC. For example, the distribution of the inputs to a GEC model can be
considerably different (erroneous, clumsy, etc.) from that of the corpora used
for pre-training MLMs; however, this issue is not addressed in the previous
methods. Our experiments show that our proposed method, where we first
fine-tune an MLM with a given GEC corpus and then use the output of the
fine-tuned MLM as additional features in the GEC model, maximizes the benefit
of the MLM. The best-performing model achieves state-of-the-art performances on
the BEA-2019 and CoNLL-2014 benchmarks. Our code is publicly available at:
https://github.com/kanekomasahiro/bert-gec.
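To make the approach concrete, below is a minimal PyTorch sketch (not the authors' implementation; see the bert-gec repository above for that) of how the hidden states of a GEC-fine-tuned BERT could be supplied to an EncDec encoder as additional features. The Transformer encoder, the frozen feature extraction, and the concatenate-and-project fusion are illustrative assumptions.

    import torch
    import torch.nn as nn
    from transformers import BertModel

    class BertFusedEncoder(nn.Module):
        """Sketch: fuse EncDec encoder states with features from a fine-tuned MLM."""
        def __init__(self, vocab_size, d_model=512, bert_name="bert-base-cased"):
            super().__init__()
            # Assumed to have been fine-tuned on the GEC corpus beforehand.
            self.bert = BertModel.from_pretrained(bert_name)
            self.embed = nn.Embedding(vocab_size, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=6)
            # Project [encoder state ; MLM state] back to d_model.
            self.fuse = nn.Linear(d_model + self.bert.config.hidden_size, d_model)

        def forward(self, src_ids, bert_ids, bert_mask):
            enc = self.encoder(self.embed(src_ids))          # (B, T, d_model)
            with torch.no_grad():                            # frozen feature extractor (an assumption)
                mlm = self.bert(bert_ids, attention_mask=bert_mask).last_hidden_state
            # Assumes the two tokenizations are aligned to the same length T.
            return self.fuse(torch.cat([enc, mlm], dim=-1))  # fed to the decoder

A standard Transformer decoder would then attend over the fused states; the actual feature-combination strategy used in the paper differs in detail.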
Related papers
- Aligning Large Language Models via Fine-grained Supervision [20.35000061196631]
Pre-trained large-scale language models (LLMs) excel at producing coherent articles, yet their outputs may be untruthful, toxic, or fail to align with user expectations.
Current approaches focus on using reinforcement learning with human feedback to improve model alignment.
We propose a method to enhance LLM alignment through fine-grained token-level supervision.
arXiv Detail & Related papers (2024-06-04T20:21:45Z) - Which Syntactic Capabilities Are Statistically Learned by Masked
Language Models for Code? [51.29970742152668]
We highlight that relying on accuracy-based measurements may lead to an overestimation of models' capabilities.
To address these issues, we introduce a technique called SyntaxEval for assessing the syntactic capabilities of masked language models.
arXiv Detail & Related papers (2024-01-03T02:44:02Z) - Are Pre-trained Language Models Useful for Model Ensemble in Chinese
Grammatical Error Correction? [10.302225525539003]
We explore several ensemble strategies based on strong PLMs with four sophisticated single models.
Surprisingly, performance does not improve and even degrades after the PLM-based ensemble.
arXiv Detail & Related papers (2023-05-24T14:18:52Z) - Representation Deficiency in Masked Language Modeling [107.39136254013042]
We propose MAE-LM, which pretrains the Masked Autoencoder architecture, where [MASK] tokens are excluded from the encoder.
We show that MAE-LM consistently outperforms pretrained models across different pretraining settings and model sizes when fine-tuned on the GLUE and SQuAD benchmarks.
arXiv Detail & Related papers (2023-02-04T01:54:17Z) - Frustratingly Simple Pretraining Alternatives to Masked Language
Modeling [10.732163031244651]
Masked language modeling (MLM) is widely used in natural language processing for learning text representations.
In this paper, we explore five simple pretraining objectives based on token-level classification tasks as replacements for MLM.
arXiv Detail & Related papers (2021-09-04T08:52:37Z) - Exposing the Implicit Energy Networks behind Masked Language Models via
Metropolis--Hastings [57.133639209759615]
We interpret MLMs as energy-based sequence models and propose two energy parametrizations derivable from the trained MLMs.
We develop a tractable sampling scheme based on the Metropolis-Hastings Monte Carlo algorithm.
We validate the effectiveness of the proposed parametrizations by exploring the quality of samples drawn from these energy-based models.
arXiv Detail & Related papers (2021-06-04T22:04:30Z) - Universal Sentence Representation Learning with Conditional Masked
Language Model [7.334766841801749]
We present Conditional Masked Language Modeling (CMLM) to effectively learn sentence representations.
Our English CMLM model achieves state-of-the-art performance on SentEval.
As a fully unsupervised learning method, CMLM can be conveniently extended to a broad range of languages and domains.
arXiv Detail & Related papers (2020-12-28T18:06:37Z) - MPNet: Masked and Permuted Pre-training for Language Understanding [158.63267478638647]
MPNet is a novel pre-training method that inherits the advantages of BERT and XLNet and avoids their limitations.
We pretrain MPNet on a large-scale dataset (over 160GB of text corpora) and fine-tune it on a variety of downstream tasks.
Results show that MPNet outperforms MLM and PLM by a large margin, and achieves better results on these tasks than previous state-of-the-art pre-trained methods.
arXiv Detail & Related papers (2020-04-20T13:54:12Z) - ELECTRA: Pre-training Text Encoders as Discriminators Rather Than
Generators [108.3381301768299]
Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens.
We propose a more sample-efficient pre-training task called replaced token detection.
arXiv Detail & Related papers (2020-03-23T21:17:42Z)
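As background for the ELECTRA entry above: replaced token detection trains a discriminator to decide, for every position, whether the token was replaced by a sample from a small generator, so all positions provide learning signal, whereas MLM only learns from the masked subset. Below is a minimal sketch contrasting the two objectives; model, generator, and discriminator stand in for arbitrary token-level networks, and the 15% corruption rate is an illustrative assumption.

    import torch
    import torch.nn.functional as F

    def mlm_loss(model, ids, mask_id, mask_prob=0.15):
        # MLM: corrupt the input by masking some tokens, reconstruct the originals.
        corrupt_pos = torch.rand(ids.shape) < mask_prob
        logits = model(ids.masked_fill(corrupt_pos, mask_id))   # (B, T, vocab)
        return F.cross_entropy(logits[corrupt_pos], ids[corrupt_pos])

    def rtd_loss(generator, discriminator, ids, mask_id, mask_prob=0.15):
        # Replaced token detection: a small generator proposes plausible tokens at
        # the corrupted positions; the discriminator labels every position as
        # original (0) or replaced (1).
        corrupt_pos = torch.rand(ids.shape) < mask_prob
        with torch.no_grad():
            gen_logits = generator(ids.masked_fill(corrupt_pos, mask_id))
            sampled = torch.distributions.Categorical(logits=gen_logits).sample()
        corrupted = torch.where(corrupt_pos, sampled, ids)
        labels = (corrupted != ids).float()      # generator may resample the original token
        scores = discriminator(corrupted)        # (B, T) per-token logits
        return F.binary_cross_entropy_with_logits(scores, labels)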