Momentum Calibration for Text Generation
- URL: http://arxiv.org/abs/2212.04257v1
- Date: Thu, 8 Dec 2022 13:12:10 GMT
- Title: Momentum Calibration for Text Generation
- Authors: Xingxing Zhang, Yiran Liu, Xun Wang, Pengcheng He, Yang Yu, Si-Qing
Chen, Wayne Xiong, Furu Wei
- Abstract summary: We propose MoCa (Momentum Calibration) for text generation.
MoCa is an online method that dynamically generates slowly evolving (but consistent) samples using a momentum moving average generator with beam search.
- Score: 86.58432361938806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The input and output of most text generation tasks can be transformed into two
sequences of tokens, which can be modeled using sequence-to-sequence learning
tools such as Transformers. These models are usually trained by
maximizing the likelihood of the output text sequence, assuming the input
sequence and all gold preceding tokens are given during training, while during
inference the model suffers from the exposure bias problem (i.e., it only has
access to its previously predicted tokens rather than gold tokens during beam
search). In this paper, we propose MoCa (Momentum Calibration) for
text generation. MoCa is an online method that dynamically generates slowly
evolving (but consistent) samples using a momentum moving average generator
with beam search, and it learns to align the model scores of these samples
with their actual qualities. Experiments on four text generation datasets
(i.e., CNN/DailyMail, XSum, SAMSum and Gigaword) show that MoCa consistently
improves strong pre-trained Transformers with vanilla fine-tuning, and we
achieve state-of-the-art results on the CNN/DailyMail and SAMSum datasets.
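To make the calibration idea concrete, here is a minimal, hedged sketch in PyTorch of how a momentum (exponential moving average) generator and a score-quality ranking loss could fit together. The function names, margin, decay rate, and the use of a pairwise hinge loss are illustrative assumptions, not the paper's actual implementation; candidate quality is assumed to come from an external metric such as ROUGE.

```python
# Illustrative sketch of momentum-calibration-style training (not the authors' code).
# Assumes a seq2seq model whose sequence score is a sum of token log-probabilities,
# and that a quality score (e.g. ROUGE) has already been computed per candidate.
import torch


@torch.no_grad()
def ema_update(online: torch.nn.Module, momentum: torch.nn.Module, beta: float = 0.999):
    """Momentum (exponential moving average) update of the generator that
    produces the slowly evolving beam-search samples."""
    for p_m, p_o in zip(momentum.parameters(), online.parameters()):
        p_m.mul_(beta).add_(p_o, alpha=1.0 - beta)


def calibration_loss(seq_scores: torch.Tensor, qualities: torch.Tensor, margin: float = 0.01):
    """Pairwise ranking loss: if candidate i has higher quality than candidate j,
    its model score should exceed j's by a rank-dependent margin.

    seq_scores: (num_candidates,) model log-probability of each candidate
    qualities:  (num_candidates,) metric score (e.g. ROUGE) of each candidate
    """
    order = torch.argsort(qualities, descending=True)
    scores = seq_scores[order]                        # scores sorted by quality
    diff = scores.unsqueeze(1) - scores.unsqueeze(0)  # diff[i, j] = s_i - s_j
    i, j = torch.triu_indices(len(scores), len(scores), offset=1)
    rank_gap = (j - i).float()                        # larger rank gap -> larger margin
    # Penalise pairs where the lower-quality candidate scores too close or higher.
    return torch.clamp(margin * rank_gap - diff[i, j], min=0.0).mean()


if __name__ == "__main__":
    # Toy example: four candidates with made-up scores and qualities.
    seq_scores = torch.tensor([-3.2, -2.9, -3.5, -3.1], requires_grad=True)
    qualities = torch.tensor([0.42, 0.38, 0.45, 0.30])
    loss = calibration_loss(seq_scores, qualities)
    loss.backward()
    print(float(loss))
```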
Related papers
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
- MiLe Loss: a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models [40.992566245706996]
We propose the MiLe Loss function to mitigate the bias arising from the differing learning difficulties of tokens.
We train generative language models at different scales of 468M, 1.2B, and 6.7B parameters.
Experiments reveal that models incorporating the proposed MiLe Loss can gain consistent performance improvement on downstream benchmarks.
arXiv Detail & Related papers (2023-10-30T13:33:21Z)
- Is BERT a Cross-Disciplinary Knowledge Learner? A Surprising Finding of Pre-trained Models' Transferability [74.11825654535895]
We investigate whether the power of the models pre-trained on text data, such as BERT, can be transferred to general token sequence classification applications.
We find that even on non-text data, the models pre-trained on text converge faster than randomly initialized models.
arXiv Detail & Related papers (2021-03-12T09:19:14Z)
- Topical Language Generation using Transformers [4.795530213347874]
This paper presents a novel approach for Topical Language Generation (TLG) by combining a pre-trained LM with topic modeling information.
We extend our model by introducing new parameters and functions to influence the quantity of the topical features presented in the generated text.
Our experimental results demonstrate that our model outperforms the state of the art on coherence, diversity, and fluency while being faster at decoding.
arXiv Detail & Related papers (2021-03-11T03:45:24Z)
- COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences.
COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences.
Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
arXiv Detail & Related papers (2021-02-16T22:24:29Z)
- Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes.
An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
arXiv Detail & Related papers (2020-10-12T19:42:25Z)
- POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
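As an aside on the insertion-based decoding summarized in the POINTER entry above, the following toy sketch illustrates a coarse-to-fine, parallel insertion loop. The hand-written rule table stands in for a trained insertion model and is purely hypothetical; it only demonstrates the control flow, not POINTER's released implementation.

```python
# Toy illustration of progressive, insertion-based decoding (POINTER-style control flow).
# A real system would use a trained model to score insertions; here `propose_insertion`
# consults a hand-written, hypothetical rule table.
from typing import List, Optional

# Hypothetical "model": maps a (left, right) context pair to a token to insert
# between them, or None to stop inserting in that slot.
TOY_RULES = {
    ("the", "sat"): "cat",
    ("sat", "mat"): "on",
    ("on", "mat"): "the",
}


def propose_insertion(left: str, right: str) -> Optional[str]:
    return TOY_RULES.get((left, right))


def progressive_decode(constraints: List[str], max_rounds: int = 5) -> List[str]:
    """Coarse-to-fine decoding: every round, insert (in parallel) at most
    one new token into each slot between adjacent tokens."""
    tokens = list(constraints)
    for _ in range(max_rounds):
        inserts = [propose_insertion(l, r) for l, r in zip(tokens, tokens[1:])]
        if all(t is None for t in inserts):
            break  # no slot wants another token: generation has converged
        new_tokens = []
        for tok, ins in zip(tokens, inserts + [None]):
            new_tokens.append(tok)
            if ins is not None:
                new_tokens.append(ins)
        tokens = new_tokens
    return tokens


if __name__ == "__main__":
    # Hard lexical constraints -> progressively refined token sequence.
    print(progressive_decode(["the", "sat", "mat"]))
    # -> ['the', 'cat', 'sat', 'on', 'the', 'mat']
```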
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.