Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation
- URL: http://arxiv.org/abs/2106.07207v1
- Date: Mon, 14 Jun 2021 07:46:30 GMT
- Title: Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation
- Authors: Xiang Lin, Simeng Han, Shafiq Joty
- Abstract summary: We introduce ScaleGrad, a modification straight to the gradient of the loss function, to remedy the degeneration issue of the standard MLE objective.
Empirical results show the effectiveness of our method not only in open-ended generation, but also in directed generation tasks.
- Score: 4.866431869728018
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advanced large-scale neural language models have led to significant success
in many language generation tasks. However, the most commonly used training
objective, Maximum Likelihood Estimation (MLE), has been shown to be problematic:
models trained with it tend to prefer dull and repetitive phrases. In this work, we
introduce ScaleGrad, a modification applied straight to the gradient of the loss
function, to remedy the degeneration issue of the standard MLE objective. By
directly manipulating the gradient information, ScaleGrad makes the model learn
to use novel tokens. Empirical results show the effectiveness of our method not
only in open-ended generation, but also in directed generation tasks. Owing to its
architectural simplicity, our method can serve as a general training objective
applicable to most neural text generation tasks.
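The abstract describes the gradient modification only at a high level. As a rough
illustration, the sketch below shows one way such a rescaling could look in a
PyTorch-style training loop: the predicted distribution is reweighted over "novel"
tokens (those assumed not to have appeared in the preceding context) before the
negative log-likelihood is taken, so that the gradient with respect to the logits
becomes the reweighted distribution minus the one-hot target. The function name
scalegrad_loss, the hyperparameter gamma, and the construction of novel_mask are
illustrative assumptions, not the paper's verified implementation.

```python
# Minimal sketch under the assumptions stated above; not the authors' released code.
import torch
import torch.nn.functional as F

def scalegrad_loss(logits, targets, novel_mask, gamma=0.2):
    """Negative log-likelihood on a reweighted token distribution.

    logits:     (batch, seq, vocab) raw decoder outputs
    targets:    (batch, seq) ground-truth token ids
    novel_mask: (batch, seq, vocab) bool, True for tokens assumed not to have
                appeared in the preceding context at that step
    gamma:      scaling factor in (0, 1]; gamma = 1.0 recovers standard MLE
    """
    probs = F.softmax(logits, dim=-1)
    # Reweight novel-token probabilities by gamma and renormalize. The weights
    # are constants w.r.t. the logits, so the gradient of the resulting NLL
    # w.r.t. the logits is (reweighted probs - one-hot target): the change acts
    # directly on the gradient rather than on the model architecture.
    weights = torch.where(novel_mask,
                          torch.full_like(probs, gamma),
                          torch.ones_like(probs))
    scaled = probs * weights
    p_tilde = scaled / scaled.sum(dim=-1, keepdim=True)
    nll = -torch.log(p_tilde.gather(-1, targets.unsqueeze(-1)).squeeze(-1) + 1e-9)
    return nll.mean()
```

Setting gamma to 1.0 makes the reweighting a no-op and reduces the loss to ordinary
cross-entropy, which is what lets a modification of this form drop into an existing
MLE training pipeline without architectural changes.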
Related papers
- Generate to Understand for Representation [3.5325087487696463]
GUR is a pretraining framework that combines language modeling and contrastive learning objectives in a single training step.
GUR achieves impressive results without any labeled training data, outperforming all other pretrained baselines as a retriever on the recall benchmark in a zero-shot setting.
arXiv Detail & Related papers (2023-06-14T06:00:18Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning [53.92465205531759]
Controlled automated story generation seeks to generate natural language stories satisfying constraints from natural language critiques or preferences.
We train a contrastive bi-encoder model to align stories with human critiques, building a general purpose preference model.
We further fine-tune the contrastive reward model using a prompt-learning technique to increase story generation robustness.
arXiv Detail & Related papers (2022-10-14T13:21:33Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework named Layerwise Noise Stability Regularization (LNSR).
Specifically, we inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- Few-shot Prompting Towards Controllable Response Generation [49.479958672988566]
We first explore the combination of prompting and reinforcement learning (RL) to steer models' generation without accessing any of the models' parameters.
We apply multi-task learning to help the model generalize better to new tasks.
Experimental results show that our proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters.
arXiv Detail & Related papers (2022-06-08T14:48:06Z)
- Memory Efficient Continual Learning for Neural Text Classification [10.70710638820641]
We devise a method for performing text classification with pre-trained models on classification tasks provided in sequence.
We empirically demonstrate that our method requires significantly fewer model parameters than other state-of-the-art methods.
While our method suffers little forgetting, it retains predictive performance on par with state-of-the-art but less memory-efficient methods.
arXiv Detail & Related papers (2022-03-09T10:57:59Z)
- Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model.
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta learning approaches.
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
- LT-LM: a novel non-autoregressive language model for single-shot lattice rescoring [55.16665077221941]
We propose a novel rescoring approach, which processes the entire lattice in a single call to the model.
The key feature of our rescoring policy is a novel non-autoregressive Lattice Transformer Language Model (LT-LM).
arXiv Detail & Related papers (2021-04-06T14:06:07Z)
- Neural Language Modeling for Contextualized Temporal Graph Generation [49.21890450444187]
This paper presents the first study on using large-scale pre-trained language models for automated generation of an event-level temporal graph for a document.
arXiv Detail & Related papers (2020-10-20T07:08:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.