Sparse Text Generation
- URL: http://arxiv.org/abs/2004.02644v3
- Date: Mon, 5 Oct 2020 11:20:54 GMT
- Title: Sparse Text Generation
- Authors: Pedro Henrique Martins and Zita Marinho and André F. T. Martins
- Abstract summary: Current text generators require sampling from a modified softmax, via temperature parameters or ad-hoc truncation techniques, as in top-$k$ or nucleus sampling.
In this paper, we use the recently introduced entmax transformation to train and sample from a sparse language model, avoiding this mismatch.
The result is a text generator with favorable performance in terms of fluency and consistency, fewer repetitions, and n-gram diversity closer to human text.
- Score: 7.747003493657217
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current state-of-the-art text generators build on powerful language models
such as GPT-2, achieving impressive performance. However, to avoid degenerate
text, they require sampling from a modified softmax, via temperature parameters
or ad-hoc truncation techniques, as in top-$k$ or nucleus sampling. This
creates a mismatch between training and testing conditions. In this paper, we
use the recently introduced entmax transformation to train and sample from a
natively sparse language model, avoiding this mismatch. The result is a text
generator with favorable performance in terms of fluency and consistency, fewer
repetitions, and n-gram diversity closer to human text. In order to evaluate
our model, we propose three new metrics for comparing sparse or truncated
distributions: $\epsilon$-perplexity, sparsemax score, and Jensen-Shannon
divergence. Human-evaluated experiments in story completion and dialogue
generation show that entmax sampling leads to more engaging and coherent
stories and conversations.
Related papers
- A Simple yet Efficient Ensemble Approach for AI-generated Text Detection [0.5840089113969194]
Large Language Models (LLMs) have demonstrated remarkable capabilities in generating text that closely resembles human writing.
It is essential to build automated approaches capable of distinguishing between artificially generated text and human-authored text.
We propose a simple yet efficient solution by ensembling predictions from multiple constituent LLMs.
arXiv Detail & Related papers (2023-11-06T13:11:02Z)
- Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimization method.
We develop practical bounds that make the total variation distance (TVD) applicable to language generation.
We introduce the TaiLr objective, which balances the tradeoff involved in estimating TVD.
arXiv Detail & Related papers (2023-02-26T16:32:52Z)
- Typical Decoding for Natural Language Generation [76.69397802617064]
We study why high-probability texts can be dull or repetitive.
We show that typical sampling offers competitive performance in terms of quality.
arXiv Detail & Related papers (2022-02-01T18:58:45Z)
- HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization [57.798070356553936]
HETFORMER is a Transformer-based pre-trained model with multi-granularity sparse attentions for extractive summarization.
Experiments on both single- and multi-document summarization tasks show that HETFORMER achieves state-of-the-art performance in ROUGE F1.
arXiv Detail & Related papers (2021-10-12T22:42:31Z)
- Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity [16.893758238773263]
When primed with only a handful of training samples, very large pretrained language models such as GPT-3 have shown competitive results.
We demonstrate that the order in which the samples are provided can be the difference between near state-of-the-art and random guess performance.
We use the generative nature of the language models to construct an artificial development set and, based on entropy statistics of the candidate permutations from this set, identify performant prompts.
arXiv Detail & Related papers (2021-04-18T09:29:16Z)
- Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes.
An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
arXiv Detail & Related papers (2020-10-12T19:42:25Z)
- Distributional Discrepancy: A Metric for Unconditional Text Generation [6.6159481812419045]
The purpose of unconditional text generation is to train a model with real sentences, then generate novel sentences of the same quality and diversity as the training data.
A novel metric of distributional discrepancy (DD) is designed to evaluate generators based on the discrepancy between the generated and real training sentences.
DD is significantly better than three existing metrics at ranking these generative models.
arXiv Detail & Related papers (2020-05-04T05:53:34Z)
- POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
- Self-Adversarial Learning with Comparative Discrimination for Text Generation [111.18614166615968]
We propose a novel self-adversarial learning (SAL) paradigm for improving GANs' performance in text generation.
During training, SAL rewards the generator when its currently generated sentence is found to be better than its previously generated samples.
Experiments on text generation benchmark datasets show that our proposed approach substantially improves both the quality and the diversity of the generated text.
arXiv Detail & Related papers (2020-01-31T07:50:25Z)