Momentum Decoding: Open-ended Text Generation As Graph Exploration
- URL: http://arxiv.org/abs/2212.02175v1
- Date: Mon, 5 Dec 2022 11:16:47 GMT
- Title: Momentum Decoding: Open-ended Text Generation As Graph Exploration
- Authors: Tian Lan and Yixuan Su and Shuhang Liu and Heyan Huang and Xian-Ling Mao
- Abstract summary: Open-ended text generation with autoregressive language models (LMs) is one of the core tasks in natural language processing.
We formulate open-ended text generation from a new perspective, i.e., we view it as an exploration process within a directed graph.
We propose a novel decoding method -- momentum decoding -- which encourages the LM to explore new nodes outside the current graph.
- Score: 49.812280360794894
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open-ended text generation with autoregressive language models (LMs) is one
of the core tasks in natural language processing. However, maximization-based
decoding methods (e.g., greedy/beam search) often lead to the degeneration
problem, i.e., the generated text is unnatural and contains undesirable
repetitions. Existing solutions to this problem either introduce randomness
prone to incoherence or require a look-ahead mechanism that demands extra
computational overhead. In this study, we formulate open-ended text generation
from a new perspective, i.e., we view it as an exploration process within a
directed graph. Thereby, we understand the phenomenon of degeneration as
circular loops within the directed graph. Based on our formulation, we propose
a novel decoding method -- \textit{momentum decoding} -- which encourages the
LM to \textit{greedily} explore new nodes outside the current graph. Meanwhile,
it also allows the LM to return to the existing nodes with a momentum
downgraded by a pre-defined resistance function. We extensively test our
approach on three benchmarks from different domains through automatic and human
evaluations. The results show that momentum decoding performs comparably with
the current state of the art while requiring notably less inference time and
fewer FLOPs. Furthermore, we conduct a detailed analysis to reveal
the merits and inner workings of our approach. Our code and other related
resources are publicly available at
https://github.com/gmftbyGMFTBY/MomentumDecoding.
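The abstract describes the method only at a high level, so the following is a minimal decoding sketch of the graph-exploration idea. It relies on assumptions not stated in the abstract: graph nodes are approximated by n-grams of the generated sequence, and the resistance function is a constant log-probability penalty alpha applied to tokens that would re-enter an already-visited node. The actual node definition, resistance function, and scoring used by momentum decoding may differ; the official implementation is in the repository linked above.

```python
import torch

def momentum_decode_sketch(model, tokenizer, prompt, max_new_tokens=128,
                           alpha=2.0, n=4, top_k=10):
    """Illustrative sketch of decoding as graph exploration.

    NOT the official momentum decoding implementation. Assumptions:
      * a graph node is an n-gram of the generated sequence;
      * producing a token that recreates an already-seen n-gram counts as
        returning to an existing node (closing a loop);
      * the resistance function is a constant log-probability penalty alpha.
    """
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids[0].tolist()
    visited = set()  # nodes (n-grams) already explored in the generated graph

    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(torch.tensor([input_ids])).logits[0, -1]
        log_probs = torch.log_softmax(logits, dim=-1)

        context = tuple(input_ids[-(n - 1):]) if n > 1 else tuple()
        scores = log_probs.clone()
        # Downgrade candidates that would re-enter an existing node,
        # so the model greedily prefers unexplored nodes.
        for tok in torch.topk(log_probs, top_k).indices.tolist():
            if context + (tok,) in visited:
                scores[tok] -= alpha  # constant "resistance" (assumption)

        next_tok = int(torch.argmax(scores))
        visited.add(context + (next_tok,))
        input_ids.append(next_tok)
        if next_tok == tokenizer.eos_token_id:
            break

    return tokenizer.decode(input_ids)
```

With alpha = 0 this sketch reduces to plain greedy search; a larger alpha downgrades returns to existing nodes, which corresponds to the behaviour the abstract describes as greedily exploring new nodes outside the current graph.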
Related papers
- Explaining Graph Neural Networks with Large Language Models: A Counterfactual Perspective for Molecular Property Prediction [41.39277277686706]
Graph Counterfactual Explanation (GCE) has emerged as a promising approach to improve GNN transparency.
We propose a novel GCE method, LLM-GCE, to unleash the power of large language models (LLMs) in explaining GNNs for molecular property prediction.
arXiv Detail & Related papers (2024-10-19T17:34:36Z)
- The CLRS-Text Algorithmic Reasoning Language Benchmark [48.45201665463275]
CLRS-Text is a textual version of the CLRS benchmark.
CLRS-Text is capable of procedurally generating trace data for thirty diverse, challenging algorithmic tasks.
We fine-tune and evaluate various LMs as generalist executors on this benchmark.
arXiv Detail & Related papers (2024-06-06T16:29:25Z)
- SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning [131.04781590452308]
We present SimTeG, a frustratingly Simple approach for Textual Graph learning.
We first perform supervised parameter-efficient fine-tuning (PEFT) on a pre-trained LM on the downstream task.
We then generate node embeddings using the last hidden states of the fine-tuned LM.
arXiv Detail & Related papers (2023-08-03T07:00:04Z)
- Look-back Decoding for Open-Ended Text Generation [62.53302138266465]
We propose Look-back, an improved decoding algorithm that tracks the distribution distance between current and historical decoding steps.
Look-back can automatically predict potential repetitive phrases and topic drift, and removes tokens that may cause these failure modes.
We perform decoding experiments on document continuation and story generation, and demonstrate that Look-back is able to generate more fluent and coherent text.
arXiv Detail & Related papers (2023-05-22T20:42:37Z)
- GN-Transformer: Fusing Sequence and Graph Representation for Improved Code Summarization [0.0]
We propose a novel method, GN-Transformer, to learn end-to-end on a fused sequence and graph modality.
The proposed method achieves state-of-the-art performance on two code summarization datasets and across three automatic code summarization metrics.
arXiv Detail & Related papers (2021-11-17T02:51:37Z)
- Controllable Generation from Pre-trained Language Models via Inverse Prompting [47.23315683944257]
We propose an innovative method, inverse prompting, to better control text generation.
Inverse prompting uses generated text to inversely predict the prompt during beam search.
Our results show that our proposed method substantially outperforms the baselines.
arXiv Detail & Related papers (2021-03-19T08:36:52Z)
- Structural Information Preserving for Graph-to-Text Generation [59.00642847499138]
The task of graph-to-text generation aims at producing sentences that preserve the meaning of input graphs.
We propose to tackle this problem by leveraging richer training signals that can guide our model for preserving input information.
Experiments on two benchmarks for graph-to-text generation show the effectiveness of our approach over a state-of-the-art baseline.
arXiv Detail & Related papers (2021-02-12T20:09:01Z)
- Promoting Graph Awareness in Linearized Graph-to-Text Generation [72.83863719868364]
We study the ability of linearized models to encode local graph structures.
Our findings motivate solutions to enrich the quality of models' implicit graph encodings.
We find that these denoising scaffolds lead to substantial improvements in downstream generation in low-resource settings.
arXiv Detail & Related papers (2020-12-31T18:17:57Z)
- POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.