Related papers: Learning to Plan Long-Term for Language Modeling

Learning to Plan Long-Term for Language Modeling

URL: http://arxiv.org/abs/2409.00070v1
Date: Fri, 23 Aug 2024 21:18:10 GMT
Title: Learning to Plan Long-Term for Language Modeling
Authors: Florian Mai, Nathan Cornille, Marie-Francine Moens,
Abstract summary: We propose a planner that predicts a latent plan for many sentences into the future. By sampling multiple plans at once, we condition the language model on an accurate approximation of the distribution of text continuations.
Score: 23.042650737356496
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modern language models predict the next token in the sequence by considering the past text through a powerful function such as attention. However, language models have no explicit mechanism that allows them to spend computation time for planning long-distance future text, leading to a suboptimal token prediction. In this paper, we propose a planner that predicts a latent plan for many sentences into the future. By sampling multiple plans at once, we condition the language model on an accurate approximation of the distribution of text continuations, which leads to better next token prediction accuracy. In effect, this allows trading computation time for prediction accuracy.

Related papers

Semformer: Transformer Language Models with Semantic Planning [18.750863564495006]
Next-token prediction serves as the dominant component in current neural language models. We introduce Semformer, a novel method of training a Transformer language model that explicitly models the semantic planning of response.
arXiv Detail & Related papers (2024-09-17T12:54:34Z)
Few-shot Subgoal Planning with Language Models [58.11102061150875]
We show that language priors encoded in pre-trained language models allow us to infer fine-grained subgoal sequences. In contrast to recent methods which make strong assumptions about subgoal supervision, our experiments show that language models can infer detailed subgoal sequences without any fine-tuning.
arXiv Detail & Related papers (2022-05-28T01:03:30Z)
Uncertainty estimation of pedestrian future trajectory using Bayesian approximation [137.00426219455116]
Under dynamic traffic scenarios, planning based on deterministic predictions is not trustworthy. The authors propose to quantify uncertainty during forecasting using approximation which deterministic approaches fail to capture. The effect of dropout weights and long-term prediction on future state uncertainty has been studied.
arXiv Detail & Related papers (2022-05-04T04:23:38Z)
Trajectory Prediction with Linguistic Representations [27.71805777845141]
We present a novel trajectory prediction model that uses linguistic intermediate representations to forecast trajectories. The model learns the meaning of each of the words without direct per-word supervision. It generates a linguistic description of trajectories which captures maneuvers and interactions over an extended time interval.
arXiv Detail & Related papers (2021-10-19T05:22:38Z)
Aligned Contrastive Predictive Coding [10.521845940927163]
We investigate the possibility of forcing a self-supervised model trained using a contrastive predictive loss to extract slowly varying latent representations. Rather than producing individual predictions for each of the future representations, the model emits a sequence of predictions shorter than that of the upcoming representations to which they will be aligned.
arXiv Detail & Related papers (2021-04-24T13:07:22Z)
Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction [55.4498466252522]
We set a new standard of video prediction with orders of magnitude longer prediction time than existing approaches. Our method predicts future frames by first estimating a sequence of semantic structures and subsequently translating the structures to pixels by video-to-video translation. We evaluate our method on three challenging datasets involving car driving and human dancing, and demonstrate that it can generate complicated scene structures and motions over a very long time horizon.
arXiv Detail & Related papers (2021-04-14T08:39:38Z)
Adversarial Generative Grammars for Human Activity Prediction [141.43526239537502]
We propose an adversarial generative grammar model for future prediction. Our grammar is designed so that it can learn production rules from the data distribution. Being able to select multiple production rules during inference leads to different predicted outcomes.
arXiv Detail & Related papers (2020-08-11T17:47:53Z)
Measuring Forecasting Skill from Text [15.795144936579627]
We explore connections between the language people use to describe their predictions and their forecasting skill. We present a number of linguistic metrics which are computed over text associated with people's predictions about the future. We demonstrate that it is possible to accurately predict forecasting skill using a model that is based solely on language.
arXiv Detail & Related papers (2020-06-12T19:04:10Z)
Toward Better Storylines with Sentence-Level Language Models [54.91921545103256]
We propose a sentence-level language model which selects the next sentence in a story from a finite set of fluent alternatives. We demonstrate the effectiveness of our approach with state-of-the-art accuracy on the unsupervised Story Cloze task.
arXiv Detail & Related papers (2020-05-11T16:54:19Z)
ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training [85.35910219651572]
We present a new sequence-to-sequence pre-training model called ProphetNet. It introduces a novel self-supervised objective named future n-gram prediction. We conduct experiments on CNN/DailyMail, Gigaword, and SQuAD 1.1 benchmarks for abstractive summarization and question generation tasks.
arXiv Detail & Related papers (2020-01-13T05:12:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.