Learning to Plan for Language Modeling from Unlabeled Data
- URL: http://arxiv.org/abs/2404.00614v1
- Date: Sun, 31 Mar 2024 09:04:01 GMT
- Title: Learning to Plan for Language Modeling from Unlabeled Data
- Authors: Nathan Cornille, Marie-Francine Moens, Florian Mai
- Abstract summary: We train a module for planning the future writing process via a self-supervised learning objective.
By conditioning on generated latent plans, our model extends the successful language model formula to more abstract planning in an unsupervised way.
- Score: 23.042650737356496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: By training to predict the next token in an unlabeled corpus, large language models learn to perform many tasks without any labeled data. However, their next-token-prediction objective arguably limits their performance in scenarios that require planning, such as writing a coherent article. In this paper, we train a module for planning the future writing process via a self-supervised learning objective. By conditioning on generated latent plans, our model extends the successful language model formula to more abstract planning in an unsupervised way. Empirically, we demonstrate that our method improves language modeling performance in general, particularly with respect to the text structure. Because our framework uses a planner module that is unsupervised and external to the language model, new planner modules can be trained at large scale and easily be shared with the community.
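The abstract describes conditioning the language model on latent plans produced by an external planner that is trained with a self-supervised objective on unlabeled text. Below is a minimal PyTorch sketch of that general idea only, not the paper's implementation: the discrete plan codes, the prefix-conditioning mechanism, and all module names and dimensions are illustrative assumptions.
```python
# Minimal sketch of the idea in the abstract, NOT the paper's exact method:
# an external planner predicts a latent plan (here, one of K discrete codes)
# from the context, and a small causal LM is conditioned on that plan by
# prepending a learned plan embedding to its input.
import torch
import torch.nn as nn


class Planner(nn.Module):
    """Predicts a distribution over K latent plan codes from a pooled context vector."""

    def __init__(self, d_model: int, num_plans: int):
        super().__init__()
        self.proj = nn.Linear(d_model, num_plans)

    def forward(self, context_repr: torch.Tensor) -> torch.Tensor:
        # context_repr: (batch, d_model) representation of the text written so far
        return self.proj(context_repr)  # (batch, num_plans) logits


class PlanConditionedLM(nn.Module):
    """Tiny causal LM whose token sequence is prefixed with an embedding of the plan."""

    def __init__(self, vocab_size: int, d_model: int, num_plans: int):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.plan_emb = nn.Embedding(num_plans, d_model)  # one vector per plan code
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor, plan_codes: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len), plan_codes: (batch,)
        x = torch.cat([self.plan_emb(plan_codes).unsqueeze(1), self.tok_emb(tokens)], dim=1)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.backbone(x, mask=causal_mask)
        # Because of the plan prefix, hidden state i predicts the token at input position i.
        return self.lm_head(h[:, :-1])  # (batch, seq_len, vocab_size)


# Toy usage: pick a plan code with the planner, then train the LM with the usual
# next-token cross-entropy while conditioning on that code.
vocab_size, d_model, num_plans = 100, 64, 8
planner = Planner(d_model, num_plans)
lm = PlanConditionedLM(vocab_size, d_model, num_plans)

tokens = torch.randint(0, vocab_size, (2, 16))
context_repr = torch.randn(2, d_model)              # stand-in for an encoded context
plan_codes = planner(context_repr).argmax(dim=-1)   # hard choice; sampling also possible
logits = lm(tokens, plan_codes)
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), tokens.reshape(-1))
```
Because the planner sits outside the language model and needs no labels, a planner trained this way could in principle be swapped in or shared independently of the underlying LM, which is the modularity argument the abstract makes.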
Related papers
- PARADISE: Evaluating Implicit Planning Skills of Language Models with Procedural Warnings and Tips Dataset [0.0]
We present PARADISE, an abductive reasoning task using Q&A format on practical procedural text sourced from wikiHow.
It involves warning and tip inference tasks directly associated with goals (excluding intermediary steps), testing whether models can infer implicit knowledge of the plan solely from the given goal.
Our experiments, utilizing fine-tuned language models and zero-shot prompting, reveal the effectiveness of task-specific small models over large language models in most scenarios.
arXiv Detail & Related papers (2024-03-05T18:01:59Z) - Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning [56.03057119008865]
We show that scaling diffusion language models can effectively make them strong language learners.
We build competent diffusion language models at scale by first acquiring knowledge from massive data.
Experiments show that scaling diffusion language models consistently improves performance across downstream language tasks.
arXiv Detail & Related papers (2023-08-23T16:01:12Z) - PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning [72.0564921186518]
PlaSma is a novel two-pronged approach to endow small language models with procedural knowledge and (counterfactual) planning capabilities.
More concretely, we develop symbolic procedural knowledge distillation to enhance the implicit knowledge in small language models.
In addition, we introduce a novel task, Counterfactual Planning, that requires a revision of a plan to cope with a counterfactual situation.
arXiv Detail & Related papers (2023-05-31T00:55:40Z) - Evidence of Meaning in Language Models Trained on Programs [5.892876463573452]
We present evidence that language models can learn meaning despite being trained only to perform next token prediction on text.
We first train a Transformer model on a corpus of programs, then probe the trained model's hidden states as it completes a program given a specification.
There is a strong, statistically significant correlation between the accuracy of the probe and the model's ability to generate a program that implements the specification.
arXiv Detail & Related papers (2023-05-18T17:58:08Z) - Pre-Training to Learn in Context [138.0745138788142]
The in-context learning ability of language models is not fully exploited because they are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x the parameters.
arXiv Detail & Related papers (2023-05-16T03:38:06Z) - Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z) - Few-shot Subgoal Planning with Language Models [58.11102061150875]
We show that language priors encoded in pre-trained language models allow us to infer fine-grained subgoal sequences.
In contrast to recent methods which make strong assumptions about subgoal supervision, our experiments show that language models can infer detailed subgoal sequences without any fine-tuning.
arXiv Detail & Related papers (2022-05-28T01:03:30Z) - Text-Based Action-Model Acquisition for Planning [13.110360825201044]
We propose a novel approach to learning action models from natural language texts by integrating Constraint Satisfaction and Natural Language Processing techniques.
Specifically, we first build a novel language model to extract plan traces from texts, and then build a set of constraints to generate action models based on the extracted plan traces.
arXiv Detail & Related papers (2022-02-15T02:23:31Z) - Language Models are not Models of Language [0.0]
Transfer learning has enabled large deep learning neural networks trained on the language modeling task to vastly improve performance on downstream language tasks.
We argue that the term language model is misleading because deep learning models are not theoretical models of language.
arXiv Detail & Related papers (2021-12-13T22:39:46Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)