Sequential Monte Carlo Steering of Large Language Models using
Probabilistic Programs
- URL: http://arxiv.org/abs/2306.03081v2
- Date: Sun, 26 Nov 2023 21:40:00 GMT
- Title: Sequential Monte Carlo Steering of Large Language Models using
Probabilistic Programs
- Authors: Alexander K. Lew, Tan Zhi-Xuan, Gabriel Grand, and Vikash K.
Mansinghka
- Abstract summary: We propose a new inference-time approach to enforcing syntactic and semantic constraints on the outputs of large language models.
The key idea is to specify language generation tasks as posterior inference problems in a class of discrete probabilistic sequence models.
For a computational cost similar to that of beam search, SMC can steer LLMs to solve diverse tasks.
- Score: 46.721838623748816
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Even after fine-tuning and reinforcement learning, large language models
(LLMs) can be difficult, if not impossible, to control reliably with prompts
alone. We propose a new inference-time approach to enforcing syntactic and
semantic constraints on the outputs of LLMs, called sequential Monte Carlo
(SMC) steering. The key idea is to specify language generation tasks as
posterior inference problems in a class of discrete probabilistic sequence
models, and replace standard decoding with sequential Monte Carlo inference.
For a computational cost similar to that of beam search, SMC can steer LLMs to
solve diverse tasks, including infilling, generation under syntactic
constraints, and prompt intersection. To facilitate experimentation with SMC
steering, we present a probabilistic programming library, LLaMPPL
(https://github.com/probcomp/hfppl), for concisely specifying new generation
tasks as language model probabilistic programs, and automating steering of
LLaMA-family Transformers.
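To make the idea concrete, the following is a minimal, self-contained sketch of the kind of particle filter that SMC steering runs over token sequences. It is not the LLaMPPL API: the toy vocabulary, toy_lm_probs, potential, and smc_steer below are hypothetical stand-ins; in the real setting the proposal is the LLM's next-token distribution and the potential encodes the task's syntactic or semantic constraint.

# Minimal sketch of SMC steering over token sequences (illustrative only; not
# the LLaMPPL API). The toy "language model", the constraint, and all names
# here are hypothetical stand-ins.
import random
from dataclasses import dataclass, field

VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]

def toy_lm_probs(prefix):
    # Stand-in for an LLM's next-token distribution given a prefix:
    # uniform over the vocabulary, with a mild preference to stop late.
    p = {tok: 1.0 for tok in VOCAB}
    if len(prefix) >= 5:
        p["<eos>"] += 5.0
    total = sum(p.values())
    return {tok: w / total for tok, w in p.items()}

def potential(prefix):
    # Constraint potential: 1.0 if the partial sequence is still admissible,
    # 0.0 otherwise. Here: forbid immediate word repetition, standing in for
    # a real syntactic or semantic check.
    for a, b in zip(prefix, prefix[1:]):
        if a == b:
            return 0.0
    return 1.0

@dataclass
class Particle:
    tokens: list = field(default_factory=list)
    weight: float = 1.0
    done: bool = False

def smc_steer(n_particles=20, max_len=8, rng=random):
    particles = [Particle() for _ in range(n_particles)]
    for _ in range(max_len):
        # Extension step: sample a next token for each live particle from the
        # LM (used here as the proposal) and reweight by the constraint.
        for p in particles:
            if p.done:
                continue
            probs = toy_lm_probs(p.tokens)
            tok = rng.choices(list(probs), weights=list(probs.values()))[0]
            p.tokens.append(tok)
            p.done = tok == "<eos>"
            p.weight *= potential(p.tokens)
        # Resampling step: focus computation on particles that satisfy the
        # constraint so far (multinomial resampling on normalized weights).
        total = sum(p.weight for p in particles)
        if total == 0.0:
            raise RuntimeError("all particles violated the constraint")
        chosen = rng.choices(particles, weights=[p.weight for p in particles],
                             k=n_particles)
        particles = [Particle(list(p.tokens), 1.0, p.done) for p in chosen]
        if all(p.done for p in particles):
            break
    return [" ".join(t for t in p.tokens if t != "<eos>") for p in particles]

if __name__ == "__main__":
    for s in sorted(set(smc_steer()))[:5]:
        print(s)

The sketch only mirrors the structure of the method: as in beam search, a fixed number of partial sequences is maintained per step, but here they are extended stochastically, reweighted by the constraint, and resampled.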
Related papers
- Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo [7.182174507225034]
We leverage the rich toolkit of Sequential Monte Carlo (SMC) for probabilistic inference problems.
We use learned twist functions to estimate the expected future value of the potential at each timestep.
We present methods for evaluating the accuracy of language model inference techniques.
arXiv Detail & Related papers (2024-04-26T17:18:32Z)
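As a schematic of the standard twisted-SMC weight update this entry refers to (not code from the paper; the function and argument names are hypothetical), the incremental importance weight at each step combines the base model probability, the proposal probability, and the ratio of twist values:

def twisted_incremental_weight(p_lm, q_proposal, psi_t, psi_prev):
    # p_lm:       base LM probability of the sampled token given the prefix
    # q_proposal: probability of the same token under the proposal
    # psi_t:      twist value of the extended prefix (estimated future potential)
    # psi_prev:   twist value of the prefix before this token was added
    return (p_lm * psi_t) / (q_proposal * psi_prev)

Well-learned twists shift weight toward partial sequences whose continuations are likely to satisfy the potential, letting resampling act early rather than only once a sequence is complete.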
- VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search [5.389248707675898]
Large Language Models (LLMs) can generate useful code, but often the code they generate cannot be trusted to be sound.
We present VerMCTS, an approach that begins to address this issue by generating verified programs in Dafny and Coq.
arXiv Detail & Related papers (2024-02-13T00:55:14Z)
- CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules [51.82044734879657]
We propose CodeChain, a novel framework for inference that elicits modularized code generation through a chain of self-revisions.
We find that CodeChain can significantly boost both the modularity and the correctness of the generated solutions, achieving relative pass@1 improvements of 35% on APPS and 76% on CodeContests.
arXiv Detail & Related papers (2023-10-13T10:17:48Z)
- Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MUST-C dataset.
The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
- Large Language Models as General Pattern Machines [64.75501424160748]
We show that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences.
Surprisingly, pattern completion proficiency can be partially retained even when the sequences are expressed using tokens randomly sampled from the vocabulary.
In this work, we investigate how these zero-shot capabilities may be applied to problems in robotics.
arXiv Detail & Related papers (2023-07-10T17:32:13Z)
- SatLM: Satisfiability-Aided Language Models Using Declarative Prompting [68.40726892904286]
We propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of large language models (LLMs).
We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer.
We evaluate SatLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm.
arXiv Detail & Related papers (2023-05-16T17:55:51Z)
- Language Generation via Combinatorial Constraint Satisfaction: A Tree Search Enhanced Monte-Carlo Approach [24.897552102098324]
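To illustrate the declarative pattern described in the SatLM entry above, here is a toy sketch (not code from the paper): a hand-written constraint specification stands in for one an LLM would generate, and the Z3 solver plays the role of the off-the-shelf automated reasoning backend (requires the z3-solver package).

# Illustrative only: a declarative specification solved by an off-the-shelf
# solver, standing in for constraints an LLM would emit.
from z3 import Ints, Solver, sat

# Toy word problem: "Alice and Bob have 30 apples together; Alice has twice
# as many as Bob. How many does Alice have?" Rather than computing the answer
# imperatively, the specification states the facts as constraints:
alice, bob = Ints("alice bob")
spec = [alice + bob == 30, alice == 2 * bob]

solver = Solver()
solver.add(*spec)
if solver.check() == sat:
    model = solver.model()
    print("alice =", model[alice])   # -> alice = 20
else:
    print("specification is unsatisfiable")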
We present a framework to allow specification of constraints for sentence generation.
We propose TSMH, an efficient method to generate high likelihood sentences with respect to a pre-trained language model.
Our approach is highly flexible, requires no task-specific training, and leverages efficient constraint satisfaction solving techniques.
arXiv Detail & Related papers (2020-11-24T19:21:00Z)