Sequential Monte Carlo Steering of Large Language Models using
Probabilistic Programs
- URL: http://arxiv.org/abs/2306.03081v2
- Date: Sun, 26 Nov 2023 21:40:00 GMT
- Title: Sequential Monte Carlo Steering of Large Language Models using
Probabilistic Programs
- Authors: Alexander K. Lew, Tan Zhi-Xuan, Gabriel Grand, and Vikash K.
Mansinghka
- Abstract summary: We propose a new inference-time approach to enforcing syntactic and semantic constraints on the outputs of large language models.
The key idea is to specify language generation tasks as posterior inference problems in a class of discrete probabilistic sequence models.
For a computational cost similar to that of beam search, SMC can steer LLMs to solve diverse tasks.
- Score: 46.721838623748816
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Even after fine-tuning and reinforcement learning, large language models
(LLMs) can be difficult, if not impossible, to control reliably with prompts
alone. We propose a new inference-time approach to enforcing syntactic and
semantic constraints on the outputs of LLMs, called sequential Monte Carlo
(SMC) steering. The key idea is to specify language generation tasks as
posterior inference problems in a class of discrete probabilistic sequence
models, and replace standard decoding with sequential Monte Carlo inference.
For a computational cost similar to that of beam search, SMC can steer LLMs to
solve diverse tasks, including infilling, generation under syntactic
constraints, and prompt intersection. To facilitate experimentation with SMC
steering, we present a probabilistic programming library, LLaMPPL
(https://github.com/probcomp/hfppl), for concisely specifying new generation
tasks as language model probabilistic programs, and automating steering of
LLaMA-family Transformers.
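To make the idea concrete, the following is a minimal, self-contained sketch of the kind of particle filter that SMC steering runs over token sequences. It is not the LLaMPPL API: the toy vocabulary, toy_lm_probs, potential, and smc_steer below are hypothetical stand-ins; in the real setting the proposal is the LLM's next-token distribution and the potential encodes the task's syntactic or semantic constraint.

# Minimal sketch of SMC steering over token sequences (illustrative only; not
# the LLaMPPL API). The toy "language model", the constraint, and all names
# here are hypothetical stand-ins.
import random
from dataclasses import dataclass, field

VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]

def toy_lm_probs(prefix):
    # Stand-in for an LLM's next-token distribution given a prefix:
    # uniform over the vocabulary, with a mild preference to stop late.
    p = {tok: 1.0 for tok in VOCAB}
    if len(prefix) >= 5:
        p["<eos>"] += 5.0
    total = sum(p.values())
    return {tok: w / total for tok, w in p.items()}

def potential(prefix):
    # Constraint potential: 1.0 if the partial sequence is still admissible,
    # 0.0 otherwise. Here: forbid immediate word repetition, standing in for
    # a real syntactic or semantic check.
    for a, b in zip(prefix, prefix[1:]):
        if a == b:
            return 0.0
    return 1.0

@dataclass
class Particle:
    tokens: list = field(default_factory=list)
    weight: float = 1.0
    done: bool = False

def smc_steer(n_particles=20, max_len=8, rng=random):
    particles = [Particle() for _ in range(n_particles)]
    for _ in range(max_len):
        # Extension step: sample a next token for each live particle from the
        # LM (used here as the proposal) and reweight by the constraint.
        for p in particles:
            if p.done:
                continue
            probs = toy_lm_probs(p.tokens)
            tok = rng.choices(list(probs), weights=list(probs.values()))[0]
            p.tokens.append(tok)
            p.done = tok == "<eos>"
            p.weight *= potential(p.tokens)
        # Resampling step: focus computation on particles that satisfy the
        # constraint so far (multinomial resampling on normalized weights).
        total = sum(p.weight for p in particles)
        if total == 0.0:
            raise RuntimeError("all particles violated the constraint")
        chosen = rng.choices(particles, weights=[p.weight for p in particles],
                             k=n_particles)
        particles = [Particle(list(p.tokens), 1.0, p.done) for p in chosen]
        if all(p.done for p in particles):
            break
    return [" ".join(t for t in p.tokens if t != "<eos>") for p in particles]

if __name__ == "__main__":
    for s in sorted(set(smc_steer()))[:5]:
        print(s)

The sketch only mirrors the structure of the method: as in beam search, a fixed number of partial sequences is maintained per step, but here they are extended stochastically, reweighted by the constraint, and resampled.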
Related papers
- Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo [7.182174507225034]
We leverage the rich toolkit of Sequential Monte Carlo (SMC) for probabilistic inference problems.
We use learned twist functions to estimate the expected future value of the potential at each timestep.
We present methods for evaluating the accuracy of language model inference techniques.
arXiv Detail & Related papers (2024-04-26T17:18:32Z)
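As a schematic of the standard twisted-SMC weight update this entry refers to (not code from the paper; the function and argument names are hypothetical), the incremental importance weight at each step combines the base model probability, the proposal probability, and the ratio of twist values:

def twisted_incremental_weight(p_lm, q_proposal, psi_t, psi_prev):
    # p_lm:       base LM probability of the sampled token given the prefix
    # q_proposal: probability of the same token under the proposal
    # psi_t:      twist value of the extended prefix (estimated future potential)
    # psi_prev:   twist value of the prefix before this token was added
    return (p_lm * psi_t) / (q_proposal * psi_prev)

Well-learned twists shift weight toward partial sequences whose continuations are likely to satisfy the potential, letting resampling act early rather than only once a sequence is complete.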
- VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search [5.389248707675898]
Large Language Models (LLMs) can generate useful code, but often the code they generate cannot be trusted to be sound.
We present VerMCTS, an approach that begins to address this issue by generating verified programs in Dafny and Coq.
arXiv Detail & Related papers (2024-02-13T00:55:14Z)
- CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules [51.82044734879657]
We propose CodeChain, a novel framework for inference that elicits modularized code generation through a chain of self-revisions.
We find that CodeChain can significantly boost both the modularity and the correctness of the generated solutions, achieving relative pass@1 improvements of 35% on APPS and 76% on CodeContests.
arXiv Detail & Related papers (2023-10-13T10:17:48Z)
- Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MUST-C dataset.
The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
- Large Language Models as General Pattern Machines [64.75501424160748]
We show that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences.
Surprisingly, pattern completion proficiency can be partially retained even when the sequences are expressed using tokens randomly sampled from the vocabulary.
In this work, we investigate how these zero-shot capabilities may be applied to problems in robotics.
arXiv Detail & Related papers (2023-07-10T17:32:13Z)
- SatLM: Satisfiability-Aided Language Models Using Declarative Prompting [68.40726892904286]
We propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of large language models (LLMs).
We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer.
We evaluate SatLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm.
arXiv Detail & Related papers (2023-05-16T17:55:51Z)
- Language Generation via Combinatorial Constraint Satisfaction: A Tree Search Enhanced Monte-Carlo Approach [24.897552102098324]
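To illustrate the declarative pattern described in the SatLM entry above, here is a toy sketch (not code from the paper): a hand-written constraint specification stands in for one an LLM would generate, and the Z3 solver plays the role of the off-the-shelf automated reasoning backend (requires the z3-solver package).

# Illustrative only: a declarative specification solved by an off-the-shelf
# solver, standing in for constraints an LLM would emit.
from z3 import Ints, Solver, sat

# Toy word problem: "Alice and Bob have 30 apples together; Alice has twice
# as many as Bob. How many does Alice have?" Rather than computing the answer
# imperatively, the specification states the facts as constraints:
alice, bob = Ints("alice bob")
spec = [alice + bob == 30, alice == 2 * bob]

solver = Solver()
solver.add(*spec)
if solver.check() == sat:
    model = solver.model()
    print("alice =", model[alice])   # -> alice = 20
else:
    print("specification is unsatisfiable")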
We present a framework to allow specification of constraints for sentence generation.
We propose TSMH, an efficient method to generate high likelihood sentences with respect to a pre-trained language model.
Our approach is highly flexible, requires no task-specific training, and leverages efficient constraint satisfaction solving techniques.
arXiv Detail & Related papers (2020-11-24T19:21:00Z)