Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
- URL: http://arxiv.org/abs/2504.13139v2
- Date: Fri, 18 Apr 2025 18:45:25 GMT
- Title: Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
- Authors: João Loula, Benjamin LeBrun, Li Du, Ben Lipkin, Clemente Pasti, Gabriel Grand, Tianyu Liu, Yahya Emara, Marjorie Freedman, Jason Eisner, Ryan Cotterell, Vikash Mansinghka, Alexander K. Lew, Tim Vieira, Timothy J. O'Donnell
- Abstract summary: A wide range of LM applications require generating text that conforms to syntactic or semantic constraints. We develop an architecture for controlled LM generation based on sequential Monte Carlo (SMC). Our system builds on the framework of Lew et al. (2023) and integrates with its language model probabilistic programming language.
- Score: 90.78001821963008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A wide range of LM applications require generating text that conforms to syntactic or semantic constraints. Imposing such constraints can be naturally framed as probabilistic conditioning, but exact generation from the resulting distribution -- which can differ substantially from the LM's base distribution -- is generally intractable. In this work, we develop an architecture for controlled LM generation based on sequential Monte Carlo (SMC). Our SMC framework allows us to flexibly incorporate domain- and problem-specific constraints at inference time, and efficiently reallocate computational resources in light of new information during the course of generation. By comparing to a number of alternatives and ablations on four challenging domains -- Python code generation for data science, text-to-SQL, goal inference, and molecule synthesis -- we demonstrate that, with little overhead, our approach allows small open-source language models to outperform models over 8x larger, as well as closed-source, fine-tuned ones. In support of the probabilistic perspective, we show that these performance improvements are driven by better approximation to the posterior distribution. Our system builds on the framework of Lew et al. (2023) and integrates with its language model probabilistic programming language, giving users a simple, programmable way to apply SMC to a broad variety of controlled generation problems.
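To make the abstract's conditioning-plus-resampling idea concrete, below is a minimal, self-contained toy sketch of SMC-style constrained decoding. It is not the authors' system or API: the bigram stand-in for a language model, the balanced-parenthesis constraint, and every name and parameter here are invented for illustration; the weighting and resampling scheme is only the generic SMC recipe the abstract describes.

```python
# Toy sketch (not the paper's system): SMC decoding under a hard constraint.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["(", ")", "a", "<eos>"]

def lm_probs(prefix):
    """Stand-in for an LLM's next-token distribution (a fixed bigram table)."""
    last = prefix[-1] if prefix else "<bos>"
    table = {
        "<bos>": [0.4, 0.1, 0.4, 0.1],
        "(":     [0.3, 0.3, 0.3, 0.1],
        ")":     [0.2, 0.2, 0.3, 0.3],
        "a":     [0.2, 0.3, 0.3, 0.2],
    }
    return np.array(table[last])

def potential(prefix, done, budget):
    """Constraint potential: 1 while the prefix can still be completed into a
    balanced-parenthesis string within the remaining token budget, else 0."""
    depth = 0
    for tok in prefix:
        depth += (tok == "(") - (tok == ")")
        if depth < 0:
            return 0.0
    if done:
        return 1.0 if depth == 0 else 0.0
    return 1.0 if depth <= budget else 0.0

def smc_decode(n_particles=100, max_len=12):
    particles = [[] for _ in range(n_particles)]
    logw = np.zeros(n_particles)
    done = np.zeros(n_particles, dtype=bool)
    for t in range(max_len):
        for i in range(n_particles):
            if done[i]:
                continue
            tok = VOCAB[rng.choice(len(VOCAB), p=lm_probs(particles[i]))]
            particles[i] = particles[i] + [tok]
            done[i] = tok == "<eos>" or t == max_len - 1
            # Proposal = the LM itself, so for a 0/1 constraint the incremental
            # weight reduces to the feasibility of the extended prefix.
            seq = [x for x in particles[i] if x != "<eos>"]
            logw[i] += np.log(potential(seq, done[i], max_len - 1 - t) + 1e-300)
        # Resample when the effective sample size collapses; this reallocates
        # computation toward particles that can still satisfy the constraint.
        w = np.exp(logw - logw.max())
        w /= w.sum()
        if 1.0 / np.sum(w ** 2) < n_particles / 2:
            idx = rng.choice(n_particles, size=n_particles, p=w)
            particles = [list(particles[i]) for i in idx]
            done, logw = done[idx], np.zeros(n_particles)
    # One final resampling pass so the returned strings are (approximate)
    # samples from the constrained posterior.
    w = np.exp(logw - logw.max())
    w /= w.sum()
    idx = rng.choice(n_particles, size=n_particles, p=w)
    return ["".join(x for x in particles[i] if x != "<eos>") for i in idx]

print(smc_decode()[:5])  # a few approximate posterior samples
```

The resampling step is where computation gets reallocated during generation: particles whose prefixes can no longer satisfy the constraint receive near-zero weight and are replaced by copies of viable ones.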
Related papers
- Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions [1.3638337521666275]
Language models (LMs) are machine learning models designed to predict linguistic patterns by estimating the probability of word sequences based on large-scale datasets, such as text. Although larger datasets typically enhance LM performance, scalability remains a challenge due to constraints in computational power and resources. Recent research has focused on developing decentralized techniques to enable distributed training and inference.
arXiv Detail & Related papers (2025-03-20T15:18:25Z)
- Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
arXiv Detail & Related papers (2024-07-04T15:14:17Z)
- CharED: Character-wise Ensemble Decoding for Large Language Models [24.993790740335243]
We present an inference-time ensembling algorithm aimed at "averaging" outputs from multiple large language models.
Our proposed model is able to combine the complementary strengths of multiple LLMs, regardless of vocabulary, tokenization, or model size.
arXiv Detail & Related papers (2024-06-25T22:35:07Z)
- Controlled Text Generation via Language Model Arithmetic [7.687678490751105]
We introduce model arithmetic, a novel inference framework for composing and biasing Large Language Models.
We show that model arithmetic allows fine-grained control of generated text while outperforming state-of-the-art on the task of toxicity reduction.
arXiv Detail & Related papers (2023-11-24T13:41:12Z)
- Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
- Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MuST-C dataset.
The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z)
- Large Language Models as General Pattern Machines [64.75501424160748]
We show that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences.
Surprisingly, pattern completion proficiency can be partially retained even when the sequences are expressed using tokens randomly sampled from the vocabulary.
In this work, we investigate how these zero-shot capabilities may be applied to problems in robotics.
arXiv Detail & Related papers (2023-07-10T17:32:13Z)
- Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs [46.721838623748816]
We propose a new inference-time approach to enforcing syntactic and semantic constraints on the outputs of large language models.
The key idea is to specify language generation tasks as posterior inference problems in a class of discrete probabilistic sequence models.
For a computational cost similar to that of beam search, SMC can steer LLMs to solve diverse tasks.
arXiv Detail & Related papers (2023-06-05T17:55:05Z)
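The Lew et al. (2023) entry above and the main abstract both frame constrained generation as posterior inference. For reference, the generic sequential importance sampling and resampling formulation behind this family of methods looks as follows; the notation is a sketch of the standard SMC recipe, not necessarily the papers' exact construction.

```latex
% Target: the LM distribution reweighted by a constraint potential \phi
\pi(x_{1:T}) \;\propto\; p_{\mathrm{LM}}(x_{1:T})\,\phi(x_{1:T})

% Particles are extended token by token from a proposal q and reweighted:
w_t^{(i)} \;=\; w_{t-1}^{(i)}\,
  \frac{p_{\mathrm{LM}}\!\bigl(x_t^{(i)} \mid x_{1:t-1}^{(i)}\bigr)\;
        \phi_t\!\bigl(x_{1:t}^{(i)}\bigr)}
       {q\!\bigl(x_t^{(i)} \mid x_{1:t-1}^{(i)}\bigr)\;
        \phi_{t-1}\!\bigl(x_{1:t-1}^{(i)}\bigr)}

% Resampling in proportion to w_t^{(i)} when the effective sample size drops
% is what reallocates computation toward promising partial generations.
```

When the proposal is the LM itself and \phi is a 0/1 feasibility check, the weight reduces to whether the extended prefix can still satisfy the constraint, which is what the toy sketch after the abstract above implements.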
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.