Ensembling Language Models with Sequential Monte Carlo
- URL: http://arxiv.org/abs/2603.05432v1
- Date: Thu, 05 Mar 2026 17:54:31 GMT
- Title: Ensembling Language Models with Sequential Monte Carlo
- Authors: Robin Shing Moon Chan, Tianyu Liu, Samuel Kiegeland, Clemente Pasti, Jacob Hoover Vigly, Timothy J. O'Donnell, Ryan Cotterell, Tim Vieira
- Abstract summary: We introduce a unified framework for composing $K$ language models into $f$-ensemble distributions. We show that better posterior approximations can yield better ensemble performance.
- Score: 48.149136054981334
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Practitioners have access to an abundance of language models and prompting strategies for solving many language modeling tasks; yet prior work shows that modeling performance is highly sensitive to both choices. Classical machine learning ensembling techniques offer a principled approach: aggregate predictions from multiple sources to achieve better performance than any single one. However, applying ensembling to language models during decoding is challenging: naively aggregating next-token probabilities yields samples from a locally normalized, biased approximation of the generally intractable ensemble distribution over strings. In this work, we introduce a unified framework for composing $K$ language models into $f$-ensemble distributions for a wide range of functions $f\colon\mathbb{R}_{\geq 0}^{K}\to\mathbb{R}_{\geq 0}$. To sample from these distributions, we propose a byte-level sequential Monte Carlo (SMC) algorithm that operates in a shared character space, enabling ensembles of models with mismatching vocabularies and consistent sampling in the limit. We evaluate a family of $f$-ensembles across prompt and model combinations for various structured text generation tasks, highlighting the benefits of alternative aggregation strategies over traditional probability averaging, and showing that better posterior approximations can yield better ensemble performance.
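The abstract describes sampling from an $f$-ensemble of $K$ models with sequential Monte Carlo: propose continuations locally, reweight each particle toward the (unnormalized) ensemble target, and resample. The sketch below illustrates this mechanic on toy character-level "models" with a product $f$; the model distributions, proposal choice, and resampling scheme are illustrative assumptions, not the paper's actual implementation.

```python
import random

random.seed(0)

# Toy "language models": next-character distributions given a prefix.
# In the paper's setting these would be real LMs in a shared byte space.
def model_a(prefix):
    return {"a": 0.7, "b": 0.2, "$": 0.1}  # "$" marks end of string

def model_b(prefix):
    return {"a": 0.3, "b": 0.5, "$": 0.2}

MODELS = [model_a, model_b]

def f_product(ps):
    """Product f-ensemble (unnormalized): f(p1,...,pK) = p1 * ... * pK."""
    out = 1.0
    for p in ps:
        out *= p
    return out

def smc_ensemble(models, f, n_particles=200, max_len=5):
    """Approximately sample strings from the f-ensemble via SMC.

    Proposal: the first model's local distribution; each step's
    incremental importance weight corrects toward f(p1,...,pK).
    """
    particles = [("", 1.0) for _ in range(n_particles)]
    for _ in range(max_len):
        stepped = []
        for s, w in particles:
            if s.endswith("$"):
                stepped.append((s, w))  # finished particle, carry forward
                continue
            dists = [m(s) for m in models]
            q = dists[0]  # proposal distribution
            chars = list(q)
            c = random.choices(chars, weights=[q[ch] for ch in chars])[0]
            # incremental weight = (unnormalized target) / (proposal prob)
            w *= f([d[c] for d in dists]) / q[c]
            stepped.append((s + c, w))
        # multinomial resampling to combat weight degeneracy
        total = sum(w for _, w in stepped)
        strs = [s for s, _ in stepped]
        norm = [w / total for _, w in stepped]
        particles = [(random.choices(strs, weights=norm)[0], 1.0)
                     for _ in range(n_particles)]
    return [s for s, _ in particles]
```

Because the product ensemble concentrates on characters both models rate highly, most sampled strings begin with "a" here, even though model_b alone prefers "b".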
Related papers
- Probabilistic Token Alignment for Large Language Model Fusion [100.30692772017238]
Training large language models (LLMs) from scratch can yield models with unique functionalities and strengths, but it is costly and often leads to redundant capabilities. A key challenge in existing model fusion methods is their dependence on manually predefined vocabulary alignment. We propose a probabilistic token alignment method as a general and soft mapping for alignment, named PTA-LLM.
arXiv Detail & Related papers (2025-09-21T23:18:24Z) - Dynamic Collaboration of Multi-Language Models based on Minimal Complete Semantic Units [29.79935180749153]
This paper investigates the enhancement of reasoning capabilities in language models through token-level multi-model collaboration. We introduce a distribution distance-based dynamic selection strategy (DDS) to optimize the multi-model collaboration process.
arXiv Detail & Related papers (2025-08-26T07:41:33Z) - Large Language Model-Based Automatic Formulation for Stochastic Optimization Models [0.0]
This paper presents the first integrated systematic study on the performance of large language models (LLMs) for automatically formulating stochastic optimization models. We design several prompts that guide ChatGPT through structured tasks using chain-of-thought and modular reasoning. Across a diverse set of problems, GPT-4-Turbo outperforms other models in partial score, variable matching, and objective accuracy, with cot_s_instructions emerging as the most effective prompting strategy.
arXiv Detail & Related papers (2025-08-24T03:31:25Z) - Syntactic Control of Language Models by Posterior Inference [53.823006836309695]
Controlling the syntactic structure of text generated by language models is valuable for applications requiring clarity, stylistic consistency, or interpretability. We argue that sampling algorithms based on posterior inference can effectively enforce a target constituency structure during generation. Our approach combines sequential Monte Carlo, which estimates the posterior distribution by sampling from a proposal distribution, with a syntactic tagger that ensures that each generated token aligns with the desired syntactic structure.
arXiv Detail & Related papers (2025-06-08T14:01:34Z) - Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo [90.78001821963008]
A wide range of LM applications require generating text that conforms to syntactic or semantic constraints. We develop an architecture for controlled LM generation based on sequential Monte Carlo (SMC). Our system builds on the framework of Lew et al. (2023) and integrates with its language model probabilistic programming language.
arXiv Detail & Related papers (2025-04-17T17:49:40Z) - Sample, Don't Search: Rethinking Test-Time Alignment for Language Models [55.2480439325792]
We introduce QAlign, a new test-time alignment approach. As we scale test-time compute, QAlign converges to sampling from the optimal aligned distribution for each individual prompt. By adopting recent advances in Markov chain Monte Carlo for text generation, our method enables better-aligned outputs without modifying the underlying model or even requiring logit access.
arXiv Detail & Related papers (2025-04-04T00:41:40Z) - Token-level Ensembling of Models with Different Vocabularies [16.094010998574753]
Model ensembling is a technique to combine the predicted distributions of two or more models. This paper proposes an inference-time only algorithm that allows for ensembling models with different vocabularies.
arXiv Detail & Related papers (2025-02-28T17:41:27Z) - Determine-Then-Ensemble: Necessity of Top-k Union for Large Language Model Ensembling [23.447466392929712]
Large language models (LLMs) exhibit varying strengths and weaknesses across different tasks. Existing LLM ensembling methods often overlook model compatibility and struggle with inefficient alignment of probabilities. We introduce Union Top-$k$ Ensembling (UniTE), a novel approach that efficiently combines models by focusing on the union of the top-$k$ tokens from each model.
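The top-$k$ union idea in this summary can be sketched in a few lines: take each model's top-$k$ tokens, form the union, average probabilities over that set, and renormalize. The function names and averaging rule below are illustrative assumptions based only on the abstract; the real UniTE method may differ in its details.

```python
def topk(dist, k):
    """Token set of the k highest-probability entries of a distribution."""
    return {t for t, _ in sorted(dist.items(), key=lambda kv: -kv[1])[:k]}

def unite_step(dists, k=2):
    """Combine next-token distributions by averaging over the union of
    each model's top-k tokens, then renormalizing (UniTE-style sketch)."""
    union = set()
    for d in dists:
        union |= topk(d, k)
    # average each candidate's probability across models (0 if absent)
    scores = {t: sum(d.get(t, 0.0) for d in dists) / len(dists)
              for t in union}
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}
```

Restricting the union to each model's top-$k$ avoids aligning full vocabularies: tokens outside every model's top-$k$ (like "cat" in a quick test with two toy distributions) are simply dropped before renormalization.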
arXiv Detail & Related papers (2024-10-03T08:42:38Z) - CharED: Character-wise Ensemble Decoding for Large Language Models [24.993790740335243]
We present an inference-time ensembling algorithm aimed at "averaging" outputs from multiple large language models.
Our proposed model is able to combine the complementary strengths of multiple LLMs, regardless of vocabulary, tokenization, or model size.
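Character-wise ensemble decoding sidesteps vocabulary mismatch by aggregating in character space. The sketch below shows only the per-step averaging idea from the summary; deriving character-level distributions from each model's token probabilities (which the real CharED must do) is omitted, and the function name is hypothetical.

```python
def chared_step(char_dists):
    """Average per-character next-character distributions from several
    models and return the consensus character (illustrative sketch of
    character-wise ensemble decoding)."""
    chars = set()
    for d in char_dists:
        chars |= set(d)
    avg = {c: sum(d.get(c, 0.0) for d in char_dists) / len(char_dists)
           for c in chars}
    return max(avg, key=avg.get)
```

Because every model exposes a distribution over the same character set, this step works identically whatever tokenizer or vocabulary each underlying model uses.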
arXiv Detail & Related papers (2024-06-25T22:35:07Z) - Bayesian Prompt Learning for Image-Language Model Generalization [64.50204877434878]
We use the regularization ability of Bayesian methods to frame prompt learning as a variational inference problem.
Our approach regularizes the prompt space, reduces overfitting to the seen prompts and improves the prompt generalization on unseen prompts.
We demonstrate empirically on 15 benchmarks that Bayesian prompt learning provides an appropriate coverage of the prompt space.
arXiv Detail & Related papers (2022-10-05T17:05:56Z) - Language Model Cascades [72.18809575261498]
Repeated interactions at test-time with a single model, or the composition of multiple models together, further expands capabilities.
Cases with control flow and dynamic structure require techniques from probabilistic programming.
We formalize several existing techniques from this perspective, including scratchpads / chain of thought, verifiers, STaR, selection-inference, and tool use.
arXiv Detail & Related papers (2022-07-21T07:35:18Z)