Efficient and Asymptotically Unbiased Constrained Decoding for Large Language Models
- URL: http://arxiv.org/abs/2504.09135v1
- Date: Sat, 12 Apr 2025 08:49:21 GMT
- Title: Efficient and Asymptotically Unbiased Constrained Decoding for Large Language Models
- Authors: Haotian Ye, Himanshu Jain, Chong You, Ananda Theertha Suresh, Haowei Lin, James Zou, Felix Yu
- Abstract summary: This paper introduces Dynamic Importance Sampling for Constrained Decoding (DISC) with GPU-based Parallel Prefix-Verification (PPV), a novel algorithm that leverages dynamic importance sampling to achieve theoretically guaranteed asymptotic unbiasedness and overcomes the inefficiency of prefix-tree-based decoding. Experiments demonstrate the superiority of the method over existing methods in both efficiency and output quality.
- Score: 46.43567715840425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In real-world applications of large language models, outputs are often required to be confined: selecting items from predefined product or document sets, generating phrases that comply with safety standards, or conforming to specialized formatting styles. To control generation, constrained decoding has been widely adopted. However, existing prefix-tree-based constrained decoding is inefficient under GPU-based model inference paradigms, and it introduces unintended biases into the output distribution. This paper introduces Dynamic Importance Sampling for Constrained Decoding (DISC) with GPU-based Parallel Prefix-Verification (PPV), a novel algorithm that leverages dynamic importance sampling to achieve theoretically guaranteed asymptotic unbiasedness and overcomes the inefficiency of prefix trees. Extensive experiments demonstrate the superiority of our method over existing methods in both efficiency and output quality. These results highlight the potential of our methods to improve constrained generation in applications where adherence to specific constraints is essential.
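The abstract describes existing prefix-tree-based constrained decoding as both sequential and biased, but does not spell out the paper's own algorithm. The sketch below therefore only illustrates that conventional baseline: at each step the logits are masked to the children of the current trie node and renormalized, which is where the per-token sequential overhead and the distributional bias come from. The names (`TrieNode`, `constrained_step`) and the toy vocabulary are illustrative, not taken from the paper.

```python
import numpy as np

class TrieNode:
    """Prefix tree over the allowed token sequences (illustrative)."""
    def __init__(self):
        self.children = {}   # token id -> TrieNode
        self.is_end = False

def build_trie(sequences):
    root = TrieNode()
    for seq in sequences:
        node = root
        for tok in seq:
            node = node.children.setdefault(tok, TrieNode())
        node.is_end = True
    return root

def constrained_step(logits, node):
    """Mask logits to the trie's allowed continuations, then renormalize.

    Renormalizing over the allowed subset changes the sequence-level
    distribution, which is the bias the abstract refers to.
    """
    allowed = list(node.children.keys())
    masked = np.full_like(logits, -np.inf)
    masked[allowed] = logits[allowed]
    probs = np.exp(masked - masked.max())
    return probs / probs.sum()

# Toy example: vocabulary of 5 token ids, two allowed sequences.
rng = np.random.default_rng(0)
trie = build_trie([[1, 2, 3], [1, 4]])
node, seq = trie, []
while node.children:
    probs = constrained_step(rng.normal(size=5), node)  # stand-in for LM logits
    tok = int(rng.choice(len(probs), p=probs))
    seq.append(tok)
    node = node.children[tok]
print("constrained sample:", seq)
```

Because every step must re-query the trie before the next token can be sampled, the loop is inherently sequential on GPU. DISC with PPV is described as replacing this with importance-sampled, parallel prefix verification; that algorithm is not reproduced here.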
Related papers
- Constrained Auto-Regressive Decoding Constrains Generative Retrieval [71.71161220261655]
Generative retrieval seeks to replace traditional search index data structures with a single large-scale neural network. In this paper, we examine the inherent limitations of constrained auto-regressive generation from two essential perspectives: constraints and beam search.
arXiv Detail & Related papers (2025-04-14T06:54:49Z) - Local Normalization Distortion and the Thermodynamic Formalism of Decoding Strategies for Large Language Models [0.0]
We develop the theory of decoding strategies for language models by expressing popular decoding algorithms as equilibrium states in the language of ergodic theory.
We analyze the effect of the local normalization step of top-k, nucleus, and temperature sampling, used to make probabilities sum to one.
Contrary to the prevailing explanation, we argue that the major cause of the under-performance of top-k sampling relative to nucleus sampling is local normalization distortion.
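The "local normalization step" referred to above is easy to state concretely. The following is a minimal sketch (not taken from the paper) showing how top-k and nucleus truncation zero out part of the distribution and then renormalize the kept mass at every step; the product of these renormalized steps no longer matches the model's original sequence probabilities, which is the distortion the authors study.

```python
import numpy as np

def locally_normalized(probs, k=None, p=None, temperature=1.0):
    """Apply top-k / nucleus (top-p) truncation, then renormalize.

    The renormalization makes each step sum to one, but the product of
    renormalized steps is no longer proportional to the model's original
    sequence probability -- the "local normalization distortion".
    """
    logits = np.log(probs) / temperature
    q = np.exp(logits - logits.max())
    q /= q.sum()
    order = np.argsort(q)[::-1]
    keep = np.ones_like(q, dtype=bool)
    if k is not None:
        keep[order[k:]] = False
    if p is not None:
        cumulative = np.cumsum(q[order])
        cutoff = np.searchsorted(cumulative, p) + 1  # smallest set with mass >= p
        keep[order[cutoff:]] = False
    q = np.where(keep, q, 0.0)
    return q / q.sum()                               # local renormalization

step = np.array([0.50, 0.30, 0.15, 0.05])
print(locally_normalized(step, k=2))      # [0.625, 0.375, 0., 0.]
print(locally_normalized(step, p=0.90))   # keeps the top-3 tokens
```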
arXiv Detail & Related papers (2025-03-27T19:15:43Z) - Constrained Language Generation with Discrete Diffusion Models [61.81569616239755]
We present Constrained Discrete Diffusion (CDD), a novel method for enforcing constraints on natural language by integrating discrete diffusion models with differentiable optimization.
We show how this technique can be applied to satisfy a variety of natural language constraints, including (i) toxicity mitigation by preventing harmful content from emerging, (ii) character and sequence level lexical constraints, and (iii) novel molecule sequence generation with specific property adherence.
arXiv Detail & Related papers (2025-03-12T19:48:12Z) - Controlled LLM Decoding via Discrete Auto-regressive Biasing [9.843359827321194]
Controlled text generation allows for enforcing user-defined constraints on large language model outputs.
We propose Discrete Auto-regressive Biasing, a controlled decoding algorithm that leverages gradients while operating entirely in the discrete text domain.
Our method significantly improves constraint satisfaction while maintaining comparable or better fluency, all with even lower computational costs.
arXiv Detail & Related papers (2025-02-06T00:14:43Z) - Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion [55.0194604505437]
Speculative decoding has emerged as a widely adopted method to accelerate large language model inference. This paper proposes an adaptation of speculative decoding which uses discrete diffusion models to generate draft sequences.
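The blurb above does not reproduce the algorithm; the sketch below shows only the standard speculative-decoding verification rule for context (accept each drafted token with probability min(1, p_target/p_draft), otherwise resample from the residual). The paper's contribution is to produce the draft block with a discrete diffusion model rather than a small autoregressive drafter; that part is not shown here.

```python
import numpy as np

def verify_draft(draft_tokens, p_draft, p_target, rng):
    """Standard speculative-decoding verification of a drafted block.

    draft_tokens: proposed token ids for the next k positions.
    p_draft[i], p_target[i]: the drafter's / target model's distributions
    at position i (each a vector over the vocabulary).
    Returns the accepted tokens, plus one corrective token on rejection,
    so the output matches the target model's distribution.
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        if rng.random() < min(1.0, p_target[i][tok] / p_draft[i][tok]):
            accepted.append(tok)
        else:
            residual = np.maximum(p_target[i] - p_draft[i], 0.0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(len(residual), p=residual)))
            return accepted
    return accepted  # all drafts accepted; caller samples one bonus token

rng = np.random.default_rng(0)
p_draft = [np.array([0.6, 0.3, 0.1]), np.array([0.2, 0.5, 0.3])]
p_target = [np.array([0.5, 0.4, 0.1]), np.array([0.1, 0.7, 0.2])]
print(verify_draft([0, 1], p_draft, p_target, rng))
```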
arXiv Detail & Related papers (2024-08-10T21:24:25Z) - Constrained Synthesis with Projected Diffusion Models [47.56192362295252]
This paper introduces an approach that equips generative diffusion processes with the ability to satisfy, and certify compliance with, constraints and physical principles.
The proposed method recasts the traditional generative diffusion process as a constrained distribution problem to ensure adherence to the constraints.
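The paper's projection operator and diffusion setup are not described in this blurb, so the following is only a toy illustration of the general projected-sampling idea: after each reverse-diffusion update, project the iterate back onto the feasible set so the final sample satisfies the constraint by construction. The dummy denoiser and the box constraint here are placeholders, not the paper's formulation.

```python
import numpy as np

def project_onto_box(x, low, high):
    """Euclidean projection onto the box constraint low <= x <= high."""
    return np.clip(x, low, high)

def toy_denoiser(x, t):
    """Stand-in denoiser: nudges the sample toward a fixed target mean."""
    target = np.array([2.0, -1.0])
    return x + 0.2 * (target - x) / max(t, 1)

def projected_reverse_diffusion(steps=50, noise_scale=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=2)                            # start from pure noise
    for t in range(steps, 0, -1):
        x = toy_denoiser(x, t)                        # reverse-diffusion update
        x += noise_scale * np.sqrt(t / steps) * rng.normal(size=2)
        x = project_onto_box(x, low=-1.0, high=1.0)   # enforce the constraint
    return x

print(projected_reverse_diffusion())                  # always lands inside the box
```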
arXiv Detail & Related papers (2024-02-05T22:18:16Z) - A Sparsity-promoting Dictionary Model for Variational Autoencoders [16.61511959679188]
Structuring the latent space in deep generative models is important to yield more expressive models and interpretable representations.
We propose a simple yet effective methodology to structure the latent space via a sparsity-promoting dictionary model.
arXiv Detail & Related papers (2022-03-29T17:13:11Z) - COLD Decoding: Energy-based Constrained Text Generation with Langevin Dynamics [69.8062252611486]
COLD decoding is a flexible framework that can be applied directly to off-the-shelf left-to-right language models.
Our experiments on constrained generation tasks point to the effectiveness of our approach, both in terms of automatic and human evaluation.
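The mechanism behind COLD (per its title, energy-based generation with Langevin dynamics) is not detailed in this blurb, so the sketch below only illustrates the generic idea: treat the output as a soft, continuous token sequence and run noisy gradient descent on an energy that scores constraint satisfaction. The toy keyword energy is hypothetical; the actual framework also folds a left-to-right language model's fluency score into the energy.

```python
import torch

def langevin_constrained_decode(vocab_size=20, seq_len=5, keyword=7,
                                steps=200, step_size=0.1, noise=0.01, seed=0):
    """Toy energy-based decoding via Langevin dynamics on soft token logits."""
    torch.manual_seed(seed)
    y = torch.randn(seq_len, vocab_size, requires_grad=True)  # soft sequence

    def energy(y):
        probs = torch.softmax(y, dim=-1)
        # Toy constraint energy: require the keyword to appear somewhere
        # (a full energy would also include an LM fluency term).
        keyword_mass = probs[:, keyword].max()
        return -torch.log(keyword_mass + 1e-9)

    for _ in range(steps):
        (grad,) = torch.autograd.grad(energy(y), y)
        with torch.no_grad():
            y -= step_size * grad                     # gradient step on the energy
            y += noise * torch.randn_like(y)          # Langevin noise
    return torch.softmax(y, dim=-1).argmax(dim=-1)    # discretize at the end

print(langevin_constrained_decode())  # the keyword id should appear in the output
```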
arXiv Detail & Related papers (2022-02-23T18:59:27Z) - Direction is what you need: Improving Word Embedding Compression in Large Language Models [7.736463504706344]
This paper presents a novel loss objective to compress token embeddings in Transformer-based models by leveraging an AutoEncoder architecture.
Our method significantly outperforms the commonly used SVD-based matrix-factorization approach in terms of initial language model perplexity.
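The paper's novel loss objective is not given here; the sketch below merely sets up the comparison it describes, compressing a token-embedding matrix with a truncated-SVD baseline versus a small linear autoencoder trained to reconstruct it. With a plain MSE loss the two land near the same optimum, so treat this as scaffolding for the comparison rather than the paper's method; shapes and hyperparameters are placeholders.

```python
import torch

def svd_compress(emb, rank):
    """Baseline: rank-r truncated SVD of the embedding matrix."""
    U, S, Vh = torch.linalg.svd(emb, full_matrices=False)
    return (U[:, :rank] * S[:rank]) @ Vh[:rank]

def autoencoder_compress(emb, rank, steps=500, lr=1e-2):
    """Alternative: linear autoencoder with a rank-r bottleneck."""
    vocab, dim = emb.shape
    enc = torch.nn.Linear(dim, rank, bias=False)
    dec = torch.nn.Linear(rank, dim, bias=False)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(dec(enc(emb)), emb)
        loss.backward()
        opt.step()
    return dec(enc(emb)).detach()

torch.manual_seed(0)
emb = torch.randn(1000, 64)  # toy embedding matrix (vocab x dim)
for name, approx in [("svd", svd_compress(emb, 16)),
                     ("autoencoder", autoencoder_compress(emb, 16))]:
    print(name, torch.nn.functional.mse_loss(approx, emb).item())
```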
arXiv Detail & Related papers (2021-06-15T14:28:00Z) - Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs with a strong auto-regressive decoder tend to ignore the latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)