A Block Metropolis-Hastings Sampler for Controllable Energy-based Text
Generation
- URL: http://arxiv.org/abs/2312.04510v1
- Date: Thu, 7 Dec 2023 18:30:15 GMT
- Title: A Block Metropolis-Hastings Sampler for Controllable Energy-based Text
Generation
- Authors: Jarad Forristal, Niloofar Mireshghallah, Greg Durrett, Taylor
Berg-Kirkpatrick
- Abstract summary: We develop a novel Metropolis-Hastings (MH) sampler that proposes re-writes of the entire sequence in each step via iterative prompting of a large language model.
Our new sampler (a) allows for more efficient and accurate sampling from a target distribution and (b) allows generation length to be determined through the sampling procedure rather than fixed in advance.
- Score: 78.81021361497311
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has shown that energy-based language modeling is an effective
framework for controllable text generation because it enables flexible
integration of arbitrary discriminators. However, because energy-based LMs are
globally normalized, approximate techniques like Metropolis-Hastings (MH) are
required for inference. Past work has largely explored simple proposal
distributions that modify a single token at a time, like in Gibbs sampling. In
this paper, we develop a novel MH sampler that, in contrast, proposes re-writes
of the entire sequence in each step via iterative prompting of a large language
model. Our new sampler (a) allows for more efficient and accurate sampling from
a target distribution and (b) allows generation length to be determined through
the sampling procedure rather than fixed in advance, as past work has required.
We perform experiments on two controlled generation tasks, showing both
downstream performance gains and more accurate target distribution sampling in
comparison with single-token proposal techniques.
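To make the accept/reject step concrete, the following is a minimal sketch of one block MH iteration with a whole-sequence proposal, assuming hypothetical energy, propose, and proposal_logprob callables standing in for the energy-based LM and the prompted LLM; it illustrates the general technique, not the authors' released implementation.

```python
import math
import random

def block_mh_step(x, energy, propose, proposal_logprob):
    """One Metropolis-Hastings step with a whole-sequence (block) proposal.

    energy(seq)            -> float; the target distribution is p(seq) proportional to exp(-energy(seq))
    propose(seq)           -> candidate sequence, e.g. an LLM rewrite of seq
    proposal_logprob(a, b) -> log q(a | b), log-probability of proposing a given b
    All three callables are hypothetical stand-ins, not the paper's API.
    """
    x_new = propose(x)
    # Log acceptance ratio for an asymmetric proposal:
    #   log alpha = [E(x) - E(x')] + log q(x | x') - log q(x' | x)
    log_alpha = (energy(x) - energy(x_new)
                 + proposal_logprob(x, x_new)
                 - proposal_logprob(x_new, x))
    if math.log(random.random()) < min(0.0, log_alpha):
        return x_new, True   # accept: the chain moves to the rewritten sequence
    return x, False          # reject: keep the current sequence
```

Because the proposal rewrites the whole sequence, an accepted candidate may differ in length from the current state, which is how generation length can be determined by the sampling procedure itself.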
Related papers
- FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling [59.8051705468084]
Speculative sampling has emerged as an important technique for accelerating the auto-regressive generation process of large language models.
We present FR-Spec, a frequency-ranked speculative sampling framework that optimizes draft candidate selection through vocabulary space compression.
arXiv Detail & Related papers (2025-02-20T18:58:10Z)
- On the Query Complexity of Verifier-Assisted Language Generation [35.43462431990329]
We develop a framework for reasoning about constrained generation using a pre-trained language model generator oracle.
Access to a verifier can render an intractable problem (information-theoretically or computationally) tractable.
We show even simple algorithms, like tokenwise rejection sampling, can enjoy significant benefits from access to a verifier.
arXiv Detail & Related papers (2025-02-17T18:46:32Z)
- Principled Gradient-based Markov Chain Monte Carlo for Text Generation [77.46654898866291]
We propose several faithful gradient-based sampling algorithms to sample from the target energy-based text distribution correctly.
We demonstrate that faithful samplers are able to generate more fluent text while adhering to the control objectives better.
arXiv Detail & Related papers (2023-12-29T18:00:56Z)
- Learning Sampling Distributions for Model Predictive Control [36.82905770866734]
Sampling-based methods have become a cornerstone of contemporary approaches to Model Predictive Control (MPC).
We propose to carry out all operations in the latent space, allowing us to take full advantage of the learned distribution.
Specifically, we frame the learning problem as bi-level optimization and show how to train the controller with backpropagation-through-time.
arXiv Detail & Related papers (2022-12-05T20:35:36Z)
- Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models [65.52639709094963]
Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize.
We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model.
arXiv Detail & Related papers (2022-10-18T22:19:41Z)
- Sampling from Discrete Energy-Based Models with Quality/Efficiency Trade-offs [3.491202838583993]
Energy-Based Models (EBMs) allow for extremely flexible specifications of probability distributions.
However, they do not provide a mechanism for obtaining exact samples from these distributions.
We propose a new approximate sampling technique, Quasi Rejection Sampling (QRS), that allows for a trade-off between sampling efficiency and sampling quality.
arXiv Detail & Related papers (2021-12-10T17:51:37Z)
- Reparameterized Sampling for Generative Adversarial Networks [71.30132908130581]
We propose REP-GAN, a novel sampling method that allows general dependent proposals by reparameterizing the Markov chains into the latent space of the generator.
Empirically, extensive experiments on synthetic and real datasets demonstrate that our REP-GAN largely improves the sample efficiency and obtains better sample quality simultaneously.
arXiv Detail & Related papers (2021-07-01T10:34:55Z)
- Exposing the Implicit Energy Networks behind Masked Language Models via Metropolis-Hastings [57.133639209759615]
We interpret masked language models as energy-based sequence models and propose two energy parametrizations derivable from the trained models.
We develop a tractable sampling scheme based on the Metropolis-Hastings Monte Carlo algorithm.
We validate the effectiveness of the proposed parametrizations by exploring the quality of samples drawn from these energy-based models.
arXiv Detail & Related papers (2021-06-04T22:04:30Z)
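For contrast with the block proposals above, the masked-LM approach in the last entry can be sketched as a single-token MH step: mask one position, propose a replacement from the MLM conditional, and accept or reject against the sequence-level energy. This is a minimal illustration under assumed helpers (energy, mlm_conditional), not the code of any cited paper.

```python
import math
import random

def mlm_mh_step(tokens, energy, mlm_conditional):
    """One single-position MH step in the style of masked-LM energy samplers.

    energy(tokens)             -> float; the target is p(tokens) proportional to exp(-energy(tokens))
    mlm_conditional(tokens, i) -> dict mapping token -> probability at position i,
                                  computed with position i masked out
    Both callables are assumed placeholders for a trained MLM and an energy parametrization.
    """
    i = random.randrange(len(tokens))
    dist = mlm_conditional(tokens, i)          # proposal distribution over position i
    old_tok = tokens[i]
    new_tok = random.choices(list(dist), weights=list(dist.values()), k=1)[0]
    proposal = tokens[:i] + [new_tok] + tokens[i + 1:]
    # Position i is masked when computing the conditional, so the forward and
    # reverse proposals share the same distribution and the correction reduces
    # to dist[old_tok] / dist[new_tok].
    log_alpha = (energy(tokens) - energy(proposal)
                 + math.log(dist[old_tok]) - math.log(dist[new_tok]))
    if math.log(random.random()) < min(0.0, log_alpha):
        return proposal   # accept the single-token edit
    return tokens         # keep the current sequence
```

Unlike the block sampler sketched earlier, each step edits a single position, so the sequence length must be fixed in advance.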
This list is automatically generated from the titles and abstracts of the papers on this site.