Conditional Poisson Stochastic Beam Search
- URL: http://arxiv.org/abs/2109.11034v1
- Date: Wed, 22 Sep 2021 20:49:16 GMT
- Title: Conditional Poisson Stochastic Beam Search
- Authors: Clara Meister, Afra Amini, Tim Vieira, Ryan Cotterell
- Abstract summary: Conditional Poisson stochastic beam search (CPSBS) is a more natural alternative to the stochastic beam search (SBS) of Kool et al. (2019).
CPSBS produces lower-variance and more efficient estimators than SBS, even showing improvements in high-entropy settings.
- Score: 35.60062127942947
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Beam search is the default decoding strategy for many sequence generation
tasks in NLP. The set of approximate K-best items returned by the algorithm is
a useful summary of the distribution for many applications; however, the
candidates typically exhibit high overlap and may give a highly biased estimate
for expectations under our model. These problems can be addressed by instead
using stochastic decoding strategies. In this work, we propose a new method for
turning beam search into a stochastic process: Conditional Poisson stochastic
beam search. Rather than taking the maximizing set at each iteration, we sample
K candidates without replacement according to the conditional Poisson sampling
design. We view this as a more natural alternative to Kool et al.'s (2019)
stochastic beam search (SBS). Furthermore, we show how samples generated under
the CPSBS design can be used to build consistent estimators and sample diverse
sets from sequence models. In our experiments, we observe CPSBS produces lower
variance and more efficient estimators than SBS, even showing improvements in
high entropy settings.
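The sampling step described in the abstract, drawing K candidates without replacement with probability proportional to the product of their weights, is the conditional Poisson design. Below is a minimal, hypothetical sketch of such a sampler in Python; it is not the authors' implementation. In CPSBS the weights would correspond to the (unnormalized) scores of the candidate beam extensions at each iteration, and a practical version would work in log space to avoid underflow.

```python
import random

def conditional_poisson_sample(weights, K, rng=random):
    """Draw a size-K subset S of indices with P(S) proportional to
    prod_{i in S} weights[i] (the conditional Poisson design).

    Uses the standard DP over elementary symmetric polynomials:
    Z[m][k] = sum over size-k subsets of the first m items of the
    product of their weights.
    """
    n = len(weights)
    assert 0 < K <= n and all(w > 0 for w in weights)

    # Build the DP table of elementary symmetric polynomials.
    Z = [[0.0] * (K + 1) for _ in range(n + 1)]
    for m in range(n + 1):
        Z[m][0] = 1.0  # the empty subset has product 1
    for m in range(1, n + 1):
        for k in range(1, K + 1):
            # item m-1 is either excluded or included
            Z[m][k] = Z[m - 1][k] + weights[m - 1] * Z[m - 1][k - 1]

    # Sample items from last to first, conditioning on the remaining slots k.
    sample, k = [], K
    for m in range(n, 0, -1):
        if k == 0:
            break
        p_include = weights[m - 1] * Z[m - 1][k - 1] / Z[m][k]
        if rng.random() < p_include:
            sample.append(m - 1)
            k -= 1
    return sorted(sample)

# Toy usage: sample 3 of 6 candidates according to their unnormalized scores.
print(conditional_poisson_sample([0.5, 2.0, 1.0, 0.1, 3.0, 0.7], K=3))
```

The same dynamic-programming table also yields each item's inclusion probability, which is the ingredient a Horvitz-Thompson-style estimator over the sampled set (the "consistent estimators" the abstract alludes to) would need.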
Related papers
- Stable generative modeling using Schrödinger bridges [0.22499166814992438]
We propose a generative model combining Schrödinger bridges and Langevin dynamics.
Our framework can be naturally extended to generate conditional samples and to Bayesian inference problems.
arXiv Detail & Related papers (2024-01-09T06:15:45Z)
- Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models [65.52639709094963]
Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize.
We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model.
arXiv Detail & Related papers (2022-10-18T22:19:41Z)
- Low-variance estimation in the Plackett-Luce model via quasi-Monte Carlo sampling [58.14878401145309]
We develop a novel approach to producing more sample-efficient estimators of expectations in the PL model.
We illustrate our findings both theoretically and empirically using real-world recommendation data from Amazon Music and the Yahoo learning-to-rank challenge.
arXiv Detail & Related papers (2022-05-12T11:15:47Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- On the Comparison between Cyclic Sampling and Random Reshuffling [27.27774034428748]
Cyclic sampling draws the samples in a fixed, cyclic order, which is less robust than reshuffling the samples periodically.
Existing works have established worst-case convergence rates for cyclic sampling, which are generally worse than those of random reshuffling.
In this paper, we find that a certain cyclic order can be much faster than random reshuffling and that it can be discovered at low cost.
arXiv Detail & Related papers (2021-04-25T09:29:43Z)
- Oops I Took A Gradient: Scalable Sampling for Discrete Distributions [53.3142984019796]
We show that this approach outperforms generic samplers in a number of difficult settings.
We also demonstrate the use of our improved sampler for training deep energy-based models on high dimensional discrete data.
arXiv Detail & Related papers (2021-02-08T20:08:50Z)
- Stochastic Online Convex Optimization. Application to probabilistic time series forecasting [0.0]
We prove that algorithms such as Online Newton Step and a scale-free version of Bernstein Online Aggregation achieve best-known rates in unbounded settings.
Our fast-rate regret bounds are any-time valid.
arXiv Detail & Related papers (2021-02-01T09:49:15Z)
- Stochastic Saddle-Point Optimization for Wasserstein Barycenters [69.68068088508505]
We consider the population Wasserstein barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data.
We employ the structure of the problem and obtain a convex-concave saddle-point reformulation of this problem.
In the setting when the distribution of random probability measures is discrete, we propose an optimization algorithm and estimate its complexity.
arXiv Detail & Related papers (2020-06-11T19:40:38Z) - Bayesian Quantile and Expectile Optimisation [3.3878745408530833]
We propose new variational models for Bayesian quantile and expectile regression that are well-suited for heteroscedastic noise settings.
Our strategies can directly optimise for the quantile and expectile, without requiring replicating observations or assuming a parametric form for the noise.
As illustrated in the experimental section, the proposed approach clearly outperforms the state of the art in the heteroscedastic, non-Gaussian case.
arXiv Detail & Related papers (2020-01-12T20:51:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.