Conditional Poisson Stochastic Beam Search
- URL: http://arxiv.org/abs/2109.11034v1
- Date: Wed, 22 Sep 2021 20:49:16 GMT
- Title: Conditional Poisson Stochastic Beam Search
- Authors: Clara Meister, Afra Amini, Tim Vieira, Ryan Cotterell
- Abstract summary: Conditional Poisson stochastic beam search (CPSBS) is a more natural alternative to the stochastic beam search (SBS) of Kool et al. (2019).
CPSBS produces lower-variance and more efficient estimators than SBS, even showing improvements in high-entropy settings.
- Score: 35.60062127942947
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Beam search is the default decoding strategy for many sequence generation
tasks in NLP. The set of approximate K-best items returned by the algorithm is
a useful summary of the distribution for many applications; however, the
candidates typically exhibit high overlap and may give a highly biased estimate
for expectations under our model. These problems can be addressed by instead
using stochastic decoding strategies. In this work, we propose a new method for
turning beam search into a stochastic process: Conditional Poisson stochastic
beam search. Rather than taking the maximizing set at each iteration, we sample
K candidates without replacement according to the conditional Poisson sampling
design. We view this as a more natural alternative to Kool et al.'s (2019)
stochastic beam search (SBS). Furthermore, we show how samples generated under
the CPSBS design can be used to build consistent estimators and sample diverse
sets from sequence models. In our experiments, we observe CPSBS produces lower
variance and more efficient estimators than SBS, even showing improvements in
high entropy settings.
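The sampling step described in the abstract, drawing K candidates without replacement with probability proportional to the product of their weights, is the conditional Poisson design. Below is a minimal, hypothetical sketch of such a sampler in Python; it is not the authors' implementation. In CPSBS the weights would correspond to the (unnormalized) scores of the candidate beam extensions at each iteration, and a practical version would work in log space to avoid underflow.

```python
import random

def conditional_poisson_sample(weights, K, rng=random):
    """Draw a size-K subset S of indices with P(S) proportional to
    prod_{i in S} weights[i] (the conditional Poisson design).

    Uses the standard DP over elementary symmetric polynomials:
    Z[m][k] = sum over size-k subsets of the first m items of the
    product of their weights.
    """
    n = len(weights)
    assert 0 < K <= n and all(w > 0 for w in weights)

    # Build the DP table of elementary symmetric polynomials.
    Z = [[0.0] * (K + 1) for _ in range(n + 1)]
    for m in range(n + 1):
        Z[m][0] = 1.0  # the empty subset has product 1
    for m in range(1, n + 1):
        for k in range(1, K + 1):
            # item m-1 is either excluded or included
            Z[m][k] = Z[m - 1][k] + weights[m - 1] * Z[m - 1][k - 1]

    # Sample items from last to first, conditioning on the remaining slots k.
    sample, k = [], K
    for m in range(n, 0, -1):
        if k == 0:
            break
        p_include = weights[m - 1] * Z[m - 1][k - 1] / Z[m][k]
        if rng.random() < p_include:
            sample.append(m - 1)
            k -= 1
    return sorted(sample)

# Toy usage: sample 3 of 6 candidates according to their unnormalized scores.
print(conditional_poisson_sample([0.5, 2.0, 1.0, 0.1, 3.0, 0.7], K=3))
```

The same dynamic-programming table also yields each item's inclusion probability, which is the ingredient a Horvitz-Thompson-style estimator over the sampled set (the "consistent estimators" the abstract alludes to) would need.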
Related papers
- Stable generative modeling using Schrödinger bridges [0.22499166814992438]
We propose a generative model combining Schrödinger bridges and Langevin dynamics.
Our framework can be naturally extended to generate conditional samples and to Bayesian inference problems.
arXiv Detail & Related papers (2024-01-09T06:15:45Z)
- Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models [65.52639709094963]
Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize.
We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model.
arXiv Detail & Related papers (2022-10-18T22:19:41Z)
- Low-variance estimation in the Plackett-Luce model via quasi-Monte Carlo sampling [58.14878401145309]
We develop a novel approach to producing more sample-efficient estimators of expectations in the PL model.
We illustrate our findings both theoretically and empirically using real-world recommendation data from Amazon Music and the Yahoo learning-to-rank challenge.
arXiv Detail & Related papers (2022-05-12T11:15:47Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- On the Comparison between Cyclic Sampling and Random Reshuffling [27.27774034428748]
Cyclic sampling draws the samples in a fixed, cyclic order, which is less robust than reshuffling the samples periodically.
Existing works have established worst-case convergence rates for cyclic sampling, which are generally worse than those of random reshuffling.
In this paper, we find that a certain cyclic order can be much faster than random reshuffling and that it can be discovered at low cost.
arXiv Detail & Related papers (2021-04-25T09:29:43Z)
- Oops I Took A Gradient: Scalable Sampling for Discrete Distributions [53.3142984019796]
We show that this approach outperforms generic samplers in a number of difficult settings.
We also demonstrate the use of our improved sampler for training deep energy-based models on high dimensional discrete data.
arXiv Detail & Related papers (2021-02-08T20:08:50Z)
- Stochastic Online Convex Optimization. Application to probabilistic time series forecasting [0.0]
We prove that algorithms such as Online Newton Step and a scale-free version of Bernstein Online Aggregation achieve best-known rates in unbounded settings.
Our fast-rate regret bounds are any-time valid.
arXiv Detail & Related papers (2021-02-01T09:49:15Z)
- Stochastic Saddle-Point Optimization for Wasserstein Barycenters [69.68068088508505]
We consider the population Wasserstein barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data.
We employ the structure of the problem and obtain a convex-concave saddle-point reformulation of this problem.
In the setting when the distribution of random probability measures is discrete, we propose an optimization algorithm and estimate its complexity.
arXiv Detail & Related papers (2020-06-11T19:40:38Z) - Bayesian Quantile and Expectile Optimisation [3.3878745408530833]
We propose new variational models for Bayesian quantile and expectile regression that are well-suited for heteroscedastic noise settings.
Our strategies can directly optimise for the quantile and expectile, without requiring replicating observations or assuming a parametric form for the noise.
As illustrated in the experimental section, the proposed approach clearly outperforms the state of the art in the heteroscedastic, non-Gaussian case.
arXiv Detail & Related papers (2020-01-12T20:51:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.