Weak Signal Asymptotics for Sequentially Randomized Experiments
- URL: http://arxiv.org/abs/2101.09855v7
- Date: Thu, 22 Jun 2023 21:35:55 GMT
- Title: Weak Signal Asymptotics for Sequentially Randomized Experiments
- Authors: Xu Kuang and Stefan Wager
- Abstract summary: We study a class of sequentially randomized experiments, including those that arise in solving multi-armed bandit problems.
We show that the sample paths of a class of sequentially randomized experiments converge weakly to a diffusion limit.
We show that all sequential experiments whose randomization probabilities have a Lipschitz-continuous dependence on the observed data suffer from sub-optimal regret when reward gaps are relatively large.
- Score: 2.28438857884398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We use the lens of weak signal asymptotics to study a class of sequentially
randomized experiments, including those that arise in solving multi-armed
bandit problems. In an experiment with $n$ time steps, we let the mean reward
gaps between actions scale to the order $1/\sqrt{n}$ so as to preserve the
difficulty of the learning task as $n$ grows. In this regime, we show that the
sample paths of a class of sequentially randomized experiments -- adapted to
this scaling regime and with arm selection probabilities that vary continuously
with state -- converge weakly to a diffusion limit, given as the solution to a
stochastic differential equation. The diffusion limit enables us to derive
refined, instance-specific characterization of stochastic dynamics, and to
obtain several insights on the regret and belief evolution of a number of
sequential experiments including Thompson sampling (but not UCB, which does not
satisfy our continuity assumption). We show that all sequential experiments
whose randomization probabilities have a Lipschitz-continuous dependence on the
observed data suffer from sub-optimal regret performance when the reward gaps
are relatively large. Conversely, we find that a version of Thompson sampling
with an asymptotically uninformative prior variance achieves near-optimal
instance-specific regret scaling, including with large reward gaps, but these
good regret properties come at the cost of highly unstable posterior beliefs.
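To make the scaling regime concrete, here is a minimal simulation sketch (not the paper's construction): a two-armed experiment with Gaussian rewards, mean-reward gap $c/\sqrt{n}$, and Thompson sampling under a diffuse prior. The function name `thompson_two_arm` and all parameter choices below are illustrative assumptions.

```python
import numpy as np

def thompson_two_arm(n: int, c: float = 1.0, seed: int = 0) -> float:
    """Simulate a two-armed Gaussian Thompson-sampling experiment in the
    weak-signal regime: the mean-reward gap scales as c / sqrt(n).
    Returns the total regret accumulated over the n steps."""
    rng = np.random.default_rng(seed)
    gap = c / np.sqrt(n)
    means = np.array([gap, 0.0])            # arm 0 is better by `gap`
    counts = np.zeros(2)                    # pulls per arm
    sums = np.zeros(2)                      # cumulative reward per arm
    regret = 0.0
    for _ in range(n):
        # Posterior for each arm's mean under a diffuse prior and unit reward
        # noise: N(sample mean, 1 / pulls); unpulled arms get a very wide draw.
        post_mean = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
        post_std = np.where(counts > 0, 1.0 / np.sqrt(np.maximum(counts, 1)), 1e3)
        arm = int(np.argmax(rng.normal(post_mean, post_std)))
        reward = rng.normal(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
        regret += means.max() - means[arm]
    return regret

if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(n, round(thompson_two_arm(n), 3))
```

Because the gap is $c/\sqrt{n}$, the total regret over $n$ steps is at most $c\sqrt{n}$ by construction; what the diffusion limit characterizes is the instance-specific behavior within this scaling.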
Related papers
- The Polynomial Stein Discrepancy for Assessing Moment Convergence [1.0835264351334324]
We propose a novel method for measuring the discrepancy between a set of samples and a desired posterior distribution for Bayesian inference.
We show that the test has higher power than its competitors in several examples, and at a lower computational cost.
arXiv Detail & Related papers (2024-12-06T15:51:04Z) - REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy [93.8400683020273]
Decoding methods for large language models (LLMs) usually struggle with the tradeoff between ensuring factuality and maintaining diversity.
We propose REAL sampling, a decoding method that improves factuality and diversity over nucleus sampling.
arXiv Detail & Related papers (2024-06-11T21:44:49Z) - Sampling in Constrained Domains with Orthogonal-Space Variational
Gradient Descent [13.724361914659438]
We propose a new variational framework with a designed orthogonal-space gradient flow (O-Gradient) for sampling on a manifold.
We prove that O-Gradient converges to the target constrained distribution with rate $\widetilde{O}(1/\text{the number of iterations})$ under mild conditions.
arXiv Detail & Related papers (2022-10-12T17:51:13Z) - Finite-Time Regret of Thompson Sampling Algorithms for Exponential
Family Multi-Armed Bandits [88.21288104408556]
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family.
We propose a Thompson sampling algorithm, termed Expulli, which uses a novel sampling distribution to avoid under-estimation of the optimal arm.
arXiv Detail & Related papers (2022-06-07T18:08:21Z) - Algorithms for Adaptive Experiments that Trade-off Statistical Analysis
with Reward: Combining Uniform Random Assignment and Reward Maximization [50.725191156128645]
Multi-armed bandit algorithms like Thompson Sampling can be used to conduct adaptive experiments.
We present simulations of 2-arm experiments exploring two algorithms that combine the benefits of uniform randomization for statistical analysis with the reward maximization of Thompson Sampling.
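As a rough illustration of mixing uniform random assignment with reward maximization (not necessarily the two algorithms studied in the paper), the sketch below assigns arms uniformly at random with probability `eps` and by Beta-Bernoulli Thompson sampling otherwise; the name `eps_uniform_thompson` and all parameters are placeholders.

```python
import numpy as np

def eps_uniform_thompson(n: int, p=(0.6, 0.5), eps: float = 0.2, seed: int = 0) -> float:
    """Hypothetical 2-arm Bernoulli experiment: with probability eps the arm is
    assigned uniformly at random (preserving data useful for unbiased analysis),
    otherwise it is assigned by Beta-Bernoulli Thompson sampling.
    Returns the average realized reward."""
    rng = np.random.default_rng(seed)
    successes = np.ones(2)                 # Beta(1, 1) priors for each arm
    failures = np.ones(2)
    total = 0.0
    for _ in range(n):
        if rng.random() < eps:
            arm = int(rng.integers(2))     # uniform random assignment
        else:
            arm = int(np.argmax(rng.beta(successes, failures)))  # Thompson draw
        r = float(rng.random() < p[arm])   # Bernoulli reward
        successes[arm] += r
        failures[arm] += 1.0 - r
        total += r
    return total / n
```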
arXiv Detail & Related papers (2021-12-15T22:11:58Z) - From Optimality to Robustness: Dirichlet Sampling Strategies in
Stochastic Bandits [0.0]
We study a generic Dirichlet Sampling (DS) algorithm, based on pairwise comparisons of empirical indices computed with re-sampling of the arms' observations.
We show that different variants of this strategy achieve provably optimal regret guarantees when the distributions are bounded and logarithmic regret for semi-bounded distributions with a mild quantile condition.
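A simplified sketch of the re-sampled pairwise-comparison idea for two arms with rewards in [0, 1] is given below; it is an assumption-laden reduction of the DS scheme (the challenger's history is augmented with the known reward upper bound so it keeps a positive chance of being explored), not the paper's full algorithm.

```python
import numpy as np

def dirichlet_index(obs: np.ndarray, rng) -> float:
    """Bayesian-bootstrap index: a Dirichlet-weighted average of one arm's
    observed rewards (one illustrative form of a re-sampled empirical index)."""
    w = rng.dirichlet(np.ones(len(obs)))
    return float(w @ obs)

def ds_two_arms(n: int, means=(0.7, 0.5), seed: int = 0) -> float:
    """Simplified Dirichlet-Sampling-style duel for two bounded arms: pull the
    challenger when its re-sampled index (history augmented with the upper
    bound 1.0) beats the leader's empirical mean; otherwise pull the leader."""
    rng = np.random.default_rng(seed)
    obs = [[], []]                                   # per-arm reward histories
    total = 0.0
    for t in range(n):
        if t < 2:
            arm = t                                  # initialise both arms
        else:
            leader = int(np.argmax([len(o) for o in obs]))   # most-pulled arm
            challenger = 1 - leader
            aug = np.append(obs[challenger], 1.0)    # add known reward upper bound
            idx = dirichlet_index(aug, rng)
            arm = challenger if idx >= np.mean(obs[leader]) else leader
        r = float(rng.random() < means[arm])         # Bernoulli rewards
        obs[arm].append(r)
        total += r
    return total / n
```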
arXiv Detail & Related papers (2021-11-18T14:34:21Z) - Generalized Kernel Ridge Regression for Causal Inference with
Missing-at-Random Sample Selection [3.398662563413433]
I propose kernel ridge regression estimators for nonparametric dose response curves and semiparametric treatment effects.
For the discrete treatment case, I prove root-n consistency, Gaussian approximation, and semiparametric efficiency.
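One hedged way to realize the dose-response idea in code: regress the outcome on (dose, covariates) with kernel ridge regression, then average predictions over the empirical covariate distribution at each evaluation dose. The RBF kernel, regularization level, and the name `dose_response_curve` are placeholder choices, not the paper's estimator or tuning.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def dose_response_curve(D, X, Y, doses, alpha: float = 1.0) -> np.ndarray:
    """Illustrative nonparametric dose-response estimate: fit kernel ridge
    regression of Y on (dose, covariates), then for each evaluation dose
    average the predictions over the empirical covariate distribution."""
    model = KernelRidge(kernel="rbf", alpha=alpha)    # placeholder kernel/penalty
    model.fit(np.column_stack([D, X]), Y)
    curve = []
    for d in doses:
        Z = np.column_stack([np.full(len(X), d), X])  # set dose d for every unit
        curve.append(model.predict(Z).mean())         # average over covariates
    return np.array(curve)
```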
arXiv Detail & Related papers (2021-11-09T17:10:49Z) - Efficient Inference Without Trading-off Regret in Bandits: An Allocation
Probability Test for Thompson Sampling [1.6114012813668934]
Using bandit algorithms to conduct adaptive randomised experiments can minimise regret, but it poses major challenges for statistical inference.
Recent attempts to address these challenges typically impose restrictions on the exploitative nature of the bandit (trading off regret) and require large sample sizes to ensure guarantees.
We introduce a novel hypothesis test, uniquely based on the allocation probabilities of the bandit algorithm, and without constraining its exploitative nature or requiring a minimum experimental size.
We demonstrate the regret and inferential advantages of our approach, particularly in small samples, in both extensive simulations and a real-world experiment on mental health.
arXiv Detail & Related papers (2021-10-30T01:47:14Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of the gradients.
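For intuition, the sketch below applies norm-clipped online gradient descent to streaming mean estimation with heavy-tailed samples; the clipping level, step size, and name `clipped_streaming_mean` are ad hoc assumptions, not the paper's exact estimator or analysis.

```python
import numpy as np

def clipped_streaming_mean(stream, clip: float = 2.0) -> np.ndarray:
    """Illustrative norm-clipped online gradient descent for streaming mean
    estimation: each sample contributes the gradient (estimate - x) of the
    squared loss, clipped in norm before a 1/t step (without clipping this
    recovers the running sample mean)."""
    est = None
    for t, x in enumerate(stream, start=1):
        x = np.asarray(x, dtype=float)
        if est is None:
            est = np.zeros_like(x)
        grad = est - x                      # gradient of 0.5 * ||est - x||^2
        norm = np.linalg.norm(grad)
        if norm > clip:
            grad *= clip / norm             # clip the gradient norm
        est -= grad / t                     # decaying (1/t) step size
    return est

# Example: heavy-tailed Student-t samples around a true mean of 3.0
rng = np.random.default_rng(0)
samples = 3.0 + rng.standard_t(df=2.0, size=(10_000, 1))
print(clipped_streaming_mean(samples))
```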
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - A Unifying Theory of Thompson Sampling for Continuous Risk-Averse
Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z) - Tracking disease outbreaks from sparse data with Bayesian inference [55.82986443159948]
The COVID-19 pandemic provides new motivation for estimating the empirical rate of transmission during an outbreak.
Standard methods struggle to accommodate the partial observability and sparse data common at finer scales.
We propose a Bayesian framework which accommodates partial observability in a principled manner.
arXiv Detail & Related papers (2020-09-12T20:37:33Z) - Double Trouble in Double Descent : Bias and Variance(s) in the Lazy
Regime [32.65347128465841]
Deep neural networks can achieve remarkable performance while perfectly interpolating the training data.
Rather than the U-curve of the bias-variance trade-off, their test error often follows a "double descent" curve.
We develop a quantitative theory for this phenomenon in the so-called lazy learning regime of neural networks.
arXiv Detail & Related papers (2020-03-02T17:39:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.