Computing the Performance of A New Adaptive Sampling Algorithm Based on
The Gittins Index in Experiments with Exponential Rewards
- URL: http://arxiv.org/abs/2301.01107v1
- Date: Tue, 3 Jan 2023 14:04:13 GMT
- Title: Computing the Performance of A New Adaptive Sampling Algorithm Based on
The Gittins Index in Experiments with Exponential Rewards
- Authors: James K. He, Sofía S. Villar, and Lida Mavrogonatou
- Abstract summary: The Gittins Index (GI) is a solution to the Multi-Armed Bandit Problem that can simultaneously attain optimality and computational-efficiency goals.
We present a modification of the GI rule that can be used in experiments with exponentially-distributed rewards.
Compared to traditional non-adaptive designs, our novel GI modified design shows operating characteristics comparable in learning but substantially better in earning.
- Score: 1.1470070927586016
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Designing experiments often requires balancing between learning about the
true treatment effects and earning from allocating more samples to the superior
treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP)
provide allocation policies that optimally balance learning and earning, they
tend to be computationally expensive. The Gittins Index (GI) is a solution to
the MABP that can simultaneously attain optimality and computational-efficiency
goals, and it has recently been used in experiments with Bernoulli
and Gaussian rewards. For the first time, we present a modification of the GI
rule that can be used in experiments with exponentially-distributed rewards. We
report its performance in simulated 2-armed and 3-armed experiments. Compared
to traditional non-adaptive designs, our novel GI modified design shows
operating characteristics comparable in learning (e.g. statistical power) but
substantially better in earning (e.g. direct benefits). This illustrates the
potential that designs using a GI approach to allocate participants have to
improve participant benefits, increase efficiencies, and reduce experimental
costs in adaptive multi-armed experiments with exponential rewards.
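The kind of adaptive allocation the abstract describes can be illustrated with a small simulation. The sketch below is not the paper's GI modification: it keeps a conjugate Gamma posterior over each exponential rate and allocates by a simple optimistic index (plug-in posterior-mean reward plus a UCB-style exploration bonus), which is a hypothetical stand-in introduced here, not the Gittins Index itself.

```python
import math
import random

def simulate_adaptive_trial(true_rates, horizon=500, seed=0):
    """Simulate an adaptive experiment with exponential rewards.

    Rewards on arm k are Exponential(rate=true_rates[k]), so a smaller
    rate means a larger mean reward 1/rate.  Each arm keeps a conjugate
    Gamma(a, b) posterior over its rate; the allocation index below is a
    crude optimistic stand-in for the Gittins Index.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    a = [1.0] * n_arms  # Gamma shape parameter per arm
    b = [1.0] * n_arms  # Gamma rate parameter per arm
    n = [0] * n_arms    # allocation counts
    total_reward = 0.0
    for t in range(1, horizon + 1):
        # b/a is a plug-in estimate of the mean reward 1/rate;
        # add a UCB-style bonus so under-sampled arms get explored.
        index = [b[k] / a[k] + math.sqrt(2 * math.log(t) / max(n[k], 1))
                 for k in range(n_arms)]
        k = max(range(n_arms), key=lambda i: index[i])
        reward = rng.expovariate(true_rates[k])
        a[k] += 1.0       # conjugate Gamma update for exponential data
        b[k] += reward
        n[k] += 1
        total_reward += reward
    return n, total_reward
```

With arm rates (1.0, 0.5), i.e. mean rewards 1 and 2, the adaptive design allocates most participants to the superior second arm, which is the "earning" advantage the abstract contrasts with fixed non-adaptive designs.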
Related papers
- Optimizing Adaptive Experiments: A Unified Approach to Regret
Minimization and Best-Arm Identification [10.66863856524397]
This paper proposes a unified model that accounts for both within-experiment performance and post-experiment outcomes.
We then provide a theory of optimal performance in large populations that unifies canonical results in the literature.
arXiv Detail & Related papers (2024-02-16T11:27:48Z)
- Adaptive Instrument Design for Indirect Experiments [48.815194906471405]
Unlike RCTs, indirect experiments estimate treatment effects by leveraging conditional instrumental variables.
In this paper we take the initial steps towards enhancing sample efficiency for indirect experiments by adaptively designing a data collection policy.
Our main contribution is a practical computational procedure that utilizes influence functions to search for an optimal data collection policy.
arXiv Detail & Related papers (2023-12-05T02:38:04Z)
- Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning [57.83232242068982]
Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms.
It remains unclear which attributes of DA account for its effectiveness in achieving sample-efficient visual RL.
This work conducts comprehensive experiments to assess the impact of DA's attributes on its efficacy.
arXiv Detail & Related papers (2023-05-25T15:46:20Z)
- UGAE: A Novel Approach to Non-exponential Discounting [9.358303424584902]
Non-exponential discounting methods that align with human behavior are often desirable for creating human-like agents.
We propose Universal Generalized Advantage Estimation (UGAE) which allows for the computation of GAE advantage values with arbitrary discounting.
We show experimentally that agents with non-exponential discounting trained via UGAE outperform variants trained with Monte Carlo advantage estimation.
arXiv Detail & Related papers (2023-02-11T16:41:05Z)
- Advancing Model Pruning via Bi-level Optimization [89.88761425199598]
Iterative magnitude pruning (IMP) is the predominant pruning method for successfully finding 'winning tickets'.
One-shot pruning methods have been developed, but these schemes are usually unable to find winning tickets as good as IMP's.
We show that the proposed bi-level optimization-oriented pruning method (termed BiP) is a special class of BLO problems with a bi-linear problem structure.
arXiv Detail & Related papers (2022-10-08T19:19:29Z)
- Design Amortization for Bayesian Optimal Experimental Design [70.13948372218849]
We build on successful variational approaches, which optimize a parameterized variational model with respect to bounds on the expected information gain (EIG).
We present a novel neural architecture that allows experimenters to optimize a single variational model that can estimate the EIG for potentially infinitely many designs.
arXiv Detail & Related papers (2022-10-07T02:12:34Z)
- Sample-Efficient Optimisation with Probabilistic Transformer Surrogates [66.98962321504085]
This paper investigates the feasibility of employing state-of-the-art probabilistic transformers in Bayesian optimisation.
We observe two drawbacks stemming from their training procedure and loss definition, hindering their direct deployment as proxies in black-box optimisation.
We introduce two components: 1) a BO-tailored training prior supporting non-uniformly distributed points, and 2) a novel approximate posterior regulariser trading-off accuracy and input sensitivity to filter favourable stationary points for improved predictive performance.
arXiv Detail & Related papers (2022-05-27T11:13:17Z)
- Test-time Batch Normalization [61.292862024903584]
Deep neural networks often suffer from data distribution shift between training and testing.
We revisit the batch normalization (BN) in the training process and reveal two key insights benefiting test-time optimization.
We propose a novel test-time BN layer design, GpreBN, which is optimized during testing by minimizing Entropy loss.
arXiv Detail & Related papers (2022-05-20T14:33:39Z)
- Challenges in Statistical Analysis of Data Collected by a Bandit Algorithm: An Empirical Exploration in Applications to Adaptively Randomized Experiments [11.464963616709671]
Multi-armed bandit algorithms have been argued for decades as useful for adaptively randomized experiments.
We applied the bandit algorithm Thompson Sampling (TS) to run adaptive experiments in three university classes.
We show that collecting data with TS can as much as double the False Positive Rate (FPR) and the False Negative Rate (FNR).
arXiv Detail & Related papers (2021-03-22T22:05:18Z)
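The Thompson Sampling algorithm used in the classroom experiments above can be sketched generically for Bernoulli rewards. This is a minimal textbook version, not the paper's specific experimental setup; the arm probabilities and horizon below are illustrative assumptions.

```python
import random

def thompson_sampling(true_probs, horizon=1000, seed=0):
    """Generic Thompson Sampling for Bernoulli-reward arms.

    Each arm keeps a Beta(successes + 1, failures + 1) posterior; at
    every step we draw one sample from each posterior and play the arm
    with the largest draw.  Returns per-arm allocation counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    wins = [0] * n_arms
    losses = [0] * n_arms
    for _ in range(horizon):
        # Posterior sampling: one Beta draw per arm.
        draws = [rng.betavariate(wins[k] + 1, losses[k] + 1)
                 for k in range(n_arms)]
        k = max(range(n_arms), key=lambda i: draws[i])
        # Simulate a Bernoulli reward from the chosen arm.
        if rng.random() < true_probs[k]:
            wins[k] += 1
        else:
            losses[k] += 1
    return [wins[k] + losses[k] for k in range(n_arms)]
```

Because allocation concentrates on the apparently better arm, the two treatment groups end up with very unequal sample sizes, which is exactly what complicates the standard hypothesis tests (inflating FPR and FNR) discussed in the paper.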
This list is automatically generated from the titles and abstracts of the papers in this site.