AExGym: Benchmarks and Environments for Adaptive Experimentation
- URL: http://arxiv.org/abs/2408.04531v1
- Date: Thu, 8 Aug 2024 15:32:12 GMT
- Title: AExGym: Benchmarks and Environments for Adaptive Experimentation
- Authors: Jimmy Wang, Ethan Che, Daniel R. Jiang, Hongseok Namkoong
- Abstract summary: We present a benchmark for adaptive experimentation based on real-world datasets.
We highlight prominent practical challenges to operationalizing adaptivity: non-stationarity, batched/delayed feedback, multiple outcomes and objectives, and external validity.
- Score: 7.948144726705323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Innovations across science and industry are evaluated using randomized trials (a.k.a. A/B tests). While simple and robust, such static designs are inefficient or infeasible for testing many hypotheses. Adaptive designs can greatly improve statistical power in theory, but they have seen limited adoption due to their fragility in practice. We present a benchmark for adaptive experimentation based on real-world datasets, highlighting prominent practical challenges to operationalizing adaptivity: non-stationarity, batched/delayed feedback, multiple outcomes and objectives, and external validity. Our benchmark aims to spur methodological development that puts practical performance (e.g., robustness) as a central concern, rather than mathematical guarantees on contrived instances. We release an open source library, AExGym, which is designed with modularity and extensibility in mind to allow experimentation practitioners to develop custom environments and algorithms.
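As a rough picture of what a gym-style adaptive experimentation loop looks like, here is a minimal Python sketch. The class and method names (AdaptiveExperimentEnv, reset, step) and the toy greedy reallocation rule are illustrative assumptions, not AExGym's actual API; consult the library itself for its real interface.

```python
# Hypothetical sketch of a gym-style adaptive experimentation loop.
# Names below are illustrative assumptions, NOT AExGym's actual API.
import numpy as np

class AdaptiveExperimentEnv:
    """Toy batched A/B environment: K arms, a fixed number of reallocation epochs."""
    def __init__(self, means, batch_size, horizon, seed=0):
        self.means = np.asarray(means)        # latent arm means (hidden from the agent)
        self.batch_size = batch_size
        self.horizon = horizon
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t = 0

    def step(self, allocation):
        """allocation: per-arm sample counts summing to batch_size.
        Returns per-arm counts, per-arm outcome sums, and a done flag."""
        counts = np.asarray(allocation)
        sums = np.array([self.rng.normal(m, 1.0, c).sum()
                         for m, c in zip(self.means, counts)])
        self.t += 1
        return counts, sums, self.t >= self.horizon

env = AdaptiveExperimentEnv(means=[0.0, 0.2, 0.5], batch_size=300, horizon=5)
K = 3
n, s = np.zeros(K), np.zeros(K)
env.reset()
done = False
while not done:
    if n.sum() == 0:
        alloc = np.full(K, env.batch_size // K)    # explore uniformly at first
    else:
        alloc = np.zeros(K, dtype=int)             # then (crudely) commit the batch
        alloc[np.argmax(s / n)] = env.batch_size   # to the current empirical best arm
    counts, sums, done = env.step(alloc)
    n += counts
    s += sums
print("estimated best arm:", int(np.argmax(s / np.maximum(n, 1))))
```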
Related papers
- Adaptive Experimentation When You Can't Experiment [55.86593195947978]
This paper introduces the confounded pure exploration transductive linear bandit (CPET-LB) problem.
Online services can employ a properly randomized encouragement that incentivizes users toward a specific treatment.
arXiv Detail & Related papers (2024-06-15T20:54:48Z)
- Adaptive Instrument Design for Indirect Experiments [48.815194906471405]
Unlike RCTs, indirect experiments estimate treatment effects by leveraging conditional instrumental variables.
In this paper we take the initial steps towards enhancing sample efficiency for indirect experiments by adaptively designing a data collection policy.
Our main contribution is a practical computational procedure that utilizes influence functions to search for an optimal data collection policy.
arXiv Detail & Related papers (2023-12-05T02:38:04Z)
- ExPT: Synthetic Pretraining for Few-Shot Experimental Design [33.5918976228562]
Experiment Pretrained Transformers (ExPT) is a foundation model for few-shot experimental design.
ExPT employs a novel combination of synthetic pretraining with in-context learning.
We evaluate ExPT on few-shot experimental design in challenging domains.
arXiv Detail & Related papers (2023-10-30T19:25:43Z)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
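The adaptive-testing idea summarized in the entry above is classical psychometrics. A minimal sketch, assuming a standard two-parameter logistic (2PL) item response model and Fisher-information item selection; this is a textbook illustration, not code from the paper.

```python
# Minimal computerized adaptive testing sketch with a 2PL IRT model.
# Textbook illustration of adaptive item selection; not the paper's code.
import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(0.5, 2.0, 50)     # item discriminations
b = rng.normal(0.0, 1.0, 50)      # item difficulties
theta_true = 0.8                  # latent ability of the simulated test-taker
grid = np.linspace(-4, 4, 161)    # grid for the posterior over ability
log_post = -0.5 * grid**2         # N(0, 1) prior

def p_correct(theta, i):
    """2PL probability of a correct response to item i at ability theta."""
    return 1.0 / (1.0 + np.exp(-a[i] * (theta - b[i])))

asked = []
for _ in range(15):
    theta_hat = grid[np.argmax(log_post)]           # current MAP ability
    p = p_correct(theta_hat, np.arange(len(a)))
    info = a**2 * p * (1 - p)                       # 2PL Fisher information
    info[asked] = -np.inf                           # never reuse an item
    i = int(np.argmax(info))                        # most informative item next
    y = rng.random() < p_correct(theta_true, i)     # simulated response
    pg = p_correct(grid, i)
    log_post += np.log(pg if y else 1 - pg)         # Bayesian posterior update
    asked.append(i)
print("estimated ability:", grid[np.argmax(log_post)])
```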
- Task-specific experimental design for treatment effect estimation [59.879567967089145]
Large randomised controlled trials (RCTs) are the standard for causal inference.
Recent work has proposed more sample-efficient alternatives to RCTs, but these are not adaptable to the downstream application for which the causal effect is sought.
We develop a task-specific approach to experimental design and derive sampling strategies customised to particular downstream applications.
arXiv Detail & Related papers (2023-06-08T18:10:37Z)
- Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches [7.390918770007728]
Motivated by practical instances involving a handful of reallocations in which outcomes are measured in batches, we develop an adaptive experimentation framework.
Our main observation is that normal approximations, which are universal in statistical inference, can also guide the design of adaptive algorithms.
arXiv Detail & Related papers (2023-03-21T04:17:03Z)
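The observation about normal approximations in the entry above can be made concrete with generic batched Thompson sampling, where each arm's posterior is a Gaussian justified by the central limit theorem on batch averages. A minimal sketch of that general idea, not the framework proposed in the paper.

```python
# Batched Thompson sampling with Gaussian (normal-approximation) posteriors.
# Generic illustration of batch-level reallocation; not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(1)
means = np.array([0.10, 0.12, 0.15])   # latent arm means (unknown to the agent)
K, batch, epochs = len(means), 900, 5
n = np.zeros(K)                        # pulls per arm
s = np.zeros(K)                        # outcome sums per arm

for t in range(epochs):
    if t == 0:
        alloc = np.full(K, batch // K)             # uniform first batch
    else:
        # CLT-based Gaussian posterior over each arm mean: N(mu_hat, 1/n).
        mu_hat = s / n
        draws = rng.normal(mu_hat, 1.0 / np.sqrt(n), size=(batch, K))
        # Allocate the batch proportionally to posterior win probabilities.
        winners = np.argmax(draws, axis=1)
        alloc = np.bincount(winners, minlength=K)
    for k in range(K):
        x = rng.normal(means[k], 1.0, alloc[k])    # simulate batch outcomes
        n[k] += alloc[k]
        s[k] += x.sum()

print("posterior means:", np.round(s / n, 3))
```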
- Adaptive Experimental Design and Counterfactual Inference [20.666734673282495]
This paper shares lessons learned regarding the challenges and pitfalls of naively using adaptive experimentation systems in industrial settings.
We developed an adaptive experimental design framework for counterfactual inference based on these experiences.
arXiv Detail & Related papers (2022-10-25T22:29:16Z)
- Design Amortization for Bayesian Optimal Experimental Design [70.13948372218849]
We build on successful variational approaches, which optimize a parameterized variational model with respect to bounds on the expected information gain (EIG).
We present a novel neural architecture that allows experimenters to optimize a single variational model that can estimate the EIG for potentially infinitely many designs.
arXiv Detail & Related papers (2022-10-07T02:12:34Z)
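For context on the quantity those variational bounds target, the sketch below estimates EIG by plain nested Monte Carlo on a toy linear-Gaussian model. The paper's contribution is amortizing a variational bound across designs; this baseline estimator is only meant to show what EIG measures.

```python
# Nested Monte Carlo estimate of expected information gain (EIG) for a toy
# model: theta ~ N(0, 1), y | theta, d ~ N(theta * d, sigma^2).
# Baseline illustration only; the paper amortizes a variational bound instead.
import numpy as np

def nmc_eig(d, sigma=1.0, n_outer=2000, n_inner=2000, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=n_outer)
    y = rng.normal(theta * d, sigma)
    # log p(y | theta, d) for the thetas that generated each y
    log_lik = (-0.5 * np.log(2 * np.pi * sigma**2)
               - (y - theta * d)**2 / (2 * sigma**2))
    # log p(y | d) by marginalizing over fresh inner prior samples
    theta_in = rng.normal(size=n_inner)
    ll_inner = (-0.5 * np.log(2 * np.pi * sigma**2)
                - (y[:, None] - theta_in[None, :] * d)**2 / (2 * sigma**2))
    log_marg = np.logaddexp.reduce(ll_inner, axis=1) - np.log(n_inner)
    # EIG = E[log p(y | theta, d)] - E[log p(y | d)]
    return np.mean(log_lik - log_marg)

# More informative designs have higher EIG (closed form: 0.5 * log(1 + d^2)).
for d in [0.1, 1.0, 3.0]:
    print(f"design d={d}: EIG ~ {nmc_eig(d):.3f}")
```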
- Benchopt: Reproducible, efficient and collaborative optimization benchmarks [67.29240500171532]
Benchopt is a framework to automate, reproduce and publish optimization benchmarks in machine learning.
Benchopt simplifies benchmarking for the community by providing an off-the-shelf tool for running, sharing and extending experiments.
arXiv Detail & Related papers (2022-06-27T16:19:24Z)
- Bayesian optimization with known experimental and design constraints for chemistry applications [0.0]
We extend our experiment planning algorithms Phoenics and Gryffin such that they can handle arbitrary known constraints.
We illustrate their practical utility in two simulated chemical research scenarios.
arXiv Detail & Related papers (2022-03-29T22:16:54Z)
- Parameter-free Online Test-time Adaptation [19.279048049267388]
We show how test-time adaptation methods fare for a number of pre-trained models on a variety of real-world scenarios.
We propose a particularly "conservative" approach, which addresses the problem with Laplacian Adjusted Maximum-likelihood Estimation (LAME).
Our approach exhibits a much higher average accuracy across scenarios than existing methods, while being notably faster and having a much lower memory footprint.
arXiv Detail & Related papers (2022-01-15T00:29:16Z)
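A rough sketch of the Laplacian-regularized idea behind LAME: iteratively refine softmax outputs so that samples with similar features receive similar label assignments. The affinity construction and fixed-point update here are simplified assumptions and differ in detail from the paper's procedure.

```python
# Simplified Laplacian-regularized refinement of classifier outputs,
# in the spirit of LAME; schematic, not the paper's exact procedure.
import numpy as np

def laplacian_refine(feats, probs, n_neighbors=5, n_iters=20):
    """feats: (N, D) features; probs: (N, C) softmax outputs."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T                                  # cosine affinities
    np.fill_diagonal(sim, -np.inf)                 # exclude self-affinity
    # Sparse symmetric kNN affinity matrix W
    idx = np.argsort(-sim, axis=1)[:, :n_neighbors]
    W = np.zeros_like(sim)
    rows = np.arange(len(f))[:, None]
    W[rows, idx] = np.maximum(sim[rows, idx], 0.0)
    W = (W + W.T) / 2

    log_p = np.log(probs + 1e-12)
    Z = probs.copy()
    for _ in range(n_iters):                       # fixed-point iterations
        Z = np.exp(log_p + W @ Z)                  # pull neighbors together
        Z /= Z.sum(axis=1, keepdims=True)          # project back to the simplex
    return Z

# Toy usage: two feature clusters, noisy predictions refined toward consensus.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (30, 8)), rng.normal(3, 1, (30, 8))])
P = rng.dirichlet([1.0, 1.0], 60)
print(laplacian_refine(X, P).argmax(axis=1)[:10])
```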