Experimenting, Fast and Slow: Bayesian Optimization of Long-term Outcomes with Online Experiments
- URL: http://arxiv.org/abs/2506.18744v2
- Date: Mon, 30 Jun 2025 16:42:40 GMT
- Title: Experimenting, Fast and Slow: Bayesian Optimization of Long-term Outcomes with Online Experiments
- Authors: Qing Feng, Samuel Daulton, Benjamin Letham, Maximilian Balandat, Eytan Bakshy
- Abstract summary: Decision-makers wish to optimize for the long-term treatment effects of system changes. We describe a novel approach that combines fast experiments (e.g., biased experiments run only for a few hours or days) with long-running, slow experiments.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Online experiments in internet systems, also known as A/B tests, are used for a wide range of system tuning problems, such as optimizing recommender system ranking policies and learning adaptive streaming controllers. Decision-makers generally wish to optimize for long-term treatment effects of the system changes, which often requires running experiments for a long time as short-term measurements can be misleading due to non-stationarity in treatment effects over time. Sequential experimentation strategies, which typically involve several iterations, can be prohibitively long in such cases. We describe a novel approach that combines fast experiments (e.g., biased experiments run only for a few hours or days) and/or offline proxies (e.g., off-policy evaluation) with long-running, slow experiments to perform sequential, Bayesian optimization over large action spaces in a short amount of time.
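The core idea of combining cheap, biased fast experiments with expensive, unbiased slow experiments can be illustrated with a minimal two-stage sketch. This is not the paper's algorithm (which performs full Bayesian optimization); it is a simplified, assumption-laden illustration in which all quantities (arm count, bias, noise levels) are synthetic, and the fast-experiment bias is assumed to be an additive constant shared across arms:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 20 candidate configurations with unknown
# long-term effects. Fast experiments are cheap but carry a
# systematic bias; slow experiments are unbiased but expensive.
n_arms = 20
true_effect = rng.normal(0.0, 1.0, n_arms)
bias = 0.5  # systematic bias of fast measurements (unknown to the experimenter)

def fast_measure(arm):
    # Cheap, biased, noisy reading of the long-term effect.
    return true_effect[arm] + bias + rng.normal(0, 0.1)

def slow_measure(arm):
    # Expensive, unbiased, noisy reading of the long-term effect.
    return true_effect[arm] + rng.normal(0, 0.1)

# Stage 1: screen every arm with fast experiments.
fast = np.array([fast_measure(a) for a in range(n_arms)])

# Stage 2: run slow experiments only on the top fast candidates,
# and use the paired readings to estimate and remove the bias.
top = np.argsort(fast)[-5:]
slow = np.array([slow_measure(a) for a in top])
est_bias = float(np.mean(fast[top] - slow))
corrected = fast - est_bias  # de-biased estimates for all arms

best = int(top[np.argmax(slow)])  # winner among slow-tested arms
```

The design choice being illustrated: the fast fidelity prunes the large action space so that only a handful of arms need the long-running, unbiased measurement.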
Related papers
- Pessimistic asynchronous sampling in high-cost Bayesian optimization [0.0]
Asynchronous Bayesian optimization is a technique that allows for parallel operation of experimental systems and disjointed systems.
A pessimistic prediction asynchronous policy reached optimum experimental conditions in significantly fewer experiments than equivalent serial policies.
Without accounting for the faster sampling rate, the pessimistic algorithm presented in this work could result in more efficient algorithm-driven optimization of high-cost experimental spaces.
arXiv Detail & Related papers (2024-06-21T16:35:27Z)
- Adaptive Experimentation When You Can't Experiment [55.86593195947978]
This paper introduces the confounded pure exploration transductive linear bandit (CPET-LB) problem.
Online services can employ a properly randomized encouragement that incentivizes users toward a specific treatment.
arXiv Detail & Related papers (2024-06-15T20:54:48Z)
- Search Strategies for Self-driving Laboratories with Pending Experiments [4.416701099409113]
Self-driving laboratories (SDLs) consist of multiple stations that perform material synthesis and characterisation tasks.
It is practical to run experiments in asynchronous parallel, in which multiple experiments are being performed at once in different stages.
We build a simulator for a multi-stage SDL and compare optimisation strategies for dealing with delayed feedback and asynchronous parallelized operation.
arXiv Detail & Related papers (2023-12-06T12:41:53Z)
- Adaptive Instrument Design for Indirect Experiments [48.815194906471405]
Unlike RCTs, indirect experiments estimate treatment effects by leveraging conditional instrumental variables.
In this paper we take the initial steps towards enhancing sample efficiency for indirect experiments by adaptively designing a data collection policy.
Our main contribution is a practical computational procedure that utilizes influence functions to search for an optimal data collection policy.
arXiv Detail & Related papers (2023-12-05T02:38:04Z)
- Choosing a Proxy Metric from Past Experiments [54.338884612982405]
In many randomized experiments, the treatment effect of the long-term metric is often difficult or infeasible to measure.
A common alternative is to measure several short-term proxy metrics in the hope they closely track the long-term metric.
We introduce a new statistical framework to both define and construct an optimal proxy metric for use in a homogeneous population of randomized experiments.
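The idea of constructing a proxy from past experiments can be sketched as a simple regression: across many historical experiments, find the combination of short-term metrics that best predicts the long-term metric. This is a hedged illustration only, not the statistical framework of the paper; the data, the linear form, and all dimensions are hypothetical assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical history: 50 past experiments, each recording 3
# short-term proxy treatment effects and one long-term outcome.
n_exp, n_proxy = 50, 3
X = rng.normal(size=(n_exp, n_proxy))        # short-term proxy effects
w_true = np.array([0.8, 0.1, -0.3])          # unknown true relationship
y = X @ w_true + rng.normal(0, 0.05, n_exp)  # long-term treatment effects

# Fit a linear proxy: the weighting of short-term metrics that best
# tracked the long-term metric over the historical experiments.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

def proxy_effect(short_term):
    # Predicted long-term effect for a new experiment, given only
    # its short-term proxy measurements.
    return short_term @ w_hat
```

A new experiment can then be evaluated after its short-term window by calling `proxy_effect` on its observed proxy effects, rather than waiting for the long-term metric to materialize.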
arXiv Detail & Related papers (2023-09-14T17:43:02Z)
- FuzzyFlow: Leveraging Dataflow To Find and Squash Program Optimization Bugs [92.47146416628965]
FuzzyFlow is a fault localization and test case extraction framework designed to test program optimizations.
We leverage dataflow program representations to capture a fully reproducible system state and area-of-effect for optimizations.
To reduce testing time, we design an algorithm for minimizing test inputs, trading off memory for recomputation.
arXiv Detail & Related papers (2023-06-28T13:00:17Z)
- Practical Policy Optimization with Personalized Experimentation [7.928781593773402]
We present a personalized experimentation framework, which optimizes treatment group assignment at the user level.
We describe an end-to-end workflow that has proven to be successful in practice and can be readily implemented using open-source software.
arXiv Detail & Related papers (2023-03-30T18:25:11Z)
- Adaptive Experimentation with Delayed Binary Feedback [11.778924435036519]
This paper presents an adaptive experimentation solution tailored for delayed binary feedback objectives.
It estimates the real underlying objectives before they materialize and dynamically allocates variants based on the estimates.
This solution is currently deployed in the online experimentation platform of JD.com.
arXiv Detail & Related papers (2022-02-02T01:47:10Z)
- Time-varying Gaussian Process Bandit Optimization with Non-constant Evaluation Time [93.6788993843846]
We propose a novel time-varying Bayesian optimization algorithm that can effectively handle the non-constant evaluation time.
Our bound elucidates that a pattern of the evaluation time sequence can hugely affect the difficulty of the problem.
arXiv Detail & Related papers (2020-03-10T13:28:33Z)
- Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework [68.96770035057716]
A/B testing is a business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries.
This paper introduces a reinforcement learning framework for carrying out A/B testing in online experiments.
arXiv Detail & Related papers (2020-02-05T10:25:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.