Designing Time Series Experiments in A/B Testing with Transformer Reinforcement Learning
- URL: http://arxiv.org/abs/2602.01853v1
- Date: Mon, 02 Feb 2026 09:27:51 GMT
- Title: Designing Time Series Experiments in A/B Testing with Transformer Reinforcement Learning
- Authors: Xiangkun Wu, Qianglin Wen, Yingying Zhang, Hongtu Zhu, Ting Li, Chengchun Shi
- Score: 28.08116749188554
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A/B testing has become a gold standard for modern technological companies to conduct policy evaluation. Yet, its application to time series experiments, where policies are sequentially assigned over time, remains challenging. Existing designs suffer from two limitations: (i) they do not fully leverage the entire history for treatment allocation; (ii) they rely on strong assumptions to approximate the objective function (e.g., the mean squared error of the estimated treatment effect) for optimizing the design. We first establish an impossibility theorem showing that failure to condition on the full history leads to suboptimal designs, due to the dynamic dependencies in time series experiments. To address both limitations simultaneously, we next propose a transformer reinforcement learning (RL) approach which leverages transformers to condition allocation on the entire history and employs RL to directly optimize the MSE without relying on restrictive assumptions. Empirical evaluations on synthetic data, a publicly available dispatch simulator, and a real-world ridesharing dataset demonstrate that our proposal consistently outperforms existing designs.
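The abstract describes the recipe at a high level: a transformer conditions the allocation on the entire history, and RL optimizes the MSE of the effect estimate directly. As a rough illustration only (this is not the authors' code; the toy carryover environment, the difference-in-means estimator, the REINFORCE update, and all hyperparameters below are assumptions), such a design could be sketched as:

```python
import torch
import torch.nn as nn

class HistoryPolicy(nn.Module):
    """Causal transformer mapping the full (outcome, treatment) history
    to a treatment probability for the next period."""
    def __init__(self, d_model=32, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(2, d_model)  # token = (outcome_t, action_t)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=64, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, hist):  # hist: (1, t, 2)
        mask = nn.Transformer.generate_square_subsequent_mask(hist.size(1))
        h = self.encoder(self.embed(hist), mask=mask)
        return torch.sigmoid(self.head(h[:, -1]))  # P(treat at next step)

def rollout(policy, T=24, tau=1.0, rho=0.5):
    """One synthetic experiment with carryover dynamics. Returns the log
    probability of the sampled allocation and the squared error of a naive
    difference-in-means estimate of the true effect tau."""
    hist = torch.zeros(1, 1, 2)                    # dummy start token
    logp, ys, acts, state = 0.0, [], [], 0.0
    for _ in range(T):
        p = policy(hist).squeeze()
        a = torch.bernoulli(p).item()
        logp = logp + (torch.log(p) if a else torch.log(1 - p))
        state = rho * state + tau * a              # treatment carries over
        y = state + torch.randn(()).item()         # observed outcome
        ys.append(y); acts.append(a)
        step = torch.tensor([[[y, a]]], dtype=torch.float32)
        hist = torch.cat([hist, step], dim=1)
    ys, acts = torch.tensor(ys), torch.tensor(acts)
    n1 = acts.sum().clamp(min=1.0)
    n0 = (1 - acts).sum().clamp(min=1.0)
    tau_hat = (ys * acts).sum() / n1 - (ys * (1 - acts)).sum() / n0
    return logp, (tau_hat - tau) ** 2

policy = HistoryPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(200):
    logp, sq_err = rollout(policy)
    loss = logp * sq_err.detach()  # REINFORCE with reward = -squared error
    opt.zero_grad(); loss.backward(); opt.step()
```

The score-function update treats the squared error of the effect estimate as a negative reward, so no closed-form approximation of the MSE is needed; in practice a baseline term and batched rollouts would be required to tame the gradient variance.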
Related papers
- CausalCompass: Evaluating the Robustness of Time-Series Causal Discovery in Misspecified Scenarios [17.11442807888366]
CausalCompass is a benchmark suite designed to assess the robustness of time-series causal discovery methods under violations of modeling assumptions. We conduct extensive benchmarking of representative TSCD algorithms across eight assumption-violation scenarios. The methods exhibiting superior overall performance across diverse scenarios are almost all deep learning-based approaches.
arXiv Detail & Related papers (2026-02-08T11:27:06Z)
- In-Context Reinforcement Learning From Suboptimal Historical Data [56.60512975858003]
Transformer models have achieved remarkable empirical successes, largely due to their in-context learning capabilities. We propose the Decision Importance Transformer (DIT) framework, which emulates the actor-critic algorithm in an in-context manner. Our results show that DIT achieves superior performance, particularly when the offline dataset contains suboptimal historical data.
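A hedged sketch of the in-context actor-critic idea: historical transitions enter as a prompt, and a policy head and a value head share the transformer trunk. The advantage-weighted loss below is a standard stand-in (advantage-weighted regression) for down-weighting suboptimal historical decisions; the paper's actual "decision importance" weighting may differ.

```python
import torch
import torch.nn as nn

class InContextPolicy(nn.Module):
    """Schematic in-context policy: historical (s, a, r) transitions are fed
    as a prompt, and the model predicts an action for the query state."""
    def __init__(self, s_dim, n_actions, d_model=64):
        super().__init__()
        self.tok = nn.Linear(s_dim + n_actions + 1, d_model)  # (s, one-hot a, r)
        layer = nn.TransformerEncoderLayer(d_model, 4, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, 2)
        self.actor = nn.Linear(d_model, n_actions)   # policy head
        self.critic = nn.Linear(d_model, 1)          # value head

    def forward(self, prompt, query):
        # prompt: (B, K, s_dim + n_actions + 1) historical transitions
        # query:  (B, 1, s_dim + n_actions + 1) with action/reward slots zeroed
        h = self.enc(self.tok(torch.cat([prompt, query], dim=1)))
        last = h[:, -1]                               # query-position features
        return self.actor(last), self.critic(last)

def weighted_loss(logits, value, target_a, ret, beta=1.0):
    """Behavior cloning weighted by exponentiated advantage, so suboptimal
    historical decisions contribute less, plus a critic regression term."""
    adv = (ret - value.squeeze(-1)).detach()
    w = torch.exp(beta * adv).clamp(max=20.0)         # clipped importance weight
    ce = nn.functional.cross_entropy(logits, target_a, reduction="none")
    return (w * ce).mean() + (ret - value.squeeze(-1)).pow(2).mean()
```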
arXiv Detail & Related papers (2026-01-27T23:13:06Z)
- Forecasting in Offline Reinforcement Learning for Non-stationary Environments [23.889016600249295]
We introduce Forecasting in Non-stationary Offline RL (FORL), a framework built around conditional diffusion-based candidate state generation. FORL targets environments prone to unexpected, potentially non-Markovian offsets, requiring robust agent performance from the onset of each episode. Empirical evaluations on offline RL benchmarks, augmented with real-world time-series data, demonstrate that FORL consistently improves performance compared to competitive baselines.
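The summary names conditional diffusion-based candidate state generation as the key ingredient. A generic conditional DDPM-style sampler for candidate states, with a toy MLP denoiser (untrained here, purely schematic; all names, shapes, and schedules are illustrative rather than the paper's), might look like:

```python
import torch
import torch.nn as nn

class CondDenoiser(nn.Module):
    """Toy denoiser eps_theta(x_t, t, c): predicts the noise in a corrupted
    state x_t given the diffusion step t and a context embedding c."""
    def __init__(self, s_dim, c_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim + c_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, s_dim))

    def forward(self, x, t, c):
        return self.net(torch.cat([x, t, c], dim=-1))

@torch.no_grad()
def sample_candidates(model, context, s_dim, n=16, steps=50):
    """DDPM-style reverse process conditioned on recent history; returns n
    candidate states for the agent to score or filter. context: (1, c_dim)."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    abar = torch.cumprod(alphas, dim=0)
    x = torch.randn(n, s_dim)
    c = context.expand(n, -1)
    for i in reversed(range(steps)):
        t = torch.full((n, 1), i / steps)
        eps = model(x, t, c)
        # Posterior mean of the reverse step, standard DDPM parameterization.
        mean = (x - betas[i] / (1 - abar[i]).sqrt() * eps) / alphas[i].sqrt()
        x = mean + (betas[i].sqrt() * torch.randn_like(x) if i > 0 else 0.0)
    return x

# Example: 16 candidate 4-dim states conditioned on an 8-dim context embedding.
denoiser = CondDenoiser(s_dim=4, c_dim=8)
candidates = sample_candidates(denoiser, torch.randn(1, 8), s_dim=4)
```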
arXiv Detail & Related papers (2025-12-01T18:45:05Z)
- ProCause: Generating Counterfactual Outcomes to Evaluate Prescriptive Process Monitoring Methods [2.4010681808413397]
Prescriptive Process Monitoring (PresPM) focuses on optimizing processes through real-time interventions based on event log data. Evaluating PresPM methods is challenging due to the lack of ground-truth outcomes for all intervention actions in datasets. We introduce ProCause, a generative approach that supports both sequential and non-sequential models.
arXiv Detail & Related papers (2025-08-31T10:54:43Z)
- Backpropagation-Free Test-Time Adaptation via Probabilistic Gaussian Alignment [16.352863226512984]
Test-time adaptation (TTA) enhances zero-shot robustness under distribution shifts by leveraging unlabeled test data during inference. Most methods rely on backpropagation or iterative optimization, which limits scalability and hinders real-time deployment. We propose ADAPT, an Advanced Distribution-Aware and backpropagation-free Test-time adaptation method.
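As a sketch of what backpropagation-free, Gaussian-based adaptation can mean in general (ADAPT's actual estimator, priors, and calibration are not reproduced here), one can fit class-conditional Gaussians to test features from zero-shot pseudo-labels and re-classify by log-likelihood, with no gradient updates at all:

```python
import torch

def gaussian_tta(features, zero_shot_logits, eps=1e-3):
    """Backpropagation-free adaptation sketch: fit class-conditional Gaussians
    (shared diagonal covariance) to test features using zero-shot pseudo-labels,
    then re-classify by Gaussian log-likelihood.
    features: (N, D), zero_shot_logits: (N, C)."""
    C = zero_shot_logits.shape[1]
    pseudo = zero_shot_logits.argmax(dim=1)
    means = torch.zeros(C, features.shape[1])
    for c in range(C):
        idx = pseudo == c
        if idx.any():               # classes with no pseudo-labels keep mean 0
            means[c] = features[idx].mean(dim=0)
    var = features.var(dim=0) + eps                 # shared diagonal covariance
    # log N(x | mu_c, diag(var)) up to a constant shared across classes
    diff = features.unsqueeze(1) - means.unsqueeze(0)   # (N, C, D)
    loglik = -0.5 * (diff.pow(2) / var).sum(dim=-1)
    return loglik.argmax(dim=1)                      # adapted predictions
```

Because the whole procedure is closed-form statistics plus an argmax, it scales to streaming inference where gradient-based TTA would be too slow.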
arXiv Detail & Related papers (2025-08-21T13:42:49Z)
- Prediction-Powered Causal Inferences [59.98498488132307]
We focus on Prediction-Powered Causal Inferences (PPCI). We first show that conditional calibration guarantees valid PPCI at the population level. We then introduce a sufficient representation constraint transferring validity across experiments.
arXiv Detail & Related papers (2025-02-10T10:52:17Z)
- On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability [34.43255978863601]
Several recent works suggest that transformers learn a mesa-optimizer during autoregressive training.
We show that a stronger assumption related to the moments of the data is the sufficient and necessary condition for the learned mesa-optimizer to perform effectively.
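For orientation, the mesa-optimization claim in this line of work is usually stated for linear self-attention trained on first-order autoregressive data; the schematic below is the standard construction from that literature, not this paper's specific conditions:

```latex
% In-context least-squares objective over the observed prefix of
% an AR process x_{t+1} = W x_t:
L(W) = \tfrac{1}{2} \sum_{i<t} \lVert x_{i+1} - W x_i \rVert^2 .
% One gradient step from W_0 = 0 with step size \eta gives
W_1 = W_0 - \eta \, \nabla L(W_0) = \eta \sum_{i<t} x_{i+1} x_i^\top ,
% so the induced next-token prediction
\hat{x}_{t+1} = W_1 x_t = \eta \Big( \sum_{i<t} x_{i+1} x_i^\top \Big) x_t
% has exactly the form computable by one linear self-attention layer,
% i.e. the trained layer "is" one step of gradient descent.
```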
arXiv Detail & Related papers (2024-05-27T05:41:06Z)
- Latent Plan Transformer for Trajectory Abstraction: Planning as Latent Space Inference [53.419249906014194]
We study generative modeling for planning with datasets repurposed from offline reinforcement learning. We introduce the Latent Plan Transformer (LPT), a novel model that leverages a latent variable to connect a Transformer-based trajectory generator and the final return.
arXiv Detail & Related papers (2024-02-07T08:18:09Z)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive settings such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- DELTA: degradation-free fully test-time adaptation [59.74287982885375]
We find that two defects are concealed in prevalent adaptation methodologies such as test-time batch normalization (BN) and self-learning.
First, we reveal that the normalization statistics in test-time BN are determined entirely by the currently received test samples, resulting in inaccurate estimates.
Second, we show that during test-time adaptation, the parameter update is biased towards some dominant classes.
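To make the first defect concrete, here is a minimal sketch (the function name, the EMA smoothing, and all constants are illustrative; the paper's actual remedy is more involved than a plain EMA) contrasting frozen source statistics, purely batch-level test-time statistics, and a simple EMA over test batches:

```python
import torch

def test_time_bn(x, running_mean, running_var, mode="batch",
                 ema=None, rho=0.95, eps=1e-5):
    """x: (N, D) pre-normalization activations for one BN layer.
    mode="batch" reproduces the defect: statistics come entirely from the
    current test batch, so small or class-imbalanced batches yield noisy,
    inaccurate estimates."""
    if mode == "source":                  # frozen source statistics
        mu, var = running_mean, running_var
    elif mode == "batch":                 # standard test-time BN
        mu, var = x.mean(dim=0), x.var(dim=0, unbiased=False)
    else:                                 # simple EMA across test batches
        mu_b, var_b = x.mean(dim=0), x.var(dim=0, unbiased=False)
        ema["mu"] = rho * ema["mu"] + (1 - rho) * mu_b
        ema["var"] = rho * ema["var"] + (1 - rho) * var_b
        mu, var = ema["mu"], ema["var"]
    return (x - mu) / (var + eps).sqrt()

# Usage on a stream of test batches, seeding the EMA from source statistics:
# ema = {"mu": running_mean.clone(), "var": running_var.clone()}
# out = test_time_bn(x, running_mean, running_var, mode="ema", ema=ema)
```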
arXiv Detail & Related papers (2023-01-30T15:54:00Z)
- Towards Standardizing Reinforcement Learning Approaches for Stochastic Production Scheduling [77.34726150561087]
Reinforcement learning can be used to solve scheduling problems.
Existing studies rely on (sometimes) complex simulations for which the code is unavailable.
There is a vast array of RL designs to choose from.
Standardization of model descriptions (both production setup and RL design) and of the validation scheme is a prerequisite.
arXiv Detail & Related papers (2021-04-16T16:07:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.