Adaptive Experimentation in the Presence of Exogenous Nonstationary
Variation
- URL: http://arxiv.org/abs/2202.09036v4
- Date: Sat, 26 Aug 2023 16:02:42 GMT
- Title: Adaptive Experimentation in the Presence of Exogenous Nonstationary
Variation
- Authors: Chao Qin and Daniel Russo
- Abstract summary: Multi-armed bandit algorithms can enhance efficiency by dynamically allocating measurement effort towards higher performing arms.
We propose deconfounded Thompson sampling (DTS), a more robust variant of the prominent Thompson sampling algorithm.
We show that a deconfounded variant of the popular upper confidence bound algorithm can fail completely.
- Score: 10.66863856524397
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate experiments that are designed to select a treatment arm for
population deployment. Multi-armed bandit algorithms can enhance efficiency by
dynamically allocating measurement effort towards higher performing arms based
on observed feedback. However, such dynamics can result in brittle behavior in
the face of nonstationary exogenous factors influencing arms' performance
during the experiment. To counter this, we propose deconfounded Thompson
sampling (DTS), a more robust variant of the prominent Thompson sampling
algorithm. As observations accumulate, DTS projects the population-level
performance of an arm while controlling for the context within which observed
treatment decisions were made. Contexts here might capture a comprehensible
source of variation, such as the country of a treated individual, or simply
record the time of treatment. We provide bounds on both within-experiment and
post-experiment regret of DTS, illustrating its resilience to exogenous
variation and the delicate balance it strikes between exploration and
exploitation. Our proofs leverage inverse propensity weights to analyze the
evolution of the posterior distribution, a departure from established methods
in the literature. Hinting that new understanding is indeed necessary, we show
that a deconfounded variant of the popular upper confidence bound algorithm can
fail completely.
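The core mechanism sketched in the abstract (sample from a posterior that controls for the context of past treatment decisions, then rank arms by their context-adjusted, population-level effects) can be illustrated in a toy Gaussian linear model. This is not the authors' implementation; the model, arm effects, drifting context effects, and all parameter values below are hypothetical assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

n_arms, n_ctx, T = 3, 4, 2000
d = n_arms + n_ctx              # features: one-hot arm + one-hot context
sigma2 = 1.0                    # noise variance (assumed known)

# Conjugate Bayesian linear regression with an N(0, I) prior, precision form:
# posterior precision P and precision-weighted mean b.
P = np.eye(d)
b = np.zeros(d)

true_arm = np.array([0.2, 0.5, 0.3])         # hypothetical arm effects
true_ctx = np.array([1.0, -1.0, 0.5, 0.0])   # exogenous, time-varying context effects

for t in range(T):
    c = (t * n_ctx) // T                     # context drifts over time ("time of treatment")
    cov = np.linalg.inv(P)
    theta = rng.multivariate_normal(cov @ b, cov)  # Thompson draw from the posterior
    a = int(np.argmax(theta[:n_arms]))       # rank arms by their deconfounded effect
    x = np.zeros(d)
    x[a] = 1.0                               # which arm was played
    x[n_arms + c] = 1.0                      # which context it was played in
    r = true_arm[a] + true_ctx[c] + rng.normal(0.0, np.sqrt(sigma2))
    P += np.outer(x, x) / sigma2             # standard conjugate update
    b += x * r / sigma2

post_mean = np.linalg.inv(P) @ b
best_arm = int(np.argmax(post_mean[:n_arms]))
print(best_arm)
```

Because the context one-hots absorb the exogenous drift, the sampled arm coefficients approximate population-level arm quality; a naive bandit that regressed rewards on arms alone would conflate how good an arm is with which period it happened to be played in.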
Related papers
- GTS: Inference-Time Scaling of Latent Reasoning with a Learnable Gaussian Thought Sampler [54.10960908347221]
We model latent thought exploration as conditional sampling from learnable densities and instantiate this idea as a Gaussian Thought Sampler (GTS). GTS predicts context-dependent perturbation distributions over continuous reasoning states and is trained with GRPO-style policy optimization while keeping the backbone frozen.
arXiv Detail & Related papers (2026-02-15T09:57:47Z) - Analyzing and Improving Diffusion Models for Time-Series Data Imputation: A Proximal Recursion Perspective [45.713195454899875]
Diffusion models (DMs) have shown promise for Time-Series Data Imputation, but their performance remains inconsistent in complex scenarios. We propose a novel framework called SPIRIT (Semi-Proximal Transport Regularized time-series Imputation).
arXiv Detail & Related papers (2026-02-01T12:11:57Z) - Power Constrained Nonstationary Bandits with Habituation and Recovery Dynamics [0.9699640804685629]
This paper develops a Thompson Sampling algorithm tailored to the ROGUE framework. We then introduce a probability clipping procedure to balance personalization and population-level learning. For researchers designing micro-randomized trials, our framework offers practical guidance on balancing personalization with statistical validity.
arXiv Detail & Related papers (2025-11-04T19:46:42Z) - DFW: A Novel Weighting Scheme for Covariate Balancing and Treatment Effect Estimation [0.0]
Estimating causal effects from observational data is challenging due to selection bias. We propose Deconfounding Factor Weighting (DFW), a novel propensity score-based approach. DFW prioritizes less confounded samples while mitigating the influence of highly confounded ones.
arXiv Detail & Related papers (2025-08-07T09:51:55Z) - A Generative Framework for Causal Estimation via Importance-Weighted Diffusion Distillation [55.53426007439564]
Estimating individualized treatment effects from observational data is a central challenge in causal inference. Inverse probability weighting (IPW) is a well-established solution to this problem, but its integration into modern deep learning frameworks remains limited. We propose Importance-Weighted Diffusion Distillation (IWDD), a novel generative framework that combines the pretraining of diffusion models with importance-weighted score distillation.
arXiv Detail & Related papers (2025-05-16T17:00:52Z) - Power-scaled Bayesian Inference with Score-based Generative Models [0.22499166814992438]
We propose a score-based generative algorithm for sampling from power-scaled priors and likelihoods within the Bayesian inference framework.
Specifically, we focus on seismic velocity models conditioned on imaged seismic.
arXiv Detail & Related papers (2025-04-15T02:06:04Z) - Amortized Posterior Sampling with Diffusion Prior Distillation [55.03585818289934]
Amortized Posterior Sampling is a novel variational inference approach for efficient posterior sampling in inverse problems. Our method trains a conditional flow model to minimize the divergence between the variational distribution and the posterior distribution implicitly defined by the diffusion model. Unlike existing methods, our approach is unsupervised, requires no paired training data, and is applicable to both Euclidean and non-Euclidean domains.
arXiv Detail & Related papers (2024-07-25T09:53:12Z) - Differentiable Pareto-Smoothed Weighting for High-Dimensional Heterogeneous Treatment Effect Estimation [0.6906005491572401]
We develop a numerically robust estimator by weighted representation learning.
Our experimental results show that by effectively correcting the weight values, our proposed method outperforms the existing ones.
arXiv Detail & Related papers (2024-04-26T15:34:04Z) - Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective [65.10019978876863]
Diffusion-Based Purification (DBP) has emerged as an effective defense mechanism against adversarial attacks.
In this paper, we propose that the intrinsic stochasticity in the DBP process is the primary factor driving robustness.
arXiv Detail & Related papers (2024-04-22T16:10:38Z) - Undersampling and Cumulative Class Re-decision Methods to Improve
Detection of Agitation in People with Dementia [16.949993123698345]
Agitation is one of the most prevalent symptoms in people with dementia (PwD).
In a previous study, we collected multimodal wearable sensor data from 17 participants for 600 days and developed machine learning models for detecting agitation in one-minute windows.
In this paper, we first implemented different undersampling methods to address the class imbalance, and concluded that only 20% of the normal behaviour data was adequate to train a competitive agitation detection model.
arXiv Detail & Related papers (2023-02-07T03:14:00Z) - Improved Policy Evaluation for Randomized Trials of Algorithmic Resource
Allocation [54.72195809248172]
We present a new estimator leveraging a novel concept: retrospective reshuffling of participants across experimental arms at the end of an RCT.
We prove theoretically that such an estimator is more accurate than common estimators based on sample means.
arXiv Detail & Related papers (2023-02-06T05:17:22Z) - Training Discrete Deep Generative Models via Gapped Straight-Through
Estimator [72.71398034617607]
We propose a Gapped Straight-Through (GST) estimator to reduce the variance without incurring resampling overhead.
This estimator is inspired by the essential properties of Straight-Through Gumbel-Softmax.
Experiments demonstrate that the proposed GST estimator enjoys better performance compared to strong baselines on two discrete deep generative modeling tasks.
arXiv Detail & Related papers (2022-06-15T01:46:05Z) - Sample Efficient Deep Reinforcement Learning via Uncertainty Estimation [12.415463205960156]
In model-free deep reinforcement learning (RL) algorithms, using noisy value estimates to supervise policy evaluation and optimization is detrimental to the sample efficiency.
We provide a systematic analysis of the sources of uncertainty in the noisy supervision that occurs in RL.
We propose a method whereby two complementary uncertainty estimation methods account for both the Q-value and the environment stochasticity to better mitigate the negative impacts of noisy supervision.
arXiv Detail & Related papers (2022-01-05T15:46:06Z) - Assessment of Treatment Effect Estimators for Heavy-Tailed Data [70.72363097550483]
A central obstacle in the objective assessment of treatment effect (TE) estimators in randomized control trials (RCTs) is the lack of ground truth (or validation set) to test their performance.
We provide a novel cross-validation-like methodology to address this challenge.
We evaluate our methodology across 709 RCTs implemented in the Amazon supply chain.
arXiv Detail & Related papers (2021-12-14T17:53:01Z) - Provable RL with Exogenous Distractors via Multistep Inverse Dynamics [85.52408288789164]
Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera.
Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations.
However, such approaches can fail in the presence of temporally correlated noise in the observations.
arXiv Detail & Related papers (2021-10-17T15:21:27Z) - Sampling-free Variational Inference for Neural Networks with
Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z) - Doubly-Adaptive Thompson Sampling for Multi-Armed and Contextual Bandits [28.504921333436833]
We propose a variant of a Thompson sampling based algorithm that adaptively re-weighs the terms of a doubly robust estimator on the true mean reward of each arm.
The proposed algorithm matches the optimal (minimax) regret rate, and we validate it empirically in a semi-synthetic experiment.
We extend this approach to contextual bandits, where there are more sources of bias present apart from the adaptive data collection.
arXiv Detail & Related papers (2021-02-25T22:29:25Z) - Weak Signal Asymptotics for Sequentially Randomized Experiments [2.28438857884398]
We study a class of sequentially randomized experiments, including those that arise in solving multi-armed bandit problems.
We show that the sample paths of a class of sequentially randomized experiments converge weakly to a diffusion limit.
We show that all sequential experiments whose randomization probabilities have a Lipschitz-continuous dependence on the observed data suffer from sub-optimal regret when reward gaps are relatively large.
arXiv Detail & Related papers (2021-01-25T02:20:20Z) - Efficient Empowerment Estimation for Unsupervised Stabilization [75.32013242448151]
The empowerment principle enables unsupervised stabilization of dynamical systems at upright positions.
We propose an alternative solution based on a trainable representation of a dynamical system as a Gaussian channel.
We show that our method has a lower sample complexity, is more stable in training, possesses the essential properties of the empowerment function, and allows estimation of empowerment from images.
arXiv Detail & Related papers (2020-07-14T21:10:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.