Efficient Inference after Directionally Stable Adaptive Experiments
- URL: http://arxiv.org/abs/2602.21478v1
- Date: Wed, 25 Feb 2026 01:09:18 GMT
- Title: Efficient Inference after Directionally Stable Adaptive Experiments
- Authors: Zikai Shen, Houssam Zenati, Nathan Kallus, Arthur Gretton, Koulik Khamaru, Aurélien Bibaut,
- Abstract summary: We study inference on pathwise differentiable targets after adaptive data collection, such as a bandit. We introduce a novel target-specific condition, directional stability, which is strictly weaker than previously imposed target-agnostic stability conditions.
- Score: 47.32051320630248
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study inference on scalar-valued pathwise differentiable targets after adaptive data collection, such as a bandit algorithm. We introduce a novel target-specific condition, directional stability, which is strictly weaker than previously imposed target-agnostic stability conditions. Under directional stability, we show that estimators that would have been efficient under i.i.d. data remain asymptotically normal and semiparametrically efficient when computed from adaptively collected trajectories. The canonical gradient has a martingale form, and directional stability guarantees stabilization of its predictable quadratic variation, enabling high-dimensional asymptotic normality. We characterize efficiency using a convolution theorem for the adaptive-data setting, and give a condition under which the one-step estimator attains the efficiency bound. We verify directional stability for LinUCB, yielding the first semiparametric efficiency guarantee for a regular scalar target under LinUCB sampling.
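The abstract's central mechanism, that a conditionally unbiased (one-step/AIPW-style) score retains a martingale structure under history-dependent sampling, can be illustrated with a toy simulation. This is a minimal sketch under invented assumptions, not the paper's estimator: the two-armed setup, arm means, noise scale, and the adaptive sampling rule are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.3, 0.5])   # hypothetical arm means; the target is mu[1]
T = 20_000

counts = np.zeros(2)
sums = np.zeros(2)
scores = []
for t in range(T):
    # History-dependent sampling: favor the empirically better arm.
    if counts.min() == 0:
        p1 = 0.5
    else:
        means = sums / counts
        p1 = 0.9 if means[1] >= means[0] else 0.1
    # Outcome model fitted on history only (running mean of arm 1).
    muhat1 = sums[1] / counts[1] if counts[1] > 0 else 0.0
    a = int(rng.random() < p1)
    r = mu[a] + rng.normal(scale=0.2)
    # One-step (AIPW) score: conditionally unbiased for mu[1] given the
    # history, so the centered scores form a martingale difference sequence.
    scores.append(muhat1 + float(a == 1) / p1 * (r - muhat1))
    counts[a] += 1
    sums[a] += r

est = np.mean(scores)
se = np.std(scores, ddof=1) / np.sqrt(T)
print(f"one-step estimate {est:.3f} +/- {1.96 * se:.3f} (truth {mu[1]})")
```

Despite the adaptive (non-i.i.d.) trajectory, the score average concentrates around the target, which is the behavior the paper's directional-stability condition formalizes.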
Related papers
- Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models [52.48582333951919]
We propose a dynamic framework designed to enhance alignment reliability by maximizing the Signal-to-Noise Ratio of policy updates. SAGE (Stability-Aware Gradient Efficiency) integrates a coarse-grained curriculum mechanism that refreshes candidate pools based on model competence. Experiments on multiple mathematical reasoning benchmarks demonstrate that SAGE significantly accelerates convergence and outperforms static baselines.
arXiv Detail & Related papers (2026-02-01T12:56:10Z) - Bounded Hyperbolic Tangent: A Stable and Efficient Alternative to Pre-Layer Normalization in Large Language Models [20.802982614533615]
We propose Bounded Hyperbolic Tanh (BHyT) as a drop-in replacement for Pre-LN. BHyT couples a tanh nonlinearity with explicit, data-driven input bounding to keep activations within a non-saturating range. It achieves an average of 15.8% faster training and an average of 4.2% higher token generation throughput compared to RMSNorm.
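The summary gives only the high-level mechanism, so the following is one plausible reading rather than the paper's actual layer: activations are bounded by their own RMS (a hypothetical choice of data-driven bound) and then squashed with tanh, keeping outputs in a non-saturating range.

```python
import numpy as np

def bhyt_sketch(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Hypothetical sketch of a bounded-tanh normalization: scale each
    row by its RMS so inputs to tanh stay O(1), then apply tanh."""
    bound = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return np.tanh(x / bound)

x = np.random.default_rng(0).normal(scale=10.0, size=(4, 8))
y = bhyt_sketch(x)
```

Even for large-scale inputs, outputs stay strictly inside (-1, 1), which is the stability property the summary attributes to BHyT.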
arXiv Detail & Related papers (2025-12-26T06:22:13Z) - Statistical Inference under Adaptive Sampling with LinUCB [15.167069362020426]
We show that the linear upper confidence bound (LinUCB) algorithm for linear bandits satisfies a property called stability. We establish a central limit theorem for the LinUCB algorithm, showing that the limiting distribution of the estimation error is normal.
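For concreteness, here is a standard LinUCB loop on a synthetic linear bandit. The dimension, candidate-arm distribution, and noise level are illustrative choices; the code only demonstrates that the ridge estimate maintained by LinUCB concentrates around the true parameter, not the paper's central limit theorem.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = np.array([0.5, -0.2])      # hypothetical true parameter
d, T, alpha, lam = 2, 5000, 1.0, 1.0

A = lam * np.eye(d)                # regularized Gram matrix
b = np.zeros(d)
for t in range(T):
    X = rng.normal(size=(3, d))    # 3 random candidate arms per round
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    # Optimism: point estimate plus a confidence-width bonus per arm.
    width = np.sqrt(np.einsum("ij,jk,ik->i", X, A_inv, X))
    x = X[np.argmax(X @ theta_hat + alpha * width)]
    r = x @ theta + rng.normal(scale=0.1)
    A += np.outer(x, x)
    b += r * x

theta_hat = np.linalg.solve(A, b)
print("ridge estimate:", theta_hat, "truth:", theta)
```

Even though the arm choices depend on past rewards, the estimate converges; the paper's contribution is a CLT quantifying the limiting distribution of this error under LinUCB's adaptive sampling.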
arXiv Detail & Related papers (2025-11-28T21:48:18Z) - Kernel Treatment Effects with Adaptively Collected Data [23.3862001690226]
We present the first kernel-based inference framework for distributional inference under adaptive data collection. Our method combines doubly robust scores with variance stabilization to ensure normality via a Hilbert-space martingale CLT. Experiments show the method is effective for detecting both mean shifts and higher-moment differences.
arXiv Detail & Related papers (2025-10-11T15:01:21Z) - Efficient Adaptive Experimentation with Noncompliance [37.85201197349216]
We study the problem of estimating the average treatment effect (ATE) in adaptive experiments where treatment can only be encouraged via a binary instrumental variable. Building on semiparametric efficiency theory, we derive the efficiency bound for ATE estimation under arbitrary, history-dependent instrument-assignment policies. We show it is minimized by a variance-aware allocation rule that balances outcome noise and compliance variability.
arXiv Detail & Related papers (2025-05-23T04:49:14Z) - Statistical Inference for Temporal Difference Learning with Linear Function Approximation [55.80276145563105]
We investigate the statistical properties of Temporal Difference learning with Polyak-Ruppert averaging. We make three theoretical contributions that improve upon the current state-of-the-art results.
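Polyak-Ruppert averaging in TD learning can be sketched on a two-state Markov reward process with one-hot features, where the linear TD(0) fixed point equals the true value function. The chain, step-size schedule, and reward noise below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9
# Two-state chain 0 -> 1 -> 0 with rewards (1, 0) plus noise. With one-hot
# features, the Bellman equation V = R + gamma * P V is linear and solvable.
true_V = np.linalg.solve(np.array([[1.0, -gamma], [-gamma, 1.0]]),
                         np.array([1.0, 0.0]))

w = np.zeros(2)        # TD(0) weights (one-hot features => tabular values)
w_bar = np.zeros(2)    # Polyak-Ruppert average of the iterates
s = 0
T = 200_000
for t in range(1, T + 1):
    s_next = 1 - s
    r = (1.0 if s == 0 else 0.0) + rng.normal(scale=0.5)
    delta = r + gamma * w[s_next] - w[s]   # TD error
    w[s] += delta / t**0.7                 # slowly decaying step size
    w_bar += (w - w_bar) / t               # running average of iterates
    s = s_next

print("averaged estimate:", w_bar, "truth:", true_V)
```

The averaged iterate smooths out the step-size noise and converges to the true values, which is the asymptotic behavior the paper characterizes.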
arXiv Detail & Related papers (2024-10-21T15:34:44Z) - Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration, defined by minimizing a quadratic approximation of the objective subject to a trust-region constraint.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix.
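The gradient-step/eigen-step interplay can be sketched on a toy saddle. This sketch replaces the trust-region machinery with plain fixed step sizes, so it illustrates only the eigen-step idea: when the gradient vanishes but the Hessian has a negative eigenvalue, step along the corresponding eigenvector to escape.

```python
import numpy as np

# Toy saddle: f(x) = x0^2 - x1^2 with gradient (2*x0, -2*x1) and
# constant Hessian diag(2, -2). The origin is a saddle point.
def f(x):
    return x[0] ** 2 - x[1] ** 2

def grad(x):
    return np.array([2.0 * x[0], -2.0 * x[1]])

H = np.diag([2.0, -2.0])

x = np.array([1.0, 0.0])   # pure gradient descent from here stalls at the saddle
for _ in range(100):
    g = grad(x)
    if np.linalg.norm(g) > 1e-8:
        x = x - 0.1 * g                      # gradient step
    else:
        eigvals, eigvecs = np.linalg.eigh(H)
        if eigvals[0] < -1e-8:
            x = x + 0.1 * eigvecs[:, 0]      # eigen step along negative curvature
        else:
            break                            # second-order stationary point

print(x, f(x))
```

Gradient steps drive the iterate into the saddle at the origin; the eigen step then moves it along the negative-curvature direction, after which the objective decreases below its starting value.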
arXiv Detail & Related papers (2024-09-24T04:39:47Z) - Integrated path stability selection [5.263910852465186]
We introduce a novel approach to stability selection based on integrating stability paths rather than maximizing over them. This yields upper bounds on E(FP) that are much stronger than previous bounds, leading to significantly more true positives in practice for the same target E(FP).
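The integrate-versus-maximize distinction can be sketched with a cheap stand-in for a regularization path: marginal-correlation thresholding over subsamples instead of an actual lasso path. The data, threshold grid, and 0.5 selection cutoff are all illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, B = 200, 20, 50
X = rng.normal(size=(n, p))
y = X[:, 0] + X[:, 1] + X[:, 2] + rng.normal(scale=0.5, size=n)  # 3 signals

thresholds = np.linspace(0.1, 0.5, 9)     # stand-in for a regularization path
freq = np.zeros((len(thresholds), p))     # selection frequencies pi_j(t)
for _ in range(B):
    idx = rng.choice(n, n // 2, replace=False)    # subsample half the data
    Xs = (X[idx] - X[idx].mean(0)) / X[idx].std(0)
    ys = (y[idx] - y[idx].mean()) / y[idx].std()
    corr = np.abs(Xs.T @ ys) / len(idx)           # marginal correlations
    for k, t in enumerate(thresholds):
        freq[k] += corr > t
freq /= B

max_score = freq.max(axis=0)    # classic stability selection: max over the path
int_score = freq.mean(axis=0)   # integrated: average over the path
selected = np.where(int_score > 0.5)[0]
print("selected features:", selected)
```

Noise features are selected only at the loosest thresholds, so their integrated score stays low, while signal features are selected along most of the path; averaging rather than maximizing is what sharpens this separation.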
arXiv Detail & Related papers (2024-03-23T15:55:52Z) - Fully Stochastic Trust-Region Sequential Quadratic Programming for Equality-Constrained Optimization Problems [62.83783246648714]
We propose a trust-region stochastic sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with stochastic objectives and deterministic equality constraints.
The algorithm adaptively selects the trust-region radius and, compared to the existing line-search StoSQP schemes, allows us to utilize indefinite Hessian matrices.
arXiv Detail & Related papers (2022-11-29T05:52:17Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the stochastic ExtraGradient (SEG) method with constant step size, together with variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
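The averaging effect can be sketched on the scalar bilinear game min_x max_y x*y, whose Nash equilibrium is (0, 0). Plain gradient descent-ascent diverges here, while SEG with iterate averaging settles at the equilibrium; the step size and noise scale below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
eta, T = 0.1, 20_000
x, y = 1.0, 1.0           # start away from the Nash equilibrium (0, 0)
x_bar = y_bar = 0.0
for t in range(1, T + 1):
    n1, n2 = rng.normal(scale=0.01, size=2)
    # Extrapolation step: gradient of f(x, y) = x*y is (y, x).
    x_h = x - eta * (y + n1)      # descent in x
    y_h = y + eta * (x + n2)      # ascent in y
    n3, n4 = rng.normal(scale=0.01, size=2)
    # Update step uses gradients evaluated at the extrapolated point.
    x = x - eta * (y_h + n3)
    y = y + eta * (x_h + n4)
    x_bar += (x - x_bar) / t      # running average of the iterates
    y_bar += (y - y_bar) / t

print("averaged iterate:", (x_bar, y_bar))
```

The extrapolation step is what prevents the rotation-driven divergence of naive descent-ascent, and averaging damps the residual stochastic oscillation around the equilibrium.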
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds depending on the behavior of the best model, and leads to the first known fast bounds in the low-noise setting.
To the best of our knowledge, this gives the first known stability and generalization bounds for SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.