Statistical Inference After Adaptive Sampling for Longitudinal Data
- URL: http://arxiv.org/abs/2202.07098v5
- Date: Wed, 19 Apr 2023 04:20:04 GMT
- Title: Statistical Inference After Adaptive Sampling for Longitudinal Data
- Authors: Kelly W. Zhang, Lucas Janson, Susan A. Murphy
- Abstract summary: We develop novel methods to perform a variety of statistical analyses on adaptively sampled data via Z-estimation.
We develop novel theoretical tools for empirical processes on non-i.i.d., adaptively sampled longitudinal data which may be of independent interest.
- Score: 9.468593929311867
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online reinforcement learning and other adaptive sampling algorithms are
increasingly used in digital intervention experiments to optimize treatment
delivery for users over time. In this work, we focus on longitudinal user data
collected by a large class of adaptive sampling algorithms that are designed to
optimize treatment decisions online using accruing data from multiple users.
Combining or "pooling" data across users allows adaptive sampling algorithms to
potentially learn faster. However, by pooling, these algorithms induce
dependence between the sampled user data trajectories; we show that this can
cause standard variance estimators for i.i.d. data to underestimate the true
variance of common estimators on this data type. We develop novel methods to
perform a variety of statistical analyses on such adaptively sampled data via
Z-estimation. Specifically, we introduce the \textit{adaptive} sandwich
variance estimator, a corrected sandwich estimator that leads to consistent
variance estimates under adaptive sampling. Additionally, to prove our results
we develop novel theoretical tools for empirical processes on non-i.i.d.,
adaptively sampled longitudinal data which may be of independent interest. This
work is motivated by our efforts in designing experiments in which online
reinforcement learning algorithms optimize treatment decisions, yet statistical
inference is essential for conducting analyses after experiments conclude.
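To ground the idea, here is a minimal, hypothetical sketch of the *classical* sandwich variance estimator for a least-squares Z-estimator on simulated i.i.d. data. The paper's point is that under adaptive sampling (where pooling induces dependence between user trajectories) this classical form can underestimate the true variance, which is what their corrected adaptive sandwich estimator addresses; the data and model below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated i.i.d. data; under adaptive sampling, trajectories become
# dependent and this classical estimator can be anti-conservative.
n, d = 500, 3
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -0.5, 2.0])
y = X @ beta_true + rng.normal(size=n)

# Z-estimator: solve sum_i psi(beta; x_i, y_i) = 0, with estimating
# function psi(beta; x, y) = x * (y - x @ beta)  (least squares).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Classical sandwich: Var ~ A^{-1} B A^{-1} / n, where
# A = -E[d psi / d beta] ("bread") and B = E[psi psi^T] ("meat").
resid = y - X @ beta_hat
A = X.T @ X / n
psi = X * resid[:, None]
B = psi.T @ psi / n
A_inv = np.linalg.inv(A)
var_sandwich = A_inv @ B @ A_inv / n

se = np.sqrt(np.diag(var_sandwich))
print(se)
```

With dependent, adaptively sampled trajectories, the naive `B` above no longer captures cross-trajectory covariance, which is the gap the adaptive sandwich estimator is designed to close.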
Related papers
- Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws [59.03420759554073]
We introduce Adaptive Data Optimization (ADO), an algorithm that optimizes data distributions in an online fashion, concurrently with model training.
ADO does not require external knowledge, proxy models, or modifications to the model update.
ADO uses per-domain scaling laws to estimate the learning potential of each domain during training and adjusts the data mixture accordingly.
arXiv Detail & Related papers (2024-10-15T17:47:44Z)
- Dataset Quantization with Active Learning based Adaptive Sampling [11.157462442942775]
We show that maintaining performance is feasible even with uneven sample distributions.
We propose a novel active learning based adaptive sampling strategy to optimize the sample selection.
Our approach outperforms the state-of-the-art dataset compression methods.
arXiv Detail & Related papers (2024-07-09T23:09:18Z)
- Globally-Optimal Greedy Experiment Selection for Active Sequential Estimation [1.1530723302736279]
We study the problem of active sequential estimation, which involves adaptively selecting experiments for sequentially collected data.
The goal is to design experiment selection rules for more accurate model estimation.
We propose a class of greedy experiment selection methods and provide statistical analysis for the maximum likelihood estimator.
arXiv Detail & Related papers (2024-02-13T17:09:29Z)
- Adaptive Instrument Design for Indirect Experiments [48.815194906471405]
Unlike RCTs, indirect experiments estimate treatment effects by leveraging conditional instrumental variables.
In this paper we take the initial steps towards enhancing sample efficiency for indirect experiments by adaptively designing a data collection policy.
Our main contribution is a practical computational procedure that utilizes influence functions to search for an optimal data collection policy.
arXiv Detail & Related papers (2023-12-05T02:38:04Z)
- Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning [22.410220040736235]
We present a theoretically optimal solution for addressing both coreset selection and active learning.
Our proposed method, COPS, is designed to minimize the expected loss of a model trained on subsampled data.
arXiv Detail & Related papers (2023-09-05T14:06:33Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- Optimal Sampling Designs for Multi-dimensional Streaming Time Series with Application to Power Grid Sensor Data [4.891140022708977]
We study the data-dependent sample selection and online inference problem for a multi-dimensional streaming time series.
Inspired by D-optimality criterion in design of experiments, we propose a class of online data reduction methods.
We show that the optimal solution amounts to a strategy that is a mixture of Bernoulli sampling and leverage score sampling.
arXiv Detail & Related papers (2023-03-14T21:26:30Z)
- Reinforced Approximate Exploratory Data Analysis [7.974685452145769]
We are the first to consider the impact of sampling in interactive data exploration settings, where sampling introduces approximation errors.
We propose a Deep Reinforcement Learning (DRL) based framework which can optimize the sample selection in order to keep the analysis and insight generation flow intact.
arXiv Detail & Related papers (2022-12-12T20:20:22Z)
- Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations [76.82124752950148]
We develop a convenient gradient-based method for selecting the data augmentation.
We use a differentiable Kronecker-factored Laplace approximation to the marginal likelihood as our objective.
arXiv Detail & Related papers (2022-02-22T02:51:11Z)
- Straggler-Resilient Federated Learning: Leveraging the Interplay Between Statistical Accuracy and System Heterogeneity [57.275753974812666]
Federated learning involves learning from data samples distributed across a network of clients while the data remains local.
In this paper, we propose a novel straggler-resilient federated learning method that incorporates statistical characteristics of the clients' data to adaptively select the clients in order to speed up the learning procedure.
arXiv Detail & Related papers (2020-12-28T19:21:14Z)
- Optimal Importance Sampling for Federated Learning [57.14673504239551]
Federated learning involves a mixture of centralized and decentralized processing tasks.
The sampling of both agents and data is generally uniform; however, in this work we consider non-uniform sampling.
We derive optimal importance sampling strategies for both agent and data selection and show that non-uniform sampling without replacement improves the performance of the original FedAvg algorithm.
arXiv Detail & Related papers (2020-10-26T14:15:33Z)
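As a toy illustration of the non-uniform sampling idea in the last entry, the sketch below uses inverse-probability (Horvitz-Thompson style) weighting so that a mean estimate stays approximately unbiased under non-uniform inclusion probabilities. The probabilities here are arbitrary stand-ins, not the optimal ones derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Population of per-sample values (e.g. a scalar summary per client).
values = rng.normal(loc=3.0, scale=1.0, size=1000)

# Non-uniform inclusion probabilities, illustrative only:
# larger-magnitude values are sampled more often.
p = np.clip(np.abs(values) / np.abs(values).sum() * 200.0, 0.01, 1.0)
keep = rng.random(values.size) < p

# Inverse-probability (Horvitz-Thompson) weighting corrects the bias
# that a plain mean over the kept subsample would have.
ht_mean = np.sum(values[keep] / p[keep]) / values.size
naive_mean = values[keep].mean()
print(ht_mean, naive_mean)
```

Because sampling here favors large values, the naive subsample mean overshoots the population mean, while the weighted estimate does not (in expectation).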
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.