The Adaptive Doubly Robust Estimator for Policy Evaluation in Adaptive
Experiments and a Paradox Concerning Logging Policy
- URL: http://arxiv.org/abs/2010.03792v5
- Date: Fri, 18 Jun 2021 22:17:48 GMT
- Title: The Adaptive Doubly Robust Estimator for Policy Evaluation in Adaptive
Experiments and a Paradox Concerning Logging Policy
- Authors: Masahiro Kato and Shota Yasui and Kenichiro McAlinn
- Abstract summary: We propose a doubly robust (DR) estimator for dependent samples obtained from adaptive experiments.
We report an empirical paradox that our proposed DR estimator tends to show better performances compared to other estimators.
- Score: 13.772109618082382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The doubly robust (DR) estimator, which consists of two nuisance parameters,
the conditional mean outcome and the logging policy (the probability of
choosing an action), is crucial in causal inference. This paper proposes a DR
estimator for dependent samples obtained from adaptive experiments. To obtain
an asymptotically normal semiparametric estimator from dependent samples with
non-Donsker nuisance estimators, we propose adaptive-fitting as a variant of
sample-splitting. We also report an empirical paradox that our proposed DR
estimator tends to show better performances compared to other estimators
utilizing the true logging policy. While a similar phenomenon is known for
estimators with i.i.d. samples, traditional explanations based on asymptotic
efficiency cannot elucidate our case with dependent samples. We confirm this
hypothesis through simulation studies.
Related papers
- Mitigating LLM Hallucinations via Conformal Abstention [70.83870602967625]
We develop a principled procedure for determining when a large language model should abstain from responding in a general domain.
We leverage conformal prediction techniques to develop an abstention procedure that benefits from rigorous theoretical guarantees on the hallucination rate (error rate)
Experimentally, our resulting conformal abstention method reliably bounds the hallucination rate on various closed-book, open-domain generative question answering datasets.
arXiv Detail & Related papers (2024-04-04T11:32:03Z) - Prognostic Covariate Adjustment for Logistic Regression in Randomized
Controlled Trials [1.5020330976600735]
We show that prognostic score adjustment can increase the power of the Wald test for the conditional odds ratio under a fixed sample size.
We utilize g-computation to expand the scope of prognostic score adjustment to inferences on the marginal risk difference, relative risk, and odds ratio estimands.
arXiv Detail & Related papers (2024-02-29T06:53:16Z) - High Precision Causal Model Evaluation with Conditional Randomization [10.23470075454725]
We introduce a novel low-variance estimator for causal error, dubbed as the pairs estimator.
By applying the same IPW estimator to both the model and true experimental effects, our estimator effectively cancels out the variance due to IPW and achieves a smaller variance.
Our method offers a simple yet powerful solution to evaluate causal inference models in conditional randomization settings without complicated modification of the IPW estimator itself.
arXiv Detail & Related papers (2023-11-03T13:22:27Z) - Detecting Adversarial Data by Probing Multiple Perturbations Using
Expected Perturbation Score [62.54911162109439]
Adversarial detection aims to determine whether a given sample is an adversarial one based on the discrepancy between natural and adversarial distributions.
We propose a new statistic called expected perturbation score (EPS), which is essentially the expected score of a sample after various perturbations.
We develop EPS-based maximum mean discrepancy (MMD) as a metric to measure the discrepancy between the test sample and natural samples.
arXiv Detail & Related papers (2023-05-25T13:14:58Z) - Improved Policy Evaluation for Randomized Trials of Algorithmic Resource
Allocation [54.72195809248172]
We present a new estimator leveraging our proposed novel concept, that involves retrospective reshuffling of participants across experimental arms at the end of an RCT.
We prove theoretically that such an estimator is more accurate than common estimators based on sample means.
arXiv Detail & Related papers (2023-02-06T05:17:22Z) - Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in at least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z) - Off-Policy Evaluation via Adaptive Weighting with Data from Contextual
Bandits [5.144809478361604]
We improve the doubly robust (DR) estimator by adaptively weighting observations to control its variance.
We provide empirical evidence for our estimator's improved accuracy and inferential properties relative to existing alternatives.
arXiv Detail & Related papers (2021-06-03T17:54:44Z) - Counterfactual Inference of the Mean Outcome under a Convergence of
Average Logging Probability [5.596752018167751]
This paper considers estimating the mean outcome of an action from samples obtained in adaptive experiments.
In adaptive experiments, the probability of choosing an action is allowed to be sequentially updated based on past observations.
arXiv Detail & Related papers (2021-02-17T19:05:53Z) - Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
We find the OPE estimator for multiple loggers with minimum variance for any instance, i.e., the efficient one.
arXiv Detail & Related papers (2020-10-21T13:43:48Z) - Estimating Gradients for Discrete Random Variables by Sampling without
Replacement [93.09326095997336]
We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement.
We show that our estimator can be derived as the Rao-Blackwellization of three different estimators.
arXiv Detail & Related papers (2020-02-14T14:15:18Z) - Efficient Adaptive Experimental Design for Average Treatment Effect
Estimation [18.027128141189355]
We propose an algorithm for efficient experiments with estimators constructed from dependent samples.
To justify our proposed approach, we provide finite and infinite sample analyses.
arXiv Detail & Related papers (2020-02-13T02:04:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.