Doubly Robust Estimator for Off-Policy Evaluation with Large Action
Spaces
- URL: http://arxiv.org/abs/2308.03443v3
- Date: Thu, 14 Dec 2023 08:52:01 GMT
- Title: Doubly Robust Estimator for Off-Policy Evaluation with Large Action
Spaces
- Authors: Tatsuhiro Shimizu, Laura Forastiere
- Abstract summary: We study Off-Policy Evaluation in contextual bandit settings with large action spaces.
Benchmark estimators suffer from a severe bias-variance tradeoff.
We propose a Marginalized Doubly Robust (MDR) estimator to overcome these limitations.
- Score: 0.951828574518325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study Off-Policy Evaluation (OPE) in contextual bandit settings with large
action spaces. Benchmark estimators face a severe bias-variance tradeoff:
parametric approaches are biased because the correct model is difficult to
specify, whereas importance-weighting approaches suffer from high variance. To
overcome these limitations, Marginalized Inverse Propensity Scoring (MIPS) was
proposed to reduce the estimator's variance by using action embeddings.
However, MIPS relies on the no direct effect assumption for unbiasedness, which
requires that the action embedding fully mediates the effect of an action on
the reward. To relax this unrealistic assumption, we propose a Marginalized
Doubly Robust (MDR) estimator. Theoretical analysis shows that the proposed
estimator is unbiased under weaker assumptions than MIPS while having lower
variance than MIPS. Empirical experiments confirm the superiority of MDR over
existing estimators in large action spaces.
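The estimator families named in the abstract can be illustrated with a minimal sketch. It shows the standard textbook forms of importance weighting (IPS), the parametric direct method (DM), their doubly robust (DR) combination, and MIPS weights for a deterministic embedding. It does not implement MDR, whose formula is not given in this listing. Everything in the setup is an assumption made for illustration: the context is omitted, the embedding is a hypothetical cluster label per action, and the function names, policies, and reward model are invented for this example.

```python
import random


def ips_estimate(actions, rewards, pi_e, pi_b):
    """IPS: reweight each logged reward by pi_e(a) / pi_b(a)."""
    total = sum(pi_e[a] / pi_b[a] * r for a, r in zip(actions, rewards))
    return total / len(actions)


def dm_estimate(pi_e, q_hat):
    """Direct method: expected model reward under the evaluation policy."""
    return sum(p * q for p, q in zip(pi_e, q_hat))


def dr_estimate(actions, rewards, pi_e, pi_b, q_hat):
    """Doubly robust: DM baseline plus an importance-weighted residual."""
    residual = sum(
        pi_e[a] / pi_b[a] * (r - q_hat[a]) for a, r in zip(actions, rewards)
    )
    return dm_estimate(pi_e, q_hat) + residual / len(actions)


def mips_estimate(actions, rewards, pi_e, pi_b, cluster_of):
    """MIPS with a deterministic embedding: weights are cluster mass ratios."""
    n_clusters = max(cluster_of) + 1
    mass_e = [0.0] * n_clusters
    mass_b = [0.0] * n_clusters
    for a, c in enumerate(cluster_of):
        mass_e[c] += pi_e[a]
        mass_b[c] += pi_b[a]
    total = sum(
        mass_e[cluster_of[a]] / mass_b[cluster_of[a]] * r
        for a, r in zip(actions, rewards)
    )
    return total / len(actions)


def demo(n_rounds=2000, n_actions=50, seed=0):
    """Run all four estimators on one synthetic log and return the results."""
    rng = random.Random(seed)
    # Hypothetical setup: the reward depends on the action only through its
    # cluster, so the no direct effect assumption holds by construction.
    cluster_of = [a // 10 for a in range(n_actions)]
    q = [0.1 + 0.15 * c for c in cluster_of]  # true expected reward per action
    pi_b = [1.0 / n_actions] * n_actions      # uniform behavior policy
    raw = [10.0 if a >= n_actions - 5 else 1.0 for a in range(n_actions)]
    pi_e = [w / sum(raw) for w in raw]        # evaluation policy
    q_hat = [v + 0.1 for v in q]              # deliberately biased reward model
    actions = rng.choices(range(n_actions), weights=pi_b, k=n_rounds)
    rewards = [1.0 if rng.random() < q[a] else 0.0 for a in actions]
    return {
        "truth": dm_estimate(pi_e, q),
        "ips": ips_estimate(actions, rewards, pi_e, pi_b),
        "dm": dm_estimate(pi_e, q_hat),
        "dr": dr_estimate(actions, rewards, pi_e, pi_b, q_hat),
        "mips": mips_estimate(actions, rewards, pi_e, pi_b, cluster_of),
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value:.4f}")
```

Running `demo()` with several seeds gives a rough feel for the tradeoff: the DM value is fixed by the (here deliberately biased) reward model, while the IPS, DR, and MIPS values fluctuate from seed to seed around the true value.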
Related papers
- Perturbation-Invariant Adversarial Training for Neural Ranking Models:
Improving the Effectiveness-Robustness Trade-Off [107.35833747750446]
Adversarial examples can be crafted by adding imperceptible perturbations to legitimate documents.
This vulnerability raises significant concerns about their reliability and hinders the widespread deployment of NRMs.
In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs.
arXiv Detail & Related papers (2023-12-16T05:38:39Z)
- Marginal Density Ratio for Off-Policy Evaluation in Contextual Bandits [41.91108406329159]
Off-Policy Evaluation (OPE) in contextual bandits is crucial for assessing new policies using existing data without costly experimentation.
We introduce a new OPE estimator for contextual bandits, the Marginal Ratio (MR) estimator, which focuses on the shift in the marginal distribution of outcomes $Y$ instead of the policies themselves.
arXiv Detail & Related papers (2023-12-03T17:04:57Z)
- Off-Policy Evaluation for Large Action Spaces via Conjunct Effect
Modeling [30.835774920236872]
We study off-policy evaluation of contextual bandit policies for large discrete action spaces.
We propose a new estimator, called OffCEM, that is based on the conjunct effect model (CEM), a novel decomposition of the causal effect into a cluster effect and a residual effect.
Experiments demonstrate that OffCEM provides substantial improvements in OPE especially in the presence of many actions.
arXiv Detail & Related papers (2023-05-14T04:16:40Z)
- Off-Policy Risk Assessment in Markov Decision Processes [15.225153671736201]
We develop the first doubly robust (DR) estimator for the CDF of returns in Markov decision processes (MDPs).
This estimator enjoys significantly less variance and, when the model is well specified, achieves the Cramer-Rao variance lower bound.
We derive the first minimax lower bounds for off-policy CDF and risk estimation, which match our error bounds up to a constant factor.
arXiv Detail & Related papers (2022-09-21T15:40:59Z)
- Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning [59.02006924867438]
Off-policy evaluation and learning (OPE/L) use offline observational data to make better decisions.
Recent work proposed distributionally robust OPE/L (DROPE/L) to remedy this, but the proposal relies on inverse-propensity weighting.
We propose the first DR algorithms for DROPE/L with KL-divergence uncertainty sets.
arXiv Detail & Related papers (2022-02-19T20:00:44Z)
- Off-Policy Evaluation for Large Action Spaces via Embeddings [36.42838320396534]
Off-policy evaluation (OPE) in contextual bandits has seen rapid adoption in real-world systems.
Existing OPE estimators degrade severely when the number of actions is large.
We propose a new OPE estimator that leverages marginalized importance weights when action embeddings provide structure in the action space.
arXiv Detail & Related papers (2022-02-13T14:00:09Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Tight Mutual Information Estimation With Contrastive Fenchel-Legendre
Optimization [69.07420650261649]
We introduce a novel, simple, and powerful contrastive MI estimator named FLO.
Empirically, our FLO estimator overcomes the limitations of its predecessors and learns more efficiently.
The utility of FLO is verified using an extensive set of benchmarks, which also reveals the trade-offs in practical MI estimation.
arXiv Detail & Related papers (2021-07-02T15:20:41Z)
- Enhanced Doubly Robust Learning for Debiasing Post-click Conversion Rate
Estimation [29.27760413892272]
Post-click conversion, as a strong signal indicating the user preference, is salutary for building recommender systems.
Currently, most existing methods utilize counterfactual learning to debias recommender systems.
We propose a novel double learning approach for the MRDR estimator, which can convert the error imputation into the general CVR estimation.
arXiv Detail & Related papers (2021-05-28T06:59:49Z)
- Nonparametric Estimation of the Fisher Information and Its Applications [82.00720226775964]
This paper considers the problem of estimation of the Fisher information for location from a random sample of size $n$.
An estimator proposed by Bhattacharya is revisited and improved convergence rates are derived.
A new estimator, termed a clipped estimator, is proposed.
arXiv Detail & Related papers (2020-05-07T17:21:56Z)
- Machine learning for causal inference: on the use of cross-fit
estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.