Inference for relative sparsity
- URL: http://arxiv.org/abs/2306.14297v1
- Date: Sun, 25 Jun 2023 17:14:45 GMT
- Title: Inference for relative sparsity
- Authors: Samuel J. Weisenthal, Sally W. Thurston, Ashkan Ertefaie
- Abstract summary: We develop inference for the relative sparsity objective function, because characterizing uncertainty is crucial to applications in medicine.
Inference is difficult, because the relative sparsity objective depends on the unpenalized value function, which is unstable and has infinite estimands in the binary action case.
To tackle these issues, we nest a weighted Trust Region Policy Optimization function within a relative sparsity objective, implement an adaptive relative sparsity penalty, and propose a sample-splitting framework for post-selection inference.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In healthcare, there is much interest in estimating policies, or mappings
from covariates to treatment decisions. Recently, there has also been interest in
constraining these estimated policies to the standard of care, which generated
the observed data. A relative sparsity penalty was proposed to derive policies
that have sparse, explainable differences from the standard of care,
facilitating justification of the new policy. However, the developers of this
penalty only considered estimation, not inference. Here, we develop inference
for the relative sparsity objective function, because characterizing
uncertainty is crucial to applications in medicine. Further, in the relative
sparsity work, the authors only considered the single-stage decision case;
here, we consider the more general, multi-stage case. Inference is difficult,
because the relative sparsity objective depends on the unpenalized value
function, which is unstable and has infinite estimands in the binary action
case. Further, one must deal with a non-differentiable penalty. To tackle these
issues, we nest a weighted Trust Region Policy Optimization function within a
relative sparsity objective, implement an adaptive relative sparsity penalty,
and propose a sample-splitting framework for post-selection inference. We study
the asymptotic behavior of our proposed approaches, perform extensive
simulations, and analyze a real, electronic health record dataset.
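To make the relative sparsity idea concrete, below is a minimal single-stage, binary-action sketch in which a logistic policy's value is estimated by plain inverse-probability weighting and penalized by the L1 distance between its coefficients and those of the estimated behavioral (standard-of-care) policy. All function and variable names are illustrative assumptions; the plain IPW value estimate stands in for the paper's weighted Trust Region Policy Optimization construction and adaptive penalty, which are not reproduced here.

```python
import numpy as np

def logistic_policy(theta, X):
    """Probability of taking action 1 (treat) under a logistic policy."""
    return 1.0 / (1.0 + np.exp(-X @ theta))

def ipw_value(theta, beta_b, X, A, R):
    """Inverse-probability-weighted estimate of the value of pi_theta,
    using the estimated behavioral policy pi_{beta_b} that generated the data."""
    p_theta = logistic_policy(theta, X)
    p_b = logistic_policy(beta_b, X)
    # Probability of the observed action under each policy.
    pi_theta_a = np.where(A == 1, p_theta, 1.0 - p_theta)
    pi_b_a = np.where(A == 1, p_b, 1.0 - p_b)
    return np.mean((pi_theta_a / pi_b_a) * R)

def relative_sparsity_objective(theta, beta_b, X, A, R, lam):
    """Value minus an L1 penalty on deviations from the behavioral coefficients,
    favoring policies that differ from the standard of care in few covariates."""
    return ipw_value(theta, beta_b, X, A, R) - lam * np.sum(np.abs(theta - beta_b))
```

Maximizing such an objective over theta for a range of lam, performing the selection of non-zero deviations on one sample split, and reserving the other split for estimation loosely mirrors the sample-splitting framework for post-selection inference described in the abstract.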
Related papers
- Policy Gradient with Active Importance Sampling [55.112959067035916]
Policy gradient (PG) methods significantly benefit from IS, enabling the effective reuse of previously collected samples.
However, IS is employed in RL as a passive tool for re-weighting historical samples.
We look for the best behavioral policy from which to collect samples to reduce the policy gradient variance.
arXiv Detail & Related papers (2024-05-09T09:08:09Z) - Policy Learning with Distributional Welfare [1.0742675209112622]
Most literature on treatment choice has considered utilitarian welfare based on the conditional average treatment effect (ATE).
This paper proposes an optimal policy that allocates the treatment based on the conditional quantile of individual treatment effects (QoTE).
arXiv Detail & Related papers (2023-11-27T14:51:30Z) - Optimal and Fair Encouragement Policy Evaluation and Learning [11.712023983596914]
We study causal identification, statistical variance-reduced estimation, and robust estimation of optimal treatment rules.
We develop a two-stage algorithm for solving over parametrized policy classes under general constraints to obtain variance-sensitive regret bounds.
arXiv Detail & Related papers (2023-09-12T20:45:30Z) - Quantile Off-Policy Evaluation via Deep Conditional Generative Learning [21.448553360543478]
Off-policy evaluation (OPE) is concerned with evaluating a new target policy using offline data generated by a potentially different behavior policy (a minimal importance-sampling sketch of this reweighting appears after this list).
We propose a doubly-robust inference procedure for quantile OPE in sequential decision making.
We demonstrate the advantages of this proposed estimator through both simulations and a real-world dataset from a short-video platform.
arXiv Detail & Related papers (2022-12-29T22:01:43Z) - Optimal Treatment Regimes for Proximal Causal Learning [7.672587258250301]
We propose a novel optimal individualized treatment regime based on outcome and treatment confounding bridges.
We show that the value function of this new optimal treatment regime is superior to that of existing ones in the literature.
arXiv Detail & Related papers (2022-12-19T14:29:25Z) - Policy Learning with Asymmetric Counterfactual Utilities [0.6138671548064356]
We consider optimal policy learning with asymmetric counterfactual utility functions.
We derive minimax decision rules by minimizing the maximum expected utility loss.
We show that one can learn minimax loss decision rules from observed data by solving intermediate classification problems.
arXiv Detail & Related papers (2022-06-21T15:44:49Z) - Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
arXiv Detail & Related papers (2021-06-22T17:58:46Z) - Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or more logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z) - Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic
Policies [80.42316902296832]
We study the estimation of policy value and gradient of a deterministic policy from off-policy data when actions are continuous.
In this setting, standard importance sampling and doubly robust estimators for policy value and gradient fail because the density ratio does not exist.
We propose several new doubly robust estimators based on different kernelization approaches.
arXiv Detail & Related papers (2020-06-06T15:52:05Z) - Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation.
Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history.
arXiv Detail & Related papers (2020-02-21T19:20:57Z) - Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement
Learning [70.01650994156797]
Off-policy evaluation of sequential decision policies from observational data is necessary in batch reinforcement learning settings such as education and healthcare.
We develop an approach that estimates bounds on the value of a given policy.
We prove convergence to the sharp bounds as we collect more confounded data.
arXiv Detail & Related papers (2020-02-11T16:18:14Z)
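Several of the entries above (quantile OPE, reliable OPE, minimax-optimal OPE) share the same starting point: reweighting logged trajectories by the likelihood ratio between the target and behavior policies. As a point of reference, here is a minimal ordinary importance-sampling OPE estimator; the trajectory format and function names are illustrative assumptions, and the doubly robust variants discussed above would further augment these weights with an estimated value function.

```python
import numpy as np

def is_policy_value(trajectories, pi_e, pi_b, gamma=1.0):
    """Ordinary importance-sampling estimate of the target policy's expected
    discounted return from trajectories logged under the behavior policy.

    Each trajectory is a list of (state, action, reward) tuples; pi_e(a, s)
    and pi_b(a, s) return the probability of action a in state s."""
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_e(a, s) / pi_b(a, s)  # cumulative likelihood ratio
            ret += (gamma ** t) * r
        estimates.append(weight * ret)
    return float(np.mean(estimates))
```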
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.