Inference on Optimal Dynamic Policies via Softmax Approximation
- URL: http://arxiv.org/abs/2303.04416v3
- Date: Wed, 13 Dec 2023 23:26:48 GMT
- Title: Inference on Optimal Dynamic Policies via Softmax Approximation
- Authors: Qizhao Chen, Morgane Austern, Vasilis Syrgkanis
- Abstract summary: We show that a simple soft-max approximation to the optimal treatment regime can achieve valid inference on the truly optimal regime.
Our work combines techniques from semi-parametric inference and $g$-estimation, together with an appropriate triangular array central limit theorem.
- Score: 27.396891119011215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating optimal dynamic policies from offline data is a fundamental
problem in dynamic decision making. In the context of causal inference, the
problem is known as estimating the optimal dynamic treatment regime. Even
though there exists a plethora of methods for estimation, constructing
confidence intervals for the value of the optimal regime and structural
parameters associated with it is inherently harder, as it involves non-linear
and non-differentiable functionals of unknown quantities that need to be
estimated. Prior work resorted to sub-sample approaches that can deteriorate
the quality of the estimate. We show that a simple soft-max approximation to
the optimal treatment regime, for an appropriately fast growing temperature
parameter, can achieve valid inference on the truly optimal regime. We
illustrate our result for a two-period optimal dynamic regime, though our
approach should directly extend to the finite horizon case. Our work combines
techniques from semi-parametric inference and $g$-estimation, together with an
appropriate triangular array central limit theorem, as well as a novel analysis
of the asymptotic influence and asymptotic bias of softmax approximations.
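The smoothing idea in the abstract can be illustrated with a small numerical sketch. It is not the paper's estimator: it only shows how a softmax-weighted average of candidate treatment values approaches the non-differentiable maximum as the temperature parameter grows. The function names, the `beta` parameter, and the example values are hypothetical and chosen purely for illustration.

```python
import math


def softmax_weights(values, beta):
    """Softmax weights over candidate treatment values at temperature beta."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def soft_value(values, beta):
    """Softmax-weighted average: a smooth surrogate for max(values)."""
    weights = softmax_weights(values, beta)
    return sum(w * v for w, v in zip(weights, values))


if __name__ == "__main__":
    # Hypothetical estimated values of two treatments for one unit.
    q_values = [1.0, 1.3]
    for beta in (0.0, 1.0, 10.0, 100.0):
        print(f"beta={beta:6.1f}  soft value={soft_value(q_values, beta):.6f}")
```

At `beta = 0` the surrogate is the plain average of the values, and as `beta` increases it moves toward the larger value. The surrogate never exceeds the true maximum, which is why the approximation bias has to be controlled by letting the temperature grow at an appropriate rate.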
Related papers
- Nonparametric estimation of a covariate-adjusted counterfactual treatment regimen response curve [2.7446241148152253]
Flexible estimation of the mean outcome under a treatment regimen is a key step toward personalized medicine.
We propose an inverse probability weighted nonparametrically efficient estimator of the smoothed regimen-response curve function.
Some finite-sample properties are explored with simulations.
arXiv Detail & Related papers (2023-09-28T01:46:24Z)
- Optimal Learning via Moderate Deviations Theory [4.6930976245638245]
We develop a systematic construction of highly accurate confidence intervals by using a moderate deviation principle-based approach.
It is shown that the proposed confidence intervals are statistically optimal in the sense that they satisfy criteria regarding exponential accuracy, minimality, consistency, mischaracterization probability, and eventual uniformly most accurate (UMA) property.
arXiv Detail & Related papers (2023-05-23T19:57:57Z)
- Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding [22.946517604055735]
This paper presents an approach, Spectral Dynamics Embedding Control (SDEC), to optimal control for nonlinear systems.
We use an infinite-dimensional feature to linearly represent the state-action value function and exploit a finite-dimensional truncation approximation for practical implementation.
arXiv Detail & Related papers (2023-04-08T04:23:46Z)
- Off-Policy Evaluation with Policy-Dependent Optimization Response [90.28758112893054]
We develop a new framework for off-policy evaluation with a policy-dependent linear optimization response.
We construct unbiased estimators for the policy-dependent estimand by a perturbation method.
We provide a general algorithm for optimizing causal interventions.
arXiv Detail & Related papers (2022-02-25T20:25:37Z)
- Understanding the Effect of Stochasticity in Policy Optimization [86.7574122154668]
We show that the preferability of optimization methods depends critically on whether exact gradients are used.
Second, to explain these findings we introduce the concept of committal rate for policy optimization.
Third, we show that in the absence of external oracle information, there is an inherent trade-off between exploiting geometry to accelerate convergence versus achieving optimality almost surely.
arXiv Detail & Related papers (2021-10-29T06:35:44Z)
- Integrated Conditional Estimation-Optimization [6.037383467521294]
Many real-world optimization problems involve uncertain parameters with probability distributions that can be estimated using contextual feature information.
In contrast to the standard approach of estimating the distribution of uncertain parameters, we propose an integrated conditional estimation approach.
We show that our ICEO approach is asymptotically consistent under moderate conditions.
arXiv Detail & Related papers (2021-10-24T04:49:35Z)
- High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
- Near Optimality of Finite Memory Feedback Policies in Partially Observed Markov Decision Processes [0.0]
We study a planning problem for POMDPs where the system dynamics and the measurement channel model are assumed to be known.
We find optimal policies for the approximate belief model under mild non-linear filter stability conditions.
We also establish a rate of convergence result which relates the finite window memory size and the approximation error bound.
arXiv Detail & Related papers (2020-10-15T00:37:51Z)
- Robust, Accurate Stochastic Optimization for Variational Inference [68.83746081733464]
We show that common optimization methods lead to poor variational approximations if the problem is moderately large.
Motivated by these findings, we develop a more robust and accurate optimization framework by viewing the underlying algorithm as producing a Markov chain.
arXiv Detail & Related papers (2020-09-01T19:12:11Z)
- Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted decision processes, and provide Markov-dependent guarantees on the $\ell_\infty$-error under a generative model.
We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
arXiv Detail & Related papers (2020-03-16T17:15:28Z)
- Support recovery and sup-norm convergence rates for sparse pivotal estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non-smoothed and smoothed, single-task and multitask square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.