Off-Policy Fitted Q-Evaluation with Differentiable Function
Approximators: Z-Estimation and Inference Theory
- URL: http://arxiv.org/abs/2202.04970v1
- Date: Thu, 10 Feb 2022 11:59:54 GMT
- Title: Off-Policy Fitted Q-Evaluation with Differentiable Function
Approximators: Z-Estimation and Inference Theory
- Authors: Ruiqi Zhang, Xuezhou Zhang, Chengzhuo Ni, and Mengdi Wang
- Abstract summary: Off-Policy Evaluation (OPE) serves as one of the cornerstones in Reinforcement Learning (RL).
We focus on FQE with general differentiable function approximators, making our theory applicable to neural function approximations.
The finite-sample FQE error bound is dominated by the same variance term, and it can also be bounded by a function class-dependent divergence.
- Score: 34.307187875861516
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Off-Policy Evaluation (OPE) serves as one of the cornerstones in
Reinforcement Learning (RL). Fitted Q Evaluation (FQE) with various function
approximators, especially deep neural networks, has gained practical success.
While statistical analysis has proved FQE to be minimax-optimal with tabular,
linear, and several nonparametric function families, its practical performance
with more general function approximators is less theoretically understood. We
focus on FQE with general differentiable function approximators, making our
theory applicable to neural function approximations. We approach this problem
using the Z-estimation theory and establish the following results: The FQE
estimation error is asymptotically normal with explicit variance determined
jointly by the tangent space of the function class at the ground truth, the
reward structure, and the distribution shift due to off-policy learning; The
finite-sample FQE error bound is dominated by the same variance term, and it
can also be bounded by a function class-dependent divergence, which measures how
the off-policy distribution shift intertwines with the function approximator.
In addition, we study bootstrapping FQE estimators for error-distribution
inference and confidence-interval estimation, accompanied by a Cramér-Rao
lower bound that matches our upper bounds. The Z-estimation analysis provides a
generalizable theoretical framework for studying off-policy estimation in RL
and yields sharp statistical theory for FQE with differentiable function
approximators.
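To make the pipeline described in the abstract concrete, below is a minimal sketch of FQE with a differentiable (here, linear-in-features) Q-approximator, followed by a nonparametric bootstrap over transitions for a confidence interval, in the spirit of the asymptotic normality result $\sqrt{n}(\hat{v}^{\pi} - v^{\pi}) \Rightarrow \mathcal{N}(0, \sigma^2)$ (the explicit form of $\sigma^2$ is derived in the paper). The environment, logged dataset, feature map `phi`, and target policy `pi_e` are synthetic assumptions for illustration only, not taken from the paper.

```python
# Minimal FQE + bootstrap sketch (illustrative assumptions throughout:
# synthetic logged data, one-hot features, uniform target policy).
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.9

# Synthetic logged transitions (s, a, r, s') from an unknown behavior policy.
N = 2000
S = rng.integers(n_states, size=N)
A = rng.integers(n_actions, size=N)
R = rng.normal(loc=S + A, scale=0.5, size=N)
S_next = rng.integers(n_states, size=N)

# Target (evaluation) policy: uniform over actions, for illustration.
pi_e = np.full((n_states, n_actions), 1.0 / n_actions)

def phi(s, a):
    """One-hot (state, action) features; any differentiable map could be used."""
    x = np.zeros(n_states * n_actions)
    x[s * n_actions + a] = 1.0
    return x

# Feature table for all (s, a) pairs, indexed by s * n_actions + a.
PHI = np.stack([phi(s, a) for s in range(n_states) for a in range(n_actions)])

def fqe(idx, n_iters=50):
    """Iterated Bellman regression on transitions `idx`; returns the estimated
    value of pi_e under a uniform initial-state distribution (an assumption)."""
    X = np.stack([phi(s, a) for s, a in zip(S[idx], A[idx])])
    theta = np.zeros(n_states * n_actions)
    for _ in range(n_iters):
        Q = (PHI @ theta).reshape(n_states, n_actions)   # current Q(s, a)
        V = (pi_e * Q).sum(axis=1)                       # V(s) = E_{a~pi_e} Q(s, a)
        y = R[idx] + gamma * V[S_next[idx]]              # Bellman regression targets
        # One fitted-Q regression step; a gradient-based fit of a neural
        # network on (X, y) would play the same role.
        theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    Q = (PHI @ theta).reshape(n_states, n_actions)
    return float((pi_e * Q).sum(axis=1).mean())

point = fqe(np.arange(N))

# Nonparametric bootstrap over transitions for a 95% confidence interval.
boot = [fqe(rng.integers(N, size=N)) for _ in range(200)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"FQE estimate: {point:.3f}   95% bootstrap CI: [{lo:.3f}, {hi:.3f}]")
```

Replacing the least-squares step with gradient updates on a neural Q-network gives the general differentiable-approximator setting the theory addresses; the bootstrap loop is unchanged.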
Related papers
- Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
Temporal Difference (TD) learning, arguably the most widely used algorithm for policy evaluation, serves as a natural framework for this purpose.
In this paper, we study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation, and obtain three significant improvements over existing results.
arXiv Detail & Related papers (2024-10-21T15:34:44Z)
- Convergence of Continuous Normalizing Flows for Learning Probability Distributions [10.381321024264484]
Continuous normalizing flows (CNFs) are a generative method for learning probability distributions.
We study the theoretical properties of CNFs with linear regularity in learning probability distributions from a finite random sample.
We present a convergence analysis framework that encompasses the error due to velocity estimation, the discretization error, and the early stopping error.
arXiv Detail & Related papers (2024-03-31T03:39:04Z)
- Statistical Inference of Optimal Allocations I: Regularities and their Implications [3.904240476752459]
We first derive Hadamard differentiability of the value function through a detailed analysis of the general properties of the sorting operator.
Building on our Hadamard differentiability results, we demonstrate how the functional delta method can be used to directly derive the properties of the value function process.
arXiv Detail & Related papers (2024-03-27T04:39:13Z)
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important for forecasting nonstationary processes or processes with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Data-Driven Influence Functions for Optimization-Based Causal Inference [105.5385525290466]
We study a constructive algorithm that approximates Gateaux derivatives for statistical functionals by finite differencing.
We study the case where probability distributions are not known a priori but need to be estimated from data.
arXiv Detail & Related papers (2022-08-29T16:16:22Z)
- Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
arXiv Detail & Related papers (2022-02-23T06:11:49Z)
- Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z)
- Neural Estimation of Statistical Divergences [24.78742908726579]
A modern method for estimating statistical divergences relies on parametrizing an empirical variational form by a neural network (NN).
In particular, there is a fundamental tradeoff between the two sources of error involved: approximation and empirical estimation.
We show that neural estimators with a slightly different NN growth-rate are near minimax rate-optimal, achieving the parametric convergence rate up to logarithmic factors.
arXiv Detail & Related papers (2021-10-07T17:42:44Z)
- Non-Asymptotic Performance Guarantees for Neural Estimation of $\mathsf{f}$-Divergences [22.496696555768846]
Statistical distances quantify the dissimilarity between probability distributions.
A modern method for estimating such distances from data relies on parametrizing a variational form by a neural network (NN) and optimizing it.
This paper explores this tradeoff by means of non-asymptotic error bounds, focusing on three popular choices of statistical distances.
arXiv Detail & Related papers (2021-03-11T19:47:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.