Finite Sample Analysis of Minimax Offline Reinforcement Learning:
Completeness, Fast Rates and First-Order Efficiency
- URL: http://arxiv.org/abs/2102.02981v1
- Date: Fri, 5 Feb 2021 03:20:39 GMT
- Title: Finite Sample Analysis of Minimax Offline Reinforcement Learning:
Completeness, Fast Rates and First-Order Efficiency
- Authors: Masatoshi Uehara, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun,
Tengyang Xie
- Abstract summary: We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning.
We show that the minimax approach enables us to achieve a fast rate of convergence for weights and quality functions.
We present the first finite-sample result with first-order efficiency in non-tabular environments.
- Score: 83.02999769628593
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We offer a theoretical characterization of off-policy evaluation (OPE) in
reinforcement learning using function approximation for marginal importance
weights and $q$-functions when these are estimated using recent minimax
methods. Under various combinations of realizability and completeness
assumptions, we show that the minimax approach enables us to achieve a fast
rate of convergence for weights and quality functions, characterized by the
critical inequality (Bartlett et al., 2005). Based on this result, we analyze
convergence rates for OPE. In particular, we introduce novel alternative
completeness conditions under which OPE is feasible and we present the first
finite-sample result with first-order efficiency in non-tabular environments,
i.e., having the minimal coefficient in the leading term.
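As a concrete illustration of the estimators being analyzed, the sketch below (assumed notation and function names, not the paper's exact estimator) shows the minimax q-function approach in the special case where both the q-function class and the adversarial test-function class are linear in the same features; there the inner maximization has a closed form and the outer minimization reduces to a single linear system.

```python
import numpy as np

def minimax_ope_linear(phi_sa, rewards, phi_next_pi, phi_init_pi, gamma=0.99, reg=1e-3):
    """Minimax q-function estimation for OPE with linear classes (illustrative sketch).

    Model: q_theta(s, a) = phi(s, a)^T theta, adversarial test functions
    f_beta(s, a) = phi(s, a)^T beta with ||beta|| <= 1.  The minimax objective
        min_theta max_beta ( mean_i[(r_i + gamma * q_theta(s'_i, pi(s'_i))
                                      - q_theta(s_i, a_i)) * f_beta(s_i, a_i)] )^2
    equals min_theta || b - A theta ||^2 with A, b as below, so (for invertible A)
    solving A theta = b attains the minimum.

    phi_sa      : (n, d) features of logged pairs (s_i, a_i)
    rewards     : (n,)   logged rewards r_i
    phi_next_pi : (n, d) features of (s'_i, pi(s'_i)), target-policy action at the next state
    phi_init_pi : (m, d) features of (s_0, pi(s_0)) at sampled initial states
    """
    n, d = phi_sa.shape
    A = phi_sa.T @ (phi_sa - gamma * phi_next_pi) / n + reg * np.eye(d)  # ridge-regularized
    b = phi_sa.T @ rewards / n
    theta = np.linalg.solve(A, b)
    # OPE estimate: expected q-value of the target policy under the initial state distribution
    return phi_init_pi.mean(axis=0) @ theta
```

With general nonparametric classes the saddle point must be approximated numerically; the paper's contribution is the finite-sample rate analysis of such minimax estimators, not this closed-form special case.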
Related papers
- Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
Temporal Difference (TD) learning, arguably the most widely used algorithm for policy evaluation, serves as a natural framework for this purpose.
In this paper, we study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation, and obtain three significant improvements over existing results.
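For reference, a minimal sketch of that procedure, TD(0) with linear value-function approximation and Polyak-Ruppert iterate averaging; the step-size schedule and names below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def averaged_td0(transitions, d, gamma=0.99, alpha0=0.5):
    """TD(0) with linear function approximation and Polyak-Ruppert averaging (sketch).

    transitions : iterable of (phi_s, r, phi_next) tuples collected by the evaluated policy,
                  where phi_s and phi_next are d-dimensional feature vectors
    """
    theta = np.zeros(d)       # current TD iterate
    theta_bar = np.zeros(d)   # Polyak-Ruppert average of the iterates
    for t, (phi_s, r, phi_next) in enumerate(transitions, start=1):
        td_error = r + gamma * phi_next @ theta - phi_s @ theta
        theta = theta + (alpha0 / np.sqrt(t)) * td_error * phi_s  # stochastic TD(0) update
        theta_bar += (theta - theta_bar) / t                      # running average
    return theta_bar  # the averaged iterate is the object such analyses study
```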
arXiv Detail & Related papers (2024-10-21T15:34:44Z)
- Tighter Performance Theory of FedExProx [85.92481138826949]
We revisit FedExProx - a recently proposed distributed optimization method designed to enhance convergence properties of parallel algorithms via extrapolation.
We develop a novel analysis framework, establishing a tighter linear convergence rate for non-strongly convex quadratic problems.
We extend the applicability of our analysis to general functions satisfying the Polyak-Lojasiewicz condition, outperforming the previous strongly convex analysis.
arXiv Detail & Related papers (2024-10-20T11:53:25Z)
- Nonparametric Instrumental Variable Regression through Stochastic Approximate Gradients [0.3277163122167434]
We show how to formulate a functional gradient descent algorithm to tackle NPIV regression by directly minimizing the population risk.
We provide theoretical support in the form of bounds on the excess risk, and conduct numerical experiments showcasing our method's superior stability and competitive performance.
This algorithm enables flexible estimator choices, such as neural networks or kernel-based methods, as well as non-quadratic loss functions.
arXiv Detail & Related papers (2024-02-08T12:50:38Z)
- Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems [42.375903320536715]
The entropic fictitious play (EFP) is a recently proposed algorithm that minimizes the sum of a convex functional and entropy in the space of measures.
We provide a concise primal-dual analysis of EFP in the setting where the learning problem exhibits a finite-sum structure.
arXiv Detail & Related papers (2023-03-06T08:05:08Z)
- Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets [101.5329678997916]
We study episodic two-player zero-sum Markov games (MGs) in the offline setting.
The goal is to find an approximate Nash equilibrium (NE) policy pair based on a dataset collected a priori.
arXiv Detail & Related papers (2022-02-15T15:39:30Z)
- SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation [17.199023009789308]
Stochastic gradient descent (SGD) is routinely used for non-convex functions.
In this paper, we show convergence results in a smooth non-convex setting.
We also provide theoretical guarantees under different step-size conditions.
arXiv Detail & Related papers (2020-06-18T07:05:56Z)
- A maximum-entropy approach to off-policy evaluation in average-reward MDPs [54.967872716145656]
This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs).
We provide the first finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases.
We show that this results in an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning.
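To make the stated mechanism concrete, here is a generic sketch (not that paper's estimator; names and the plain gradient method are assumptions) of fitting an exponential-family reweighting of logged samples whose sufficient statistics are the features, by matching given target feature moments; the objective is convex, so plain gradient descent suffices for the sketch.

```python
import numpy as np

def fit_maxent_weights(phi, target_moments, lr=0.1, iters=500):
    """Fit an exponential-family reweighting of logged samples (generic sketch).

    The weights are w_i(lam) proportional to exp(phi_i^T lam); gradient descent on the
    convex objective  logsumexp(phi @ lam) - lam^T target_moments  drives the
    reweighted feature moments toward `target_moments`.

    phi            : (n, d) feature matrix of logged samples
    target_moments : (d,)   feature moments the reweighted distribution should match
    """
    n, d = phi.shape
    lam = np.zeros(d)
    for _ in range(iters):
        logits = phi @ lam
        logits -= logits.max()                # numerical stability
        w = np.exp(logits)
        w /= w.sum()                          # normalized exponential-family weights
        grad = phi.T @ w - target_moments     # model moments minus target moments
        lam -= lr * grad
    return w, lam
```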
arXiv Detail & Related papers (2020-06-17T18:13:37Z)
- Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to applications in deep learning (e.g., deep AUC maximization).
arXiv Detail & Related papers (2020-06-12T00:32:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.