Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison
- URL: http://arxiv.org/abs/2003.03924v4
- Date: Mon, 24 Aug 2020 04:09:22 GMT
- Title: Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison
- Authors: Tengyang Xie, Nan Jiang
- Abstract summary: We prove performance guarantees of two algorithms for approximating $Q^\star$ in batch reinforcement learning.
One of the algorithms uses a novel and explicit importance-weighting correction to overcome the infamous "double sampling" difficulty in Bellman error estimation.
- Score: 17.692408242465763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove performance guarantees of two algorithms for approximating $Q^\star$
in batch reinforcement learning. Compared to classical iterative methods such
as Fitted Q-Iteration---whose performance loss incurs quadratic dependence on
horizon---these methods estimate (some forms of) the Bellman error and enjoy
linear-in-horizon error propagation, a property established for the first time
for algorithms that rely solely on batch data and output stationary policies.
One of the algorithms uses a novel and explicit importance-weighting correction
to overcome the infamous "double sampling" difficulty in Bellman error
estimation, and does not use any squared losses. Our analyses reveal its
distinct characteristics and potential advantages compared to classical
algorithms.
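The "double sampling" difficulty the abstract refers to is easy to see numerically: with stochastic transitions, the average squared TD error over a batch overestimates the squared Bellman error by a conditional-variance term, and removing that term requires two independent samples of the next state. A minimal sketch of this bias (the toy MDP and all names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9

# Toy problem: one state-action pair with reward 1 and two possible
# next states, each reached with probability 1/2.
r = 1.0
v_next = np.array([0.0, 10.0])   # value estimates V(s') at the two next states
q_sa = 5.0                       # current estimate Q(s,a)

# True Bellman error: r + gamma * E[V(s')] - Q(s,a)
true_be = r + gamma * v_next.mean() - q_sa
print("squared Bellman error:", true_be ** 2)        # 0.25

# Naive batch estimate: average of squared TD errors, each computed
# from a single sampled transition.
s_next = rng.integers(0, 2, size=100_000)
td = r + gamma * v_next[s_next] - q_sa
print("mean squared TD error:", (td ** 2).mean())    # about 20.5

# The gap is gamma^2 * Var[V(s')]: estimating (E[td])^2 via E[td^2]
# from single samples picks up this variance term.
print("bias term gamma^2 * Var:", gamma ** 2 * v_next.var())  # 20.25
```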
Related papers
- Bregman-divergence-based Arimoto-Blahut algorithm [53.64687146666141]
We generalize the Arimoto-Blahut algorithm to a general function defined over a Bregman-divergence system.
We propose a convex-optimization-free algorithm that can be applied to classical and quantum rate-distortion theory.
arXiv Detail & Related papers (2024-08-10T06:16:24Z)
- Regularized Q-Learning with Linear Function Approximation [2.765106384328772]
We consider a bi-level optimization formulation of regularized Q-learning with linear function approximation.
We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise.
arXiv Detail & Related papers (2024-01-26T20:45:40Z)
- Parameterized Projected Bellman Operator [64.129598593852]
Approximate value iteration (AVI) is a family of algorithms for reinforcement learning (RL).
We propose a novel alternative approach based on learning an approximate version of the Bellman operator.
We formulate an optimization problem to learn the projected Bellman operator (PBO) for generic sequential decision-making problems.
arXiv Detail & Related papers (2023-12-20T09:33:16Z)
- A Log-linear Gradient Descent Algorithm for Unbalanced Binary Classification using the All Pairs Squared Hinge Loss [0.0]
We propose a new functional representation of the square loss and squared hinge loss, which results in algorithms that compute the gradient in either linear or log-linear time (see the loss-computation sketch after this list).
Our new algorithm can achieve higher test AUC values on imbalanced data sets than previous algorithms, and make use of larger batch sizes than were previously feasible.
arXiv Detail & Related papers (2023-02-21T23:35:00Z)
- Dictionary and prior learning with unrolled algorithms for unsupervised inverse problems [12.54744464424354]
We study Dictionary and Prior learning from degraded measurements as a bi-level problem.
We take advantage of unrolled algorithms to solve approximate formulations of Synthesis and Analysis.
arXiv Detail & Related papers (2021-06-11T12:21:26Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
- Accelerated Message Passing for Entropy-Regularized MAP Inference [89.15658822319928]
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning.
Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms.
We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient methods.
arXiv Detail & Related papers (2020-07-01T18:43:32Z)
- Learning and Planning in Average-Reward Markov Decision Processes [15.586087060535398]
We introduce learning and planning algorithms for average-reward MDPs.
All of our algorithms are based on using the temporal-difference error rather than the conventional error when updating the estimate of the average reward (see the differential-TD sketch after this list).
arXiv Detail & Related papers (2020-06-29T19:03:24Z)
- SONIA: A Symmetric Blockwise Truncated Optimization Algorithm [2.9923891863939938]
This work presents a new algorithm for empirical risk minimization.
The algorithm bridges the gap between first- and second-order search methods by computing a second-order search-type update in one subspace, coupled with a scaled steepest descent step in the orthogonal complement.
arXiv Detail & Related papers (2020-06-06T19:28:14Z)
- Meta-learning with Stochastic Linear Bandits [120.43000970418939]
We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a squared Euclidean distance to a bias vector (see the biased-ridge sketch after this list).
We show both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
arXiv Detail & Related papers (2020-05-18T08:41:39Z)
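The log-linear trick mentioned under "A Log-linear Gradient Descent Algorithm for Unbalanced Binary Classification" can be illustrated on the loss itself: after sorting the negative-class scores, suffix sums reduce the all-pairs sum to one binary search per positive example. A hedged sketch of this idea, not the paper's exact functional representation (the gradient admits the same treatment):

```python
import numpy as np

def all_pairs_sq_hinge(pos, neg, margin=1.0):
    """Sum over all (i, j) of max(0, margin - (pos[i] - neg[j]))**2,
    computed in O((P + N) log N) time via sorting and suffix sums."""
    q = np.sort(neg)
    # Suffix sums: s1[k] = sum(q[k:]), s2[k] = sum(q[k:] ** 2).
    s1 = np.append(np.cumsum(q[::-1])[::-1], 0.0)
    s2 = np.append(np.cumsum((q ** 2)[::-1])[::-1], 0.0)
    total = 0.0
    for p in pos:
        # Only pairs with q_j > p - margin contribute a nonzero term.
        k = int(np.searchsorted(q, p - margin, side="right"))
        m = len(q) - k
        c = margin - p
        # sum over active j of (c + q_j)^2 = m*c^2 + 2c*sum(q_j) + sum(q_j^2)
        total += m * c * c + 2.0 * c * s1[k] + s2[k]
    return total

# Sanity check against the naive O(P * N) double loop.
pos, neg = np.array([0.8, 2.5]), np.array([0.0, 1.0, 2.0])
naive = sum(max(0.0, 1.0 - (p - n)) ** 2 for p in pos for n in neg)
assert abs(all_pairs_sq_hinge(pos, neg) - naive) < 1e-9
```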
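For "Learning and Planning in Average-Reward Markov Decision Processes", the update the summary highlights (driving the average-reward estimate by the TD error rather than by the conventional error r - r_bar) looks roughly as follows in the tabular prediction setting; the names and step sizes are illustrative, not the authors' code:

```python
from collections import defaultdict

def differential_td_step(v, r_bar, s, r, s_next, alpha=0.1, beta=0.01):
    """One step of tabular differential TD(0) for average-reward prediction.

    The TD error subtracts the current average-reward estimate r_bar,
    and r_bar itself is updated with the TD error delta (the idea the
    summary highlights), not with the conventional error r - r_bar.
    """
    delta = r - r_bar + v[s_next] - v[s]   # differential TD error
    v[s] += alpha * delta                  # update value estimate
    r_bar += beta * delta                  # update average-reward estimate
    return r_bar

# Usage: a two-state toy chain that alternates rewards 0 and 2.
v, r_bar = defaultdict(float), 0.0
transitions = [(0, 0.0, 1), (1, 2.0, 0)] * 500
for s, r, s_next in transitions:
    r_bar = differential_td_step(v, r_bar, s, r, s_next)
print(round(r_bar, 2))  # approaches the average reward 1.0
```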
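For "Meta-learning with Stochastic Linear Bandits", the regularizer the summary describes, a squared Euclidean distance to a bias vector h, gives the per-task least-squares estimate a simple closed form. A minimal sketch under that reading (h, lam, and the data are illustrative):

```python
import numpy as np

def biased_ridge(X, y, h, lam):
    """argmin_theta ||X @ theta - y||^2 + lam * ||theta - h||^2.

    Setting the gradient to zero gives the normal equations
    (X^T X + lam * I) theta = X^T y + lam * h.
    As lam grows the estimate collapses to the bias vector h, so a
    good h (learned across tasks) helps when per-task data is scarce.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * h)

rng = np.random.default_rng(1)
theta_true = np.array([1.0, -2.0])
X = rng.normal(size=(5, 2))                    # only 5 observations
y = X @ theta_true + 0.1 * rng.normal(size=5)
h = np.array([0.9, -1.9])                      # bias vector near the truth
print(biased_ridge(X, y, h, lam=10.0))         # pulled toward h
```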
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.