Stabilizing Q-learning with Linear Architectures for Provably Efficient
Learning
- URL: http://arxiv.org/abs/2206.00796v1
- Date: Wed, 1 Jun 2022 23:26:51 GMT
- Title: Stabilizing Q-learning with Linear Architectures for Provably Efficient
Learning
- Authors: Andrea Zanette, Martin J. Wainwright
- Abstract summary: This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
- Score: 53.17258888552998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The $Q$-learning algorithm is a simple and widely-used stochastic
approximation scheme for reinforcement learning, but the basic protocol can
exhibit instability in conjunction with function approximation. Such
instability can be observed even with linear function approximation. In
practice, tools such as target networks and experience replay appear to be
essential, but the individual contribution of each of these mechanisms is not
well understood theoretically. This work proposes an exploration variant of the
basic $Q$-learning protocol with linear function approximation. Our modular
analysis illustrates the role played by each algorithmic tool that we adopt: a
second order update rule, a set of target networks, and a mechanism akin to
experience replay. Together, they enable state-of-the-art regret bounds on
linear MDPs while preserving the most prominent feature of the algorithm,
namely a space complexity independent of the number of steps elapsed. We show
that the performance of the algorithm degrades very gracefully under a novel
and more permissive notion of approximation error. The algorithm also exhibits
a form of instance-dependence, in that its performance depends on the
"effective" feature dimension.
Related papers
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
- Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z) - Representation Learning with Multi-Step Inverse Kinematics: An Efficient
and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z) - Offline Reinforcement Learning with Differentiable Function
- Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications.
We take a step in this direction by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show that offline RL with differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z) - Annealing Optimization for Progressive Learning with Stochastic
- Annealing Optimization for Progressive Learning with Stochastic Approximation [0.0]
We introduce a learning model designed to meet the needs of applications in which computational resources are limited.
We develop an online prototype-based learning algorithm that is formulated as an online gradient-free stochastic approximation algorithm.
The learning model can be viewed as an interpretable and progressively growing competitive neural network model to be used for supervised, unsupervised, and reinforcement learning.
arXiv Detail & Related papers (2022-09-06T21:31:01Z) - Learning for Spatial Branching: An Algorithm Selection Approach [0.0]
- Learning for Spatial Branching: An Algorithm Selection Approach [0.0]
We develop a learning framework for branching and show its efficacy in the context of non-linear optimization problems.
The proposed learning is performed offline, based on instance-specific features and with no computational overhead when solving new instances.
Experiments on different benchmark instances from the literature show that the learning-based branching rule significantly outperforms the standard rules.
arXiv Detail & Related papers (2022-04-22T17:23:43Z) - Simple Stochastic and Online Gradient DescentAlgorithms for Pairwise
- Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z) - Online Deterministic Annealing for Classification and Clustering [0.0]
- Online Deterministic Annealing for Classification and Clustering [0.0]
We introduce an online prototype-based learning algorithm for clustering and classification.
We show that the proposed algorithm constitutes a competitive-learning neural network, the learning rule of which is formulated as an online gradient-free stochastic approximation algorithm.
arXiv Detail & Related papers (2021-02-11T04:04:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.