Adaptive Discretization in Online Reinforcement Learning
- URL: http://arxiv.org/abs/2110.15843v1
- Date: Fri, 29 Oct 2021 15:06:15 GMT
- Title: Adaptive Discretization in Online Reinforcement Learning
- Authors: Sean R. Sinclair, Siddhartha Banerjee, Christina Lee Yu
- Abstract summary: Two major questions in designing discretization-based algorithms are how to create the discretization and when to refine it.
We provide a unified theoretical analysis of tree-based hierarchical partitioning methods for online reinforcement learning.
Our algorithms are easily adapted to operating constraints, and our theory provides explicit bounds across each of the three facets.
- Score: 9.560980936110234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Discretization-based approaches to solving online reinforcement learning
problems have been studied extensively in practice on applications ranging from
resource allocation to cache management. Two major questions in designing
discretization-based algorithms are how to create the discretization and when
to refine it. While there have been several experimental results investigating
heuristic solutions to these questions, there has been little theoretical
treatment. In this paper we provide a unified theoretical analysis of
tree-based hierarchical partitioning methods for online reinforcement learning,
providing model-free and model-based algorithms. We show how our algorithms are able to take advantage of the inherent structure of the problem by providing guarantees that scale with the 'zooming dimension', an instance-dependent quantity measuring the benignness of the optimal $Q_h^\star$ function, instead of the ambient dimension.
Many applications in computing systems and operations research require
algorithms that compete on three facets: low sample complexity, mild storage
requirements, and low computational burden. Our algorithms are easily adapted
to operating constraints, and our theory provides explicit bounds across each
of the three facets. This motivates their use in practical applications, as our approach automatically adapts to the underlying problem structure even when very
little is known a priori about the system.
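To make the tree-based partitioning concrete, here is a minimal sketch of adaptive discretization in this spirit: a tree of regions over the state-action space, optimistic selection over active leaves, and a refinement rule that splits a region once its visit count outgrows its diameter. The class names, UCB form, and split threshold are illustrative choices, not the paper's exact model-free or model-based procedures.
```python
# Sketch of tree-based adaptive discretization (illustrative constants;
# not the paper's exact algorithms).
import math
import random

class Node:
    def __init__(self, lo, hi, depth=0):
        self.lo, self.hi = list(lo), list(hi)   # hyperrectangle bounds
        self.depth = depth
        self.count = 0                          # visits to this region
        self.q = 0.0                            # running Q-value estimate
        self.children = []

    def diameter(self):
        return max(h - l for l, h in zip(self.lo, self.hi))

    def ucb(self, t):
        # Optimism: estimate + confidence width + discretization bias.
        width = math.sqrt(2 * math.log(t + 2) / (self.count + 1))
        return self.q + width + self.diameter()

    def split(self):
        # Halve the longest axis; children inherit the parent's estimate.
        ax = max(range(len(self.lo)), key=lambda i: self.hi[i] - self.lo[i])
        mid = 0.5 * (self.lo[ax] + self.hi[ax])
        for side in (0, 1):
            lo, hi = list(self.lo), list(self.hi)
            (hi if side == 0 else lo)[ax] = mid
            child = Node(lo, hi, self.depth + 1)
            child.q = self.q
            self.children.append(child)

def select(root, t):
    """Descend to the active leaf with the highest optimistic value."""
    node = root
    while node.children:
        node = max(node.children, key=lambda c: c.ucb(t))
    return node

def update(node, reward, lr=0.5):
    node.count += 1
    node.q += lr * (reward - node.q)
    # Refine once visits exceed 1/diameter: a region is subdivided only
    # when its estimation error drops below its discretization error.
    if not node.children and node.count >= 1.0 / max(node.diameter(), 1e-9):
        node.split()

# Toy loop over a 2-d box with reward peaked at x[0] = 0.3: the tree ends
# up much finer near the peak than elsewhere, adapting to the instance.
root = Node([0.0, 0.0], [1.0, 1.0])
for t in range(500):
    leaf = select(root, t)
    x = [random.uniform(l, h) for l, h in zip(leaf.lo, leaf.hi)]
    update(leaf, 1.0 - abs(x[0] - 0.3))
```
This per-region bookkeeping is also what keeps storage and computation proportional to the number of active regions rather than to a fixed uniform grid.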
Related papers
- Limits and Powers of Koopman Learning [0.0]
Dynamical systems provide a comprehensive way to study complex and changing behaviors across various sciences.
Koopman operators have emerged as a dominant approach because they allow the study of nonlinear dynamics using linear techniques.
This paper addresses a fundamental open question: When can we robustly learn the spectral properties of Koopman operators from trajectory data of dynamical systems, and when can we not?
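For reference, 'learning spectral properties from trajectory data' can be illustrated with dynamic mode decomposition, a standard estimator of the Koopman spectrum on linear observables; the toy system below and its constants are assumptions for illustration, not this paper's method.
```python
# Estimating Koopman eigenvalues from snapshots via dynamic mode
# decomposition (a standard baseline estimator; illustrative only).
import numpy as np

rng = np.random.default_rng(0)
# Toy linear system x_{t+1} = A x_t, whose Koopman spectrum on linear
# observables is just the spectrum of A.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
X = np.empty((2, 100))
X[:, 0] = rng.normal(size=2)
for t in range(99):
    X[:, t + 1] = A @ X[:, t]

# DMD: least-squares fit of the one-step map from snapshot pairs.
K = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
print(np.sort(np.linalg.eigvals(K)))   # close to [0.8, 0.9]
```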
arXiv Detail & Related papers (2024-07-08T18:24:48Z)
- Multi-Timescale Ensemble Q-learning for Markov Decision Process Policy Optimization [21.30645601474163]
The original Q-learning algorithm suffers from performance and complexity challenges across very large networks.
A new model-free ensemble reinforcement learning algorithm that adapts classical Q-learning is proposed to handle these challenges.
Numerical results show that the proposed algorithm can achieve up to 55% less average policy error with up to 50% less runtime complexity.
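As a rough illustration of the ensemble idea (a generic scheme, not the paper's multi-timescale method), the sketch below keeps several tabular Q-estimates, updates each on bootstrap-masked samples, and acts on their average; the chain MDP and constants are toy assumptions.
```python
# Generic ensemble Q-learning on a toy chain MDP (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
nS, nA, K, gamma = 5, 2, 4, 0.9
Q = np.zeros((K, nS, nA))                 # one table per ensemble member

def step(s, a):
    # Action 0 moves left, action 1 moves right; reward at the right end.
    s2 = max(0, s - 1) if a == 0 else min(nS - 1, s + 1)
    return s2, float(s2 == nS - 1)

s = 0
for t in range(5000):
    # Epsilon-greedy on the ensemble-average value estimate.
    a = rng.integers(nA) if rng.random() < 0.1 else int(Q.mean(0)[s].argmax())
    s2, r = step(s, a)
    for k in range(K):                    # each member sees the sample...
        if rng.random() < 0.5:            # ...with bootstrap masking
            target = r + gamma * Q[k, s2].max()
            Q[k, s, a] += 0.1 * (target - Q[k, s, a])
    s = 0 if s2 == nS - 1 else s2         # reset after reaching the goal

print(Q.mean(0).argmax(1))                # greedy policy favors moving right
```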
arXiv Detail & Related papers (2024-02-08T08:08:23Z)
- Towards model-free RL algorithms that scale well with unstructured data [1.3799571823220778]
We provide an algorithm that constructs reward-relevant general value function questions to find and exploit predictive structure directly from the experience stream.
The proposed algorithm reliably outperforms a conventional deep RL algorithm on these scaling problems.
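For readers unfamiliar with general value function (GVF) questions, here is a minimal sketch of one: a TD(0) prediction of a chosen cumulant signal under a fixed discount, learned online from the experience stream. The features and cumulant below are placeholder assumptions, not the paper's reward-relevant construction.
```python
# One GVF question learned by linear TD(0) (placeholder signals).
import numpy as np

rng = np.random.default_rng(0)
d = 4
w = np.zeros(d)                       # linear GVF weights
gamma, alpha = 0.9, 0.05              # question's discount, step size

phi = rng.normal(size=d)              # features of the current observation
for t in range(10000):
    phi_next = rng.normal(size=d)     # stand-in for the next observation
    cumulant = float(phi_next[0] > 0) # illustrative signal to predict
    td_error = cumulant + gamma * (w @ phi_next) - (w @ phi)
    w += alpha * td_error * phi       # TD(0) update toward the prediction
    phi = phi_next
print(w @ phi)                        # current answer to the GVF question
```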
arXiv Detail & Related papers (2023-11-03T20:03:54Z)
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
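The multi-step inverse kinematics ingredient can be sketched as predicting the action a_t from observations x_t and x_{t+k}. The linear model, synthetic data, and plain gradient-descent loop below are illustrative stand-ins, not the MusIK algorithm itself.
```python
# Multi-step inverse model as a softmax classifier (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n, dx, nA = 2000, 8, 3
X_t = rng.normal(size=(n, dx))            # observations at time t
A_t = rng.integers(nA, size=n)            # actions taken at time t
# Synthetic x_{t+k}: shifted by the action, plus noise.
X_tk = X_t + 0.5 * A_t[:, None] + rng.normal(scale=0.1, size=(n, dx))

feats = np.hstack([X_t, X_tk])            # "encoder" input: both endpoints
W = np.zeros((2 * dx, nA))
for epoch in range(200):
    logits = feats @ W
    p = np.exp(logits - logits.max(1, keepdims=True))
    p /= p.sum(1, keepdims=True)
    p[np.arange(n), A_t] -= 1.0           # softmax cross-entropy gradient
    W -= 0.1 * feats.T @ p / n

print(((feats @ W).argmax(1) == A_t).mean())   # action prediction accuracy
```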
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
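Generic ingredients of this line of work, linear value estimates plus an elliptical exploration bonus that shrinks as feature directions get covered, can be sketched as follows; the feature map, bonus coefficient, and update rule are assumptions for illustration, not the paper's exact protocol.
```python
# Linear Q-learning with an elliptical exploration bonus (illustrative).
import numpy as np

rng = np.random.default_rng(0)
d, nA, beta, gamma = 6, 3, 1.0, 0.9
w = np.zeros(d)                           # linear Q-value weights
Lam = np.eye(d)                           # regularized design matrix

def phi(s, a):                            # toy feature map
    v = np.zeros(d)
    v[a] = 1.0
    v[3:] = 0.1 * s
    return v

s = rng.normal(size=3)
for t in range(2000):
    Linv = np.linalg.inv(Lam)
    # Optimistic action: estimate + beta * ||phi||_{Lam^{-1}}.
    scores = [w @ phi(s, a) + beta * np.sqrt(phi(s, a) @ Linv @ phi(s, a))
              for a in range(nA)]
    a = int(np.argmax(scores))
    f = phi(s, a)
    s2 = rng.normal(size=3)               # stand-in transition
    r = float(a == 0)                     # stand-in reward
    target = r + gamma * max(w @ phi(s2, b) for b in range(nA))
    w += 0.05 * (target - w @ f) * f      # TD update on the weights
    Lam += np.outer(f, f)                 # bonus shrinks with coverage
    s = s2
print(w)
```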
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Exploring Viable Algorithmic Options for Learning from Demonstration (LfD): A Parameterized Complexity Approach [0.0]
In this paper, we show how such a systematic exploration of algorithmic options can be done using parameterized complexity analysis.
We show that none of our problems can be solved efficiently either in general or relative to a number of (often simultaneous) restrictions on environments, demonstrations, and policies.
arXiv Detail & Related papers (2022-05-10T15:54:06Z)
- Neural Combinatorial Optimization: a New Player in the Field [69.23334811890919]
This paper presents a critical analysis of the incorporation of algorithms based on neural networks into the classical optimization framework.
A comprehensive study is carried out to analyse the fundamental aspects of such algorithms, including performance, transferability, computational cost, and generalization to larger-sized instances.
arXiv Detail & Related papers (2022-05-03T07:54:56Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- Learning to Actively Learn: A Robust Approach [22.75298609290053]
This work proposes a procedure for designing algorithms for adaptive data collection tasks like active learning and pure-exploration multi-armed bandits.
Our adaptive algorithm is learned via adversarial training over equivalence classes of problems derived from information theoretic lower bounds.
We perform synthetic experiments to justify the stability and effectiveness of the training procedure, and then evaluate the method on tasks derived from real data.
arXiv Detail & Related papers (2020-10-29T06:48:22Z)
- An Integer Linear Programming Framework for Mining Constraints from Data [81.60135973848125]
We present a general framework for mining constraints from data.
In particular, we consider the inference in structured output prediction as an integer linear programming (ILP) problem.
We show that our approach can learn to solve 9x9 Sudoku puzzles and minimal spanning tree problems from examples without providing the underlying rules.
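To see what 'inference as an ILP' looks like in miniature, here is a sketch using scipy.optimize.milp: three output slots each pick exactly one of two labels to maximize a score under linear constraints. The scores are toy values, and unlike the paper the constraints are written by hand here rather than mined from data.
```python
# Structured inference as an integer linear program (toy scores).
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

scores = np.array([[1.0, 2.0],    # slot 0: score of label 0 vs label 1
                   [3.0, 0.5],    # slot 1
                   [0.2, 0.7]])   # slot 2
c = -scores.ravel()               # milp minimizes, so negate the scores

# Hand-written constraint: each slot's two indicators sum to exactly 1.
A = np.kron(np.eye(3), np.ones(2))
res = milp(c,
           constraints=LinearConstraint(A, lb=1, ub=1),
           integrality=np.ones(6),       # all variables are integers
           bounds=Bounds(0, 1))          # ...and binary
print(res.x.reshape(3, 2).argmax(1))     # chosen labels: [1 0 1]
```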
arXiv Detail & Related papers (2020-06-18T20:09:53Z)
- Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity [58.70807593332932]
We study the oracle complexity of gradient-based methods for stochastic approximation problems.
We focus on instance-dependent complexity instead of worst-case complexity.
Our proposed algorithm and its analysis provide a theoretical justification for the success of moment estimation.
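As a minimal illustration of why moment estimation helps on benign instances, the sketch below tracks a running second-moment estimate and uses it to rescale noisy gradient steps on a badly conditioned quadratic (RMSProp-style scaling; the constants are assumptions and this is not the paper's algorithm or analysis).
```python
# Second-moment-scaled stochastic gradient descent on a toy quadratic.
import numpy as np

rng = np.random.default_rng(0)
scales = np.array([100.0, 1.0])    # badly conditioned instance
x = np.ones(2)
v = np.zeros(2)                    # running second-moment estimate
for t in range(2000):
    g = scales * x + rng.normal(size=2)   # noisy gradient of 0.5*x'Sx
    v = 0.99 * v + 0.01 * g ** 2          # moment estimation
    x -= 0.05 * g / (np.sqrt(v) + 1e-8)   # per-coordinate scaled step
print(x)                           # near the optimum despite conditioning
```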
arXiv Detail & Related papers (2020-06-08T09:25:47Z)