A Unifying View of Optimism in Episodic Reinforcement Learning
- URL: http://arxiv.org/abs/2007.01891v1
- Date: Fri, 3 Jul 2020 18:10:30 GMT
- Title: A Unifying View of Optimism in Episodic Reinforcement Learning
- Authors: Gergely Neu and Ciara Pike-Burke
- Abstract summary: This paper provides a framework for designing, analyzing and implementing optimistic reinforcement learning algorithms.
Every model-optimistic algorithm that constructs an optimistic MDP has an equivalent representation as a value-optimistic dynamic programming algorithm.
- Score: 18.73198634652064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The principle of optimism in the face of uncertainty underpins many
theoretically successful reinforcement learning algorithms. In this paper we
provide a general framework for designing, analyzing and implementing such
algorithms in the episodic reinforcement learning problem. This framework is
built upon Lagrangian duality, and demonstrates that every model-optimistic
algorithm that constructs an optimistic MDP has an equivalent representation as
a value-optimistic dynamic programming algorithm. Typically, it was thought
that these two classes of algorithms were distinct, with model-optimistic
algorithms benefiting from a cleaner probabilistic analysis and
value-optimistic algorithms being easier to implement and thus more practical.
With the framework developed in this paper, we show that it is possible to get
the best of both worlds by providing a class of algorithms which have a
computationally efficient dynamic-programming implementation and also a simple
probabilistic analysis. Besides being able to capture many existing algorithms
in the tabular setting, our framework can also address large-scale problems
under realizable function approximation, where it enables a simple model-based
analysis of some recently proposed methods.
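To make the value-optimistic side of this equivalence concrete, here is a minimal tabular sketch (illustrative only, not code from the paper; the UCB-VI-style bonus array, the value clipping, and rewards in [0, 1] are assumptions): it runs backward dynamic programming on an empirical model with an exploration bonus added to each Bellman backup, which is the dynamic-programming form that the framework shows every model-optimistic algorithm admits.

```python
import numpy as np

def optimistic_value_iteration(P_hat, R_hat, bonus, H):
    """Value-optimistic dynamic programming for a tabular episodic MDP.

    P_hat : (S, A, S) estimated transition probabilities.
    R_hat : (S, A) estimated mean rewards, assumed to lie in [0, 1].
    bonus : (S, A) exploration bonuses, e.g. derived from confidence sets.
    H     : episode horizon.
    Returns optimistic Q-values of shape (H, S, A) and a greedy policy (H, S).
    """
    S, A, _ = P_hat.shape
    Q = np.zeros((H, S, A))
    V = np.zeros((H + 1, S))               # V[H] = 0: terminal values
    for h in range(H - 1, -1, -1):
        # Optimistic Bellman backup: empirical model plus a bonus term.
        Q[h] = R_hat + bonus + P_hat @ V[h + 1]
        # Clip to the largest achievable return so values stay bounded.
        Q[h] = np.minimum(Q[h], H - h)
        V[h] = Q[h].max(axis=1)
    return Q, Q.argmax(axis=2)
```

Acting greedily with respect to these Q-values and shrinking the bonus as visit counts grow gives the usual optimism-driven exploration loop; the framework's point is that the same computation can equivalently be read as planning in an optimistically perturbed MDP.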
Related papers
- Explainable Benchmarking for Iterative Optimization Heuristics [0.8192907805418583]
We introduce the IOH-Xplainer software framework for analyzing and understanding the performance of various optimization algorithms.
We examine the impact of different algorithmic components and configurations, offering insights into their performance across diverse scenarios.
arXiv Detail & Related papers (2024-01-31T14:02:26Z)
- Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z)
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
- Dual Algorithmic Reasoning [9.701208207491879]
We propose to learn algorithms by exploiting duality of the underlying algorithmic problem.
We demonstrate that simultaneously learning the dual definition of these optimisation problems during algorithmic learning leads to better learning.
We then validate the real-world utility of our dual algorithmic reasoner by deploying it on a challenging brain vessel classification task.
arXiv Detail & Related papers (2023-02-09T08:46:23Z)
- Bézier Flow: a Surface-wise Gradient Descent Method for Multi-objective Optimization [12.487037582320804]
We extend the stability of optimization algorithms in the sense of Probably Approximately Correct (PAC) learning.
We show that multi-objective optimization algorithms derived from a gradient descent-based single-objective optimization algorithm are PAC stable.
arXiv Detail & Related papers (2022-05-23T07:47:58Z)
- Neural Combinatorial Optimization: a New Player in the Field [69.23334811890919]
This paper presents a critical analysis on the incorporation of algorithms based on neural networks into the classical optimization framework.
A comprehensive study is carried out to analyse the fundamental aspects of such algorithms, including performance, transferability, computational cost, and generalization to larger-sized instances.
arXiv Detail & Related papers (2022-05-03T07:54:56Z)
- Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and stochastic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows these algorithms to match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z)
- ES-Based Jacobian Enables Faster Bilevel Optimization [53.675623215542515]
Bilevel optimization (BO) has arisen as a powerful tool for solving many modern machine learning problems.
Existing gradient-based methods require second-order derivative approximations via Jacobian- and/or Hessian-vector computations.
We propose a novel BO algorithm, which adopts an Evolution Strategies (ES) based method to approximate the response Jacobian matrix in the hypergradient of BO.
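The summary above names the mechanism (approximating the response Jacobian that appears in the bilevel hypergradient with Evolution Strategies); the snippet below is not the paper's algorithm, just a generic zeroth-order sketch of that idea, estimating dy*/dx from antithetic Gaussian perturbations, where inner_solver is a hypothetical placeholder for the lower-level solver.

```python
import numpy as np

def es_response_jacobian(inner_solver, x, sigma=0.01, num_samples=32, rng=None):
    """Zeroth-order estimate of the response Jacobian dy*(x)/dx.

    inner_solver : callable mapping x to (an approximation of) the lower-level
                   solution y*(x), e.g. a few inner gradient steps (placeholder).
    sigma        : perturbation scale; num_samples : number of antithetic pairs.
    """
    rng = np.random.default_rng() if rng is None else rng
    y0 = np.asarray(inner_solver(x))
    J = np.zeros((y0.size, x.size))
    for _ in range(num_samples):
        delta = rng.standard_normal(x.size)
        y_plus = np.asarray(inner_solver(x + sigma * delta))
        y_minus = np.asarray(inner_solver(x - sigma * delta))
        # Antithetic finite difference along a random direction; since
        # E[delta delta^T] = I, averaging these outer products recovers dy*/dx.
        J += np.outer((y_plus - y_minus) / (2.0 * sigma), delta)
    return J / num_samples
```

Multiplying such an estimate into the upper-level gradient yields an approximate hypergradient without any Jacobian- or Hessian-vector products from automatic differentiation.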
arXiv Detail & Related papers (2021-10-13T19:36:50Z)
- A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based.
The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM).
We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted S&C approach.
arXiv Detail & Related papers (2020-06-23T01:34:18Z)