A Unifying View of Optimism in Episodic Reinforcement Learning
- URL: http://arxiv.org/abs/2007.01891v1
- Date: Fri, 3 Jul 2020 18:10:30 GMT
- Title: A Unifying View of Optimism in Episodic Reinforcement Learning
- Authors: Gergely Neu and Ciara Pike-Burke
- Abstract summary: This paper provides a framework for designing, analyzing and implementing optimistic reinforcement learning algorithms.
Every model-optimistic algorithm that constructs an optimistic MDP has an equivalent representation as a value-optimistic dynamic programming algorithm.
- Score: 18.73198634652064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The principle of optimism in the face of uncertainty underpins many
theoretically successful reinforcement learning algorithms. In this paper we
provide a general framework for designing, analyzing and implementing such
algorithms in the episodic reinforcement learning problem. This framework is
built upon Lagrangian duality, and demonstrates that every model-optimistic
algorithm that constructs an optimistic MDP has an equivalent representation as
a value-optimistic dynamic programming algorithm. Typically, it was thought
that these two classes of algorithms were distinct, with model-optimistic
algorithms benefiting from a cleaner probabilistic analysis and
value-optimistic algorithms being easier to implement and thus more practical.
With the framework developed in this paper, we show that it is possible to get
the best of both worlds by providing a class of algorithms which have a
computationally efficient dynamic-programming implementation and also a simple
probabilistic analysis. Besides being able to capture many existing algorithms
in the tabular setting, our framework can also address large-scale problems
under realizable function approximation, where it enables a simple model-based
analysis of some recently proposed methods.
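To make the value-optimistic side of this equivalence concrete, here is a minimal tabular sketch (illustrative only, not code from the paper; the UCB-VI-style bonus array, the value clipping, and rewards in [0, 1] are assumptions): it runs backward dynamic programming on an empirical model with an exploration bonus added to each Bellman backup, which is the dynamic-programming form that the framework shows every model-optimistic algorithm admits.

```python
import numpy as np

def optimistic_value_iteration(P_hat, R_hat, bonus, H):
    """Value-optimistic dynamic programming for a tabular episodic MDP.

    P_hat : (S, A, S) estimated transition probabilities.
    R_hat : (S, A) estimated mean rewards, assumed to lie in [0, 1].
    bonus : (S, A) exploration bonuses, e.g. derived from confidence sets.
    H     : episode horizon.
    Returns optimistic Q-values of shape (H, S, A) and a greedy policy (H, S).
    """
    S, A, _ = P_hat.shape
    Q = np.zeros((H, S, A))
    V = np.zeros((H + 1, S))               # V[H] = 0: terminal values
    for h in range(H - 1, -1, -1):
        # Optimistic Bellman backup: empirical model plus a bonus term.
        Q[h] = R_hat + bonus + P_hat @ V[h + 1]
        # Clip to the largest achievable return so values stay bounded.
        Q[h] = np.minimum(Q[h], H - h)
        V[h] = Q[h].max(axis=1)
    return Q, Q.argmax(axis=2)
```

Acting greedily with respect to these Q-values and shrinking the bonus as visit counts grow gives the usual optimism-driven exploration loop; the framework's point is that the same computation can equivalently be read as planning in an optimistically perturbed MDP.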
Related papers
- Explainable Benchmarking for Iterative Optimization Heuristics [0.8192907805418583]
We introduce the IOH-Xplainer software framework for analyzing and understanding the performance of various optimization algorithms.
We examine the impact of different algorithmic components and configurations, offering insights into their performance across diverse scenarios.
arXiv Detail & Related papers (2024-01-31T14:02:26Z)
- Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z)
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
- Dual Algorithmic Reasoning [9.701208207491879]
We propose to learn algorithms by exploiting duality of the underlying algorithmic problem.
We demonstrate that simultaneously learning the dual definition of these optimisation problems during algorithmic learning leads to better learning.
We then validate the real-world utility of our dual algorithmic reasoner by deploying it on a challenging brain vessel classification task.
arXiv Detail & Related papers (2023-02-09T08:46:23Z)
- Bézier Flow: a Surface-wise Gradient Descent Method for Multi-objective Optimization [12.487037582320804]
We extend the stability of optimization algorithms in the sense of Probably Approximately Correct (PAC) learning.
We show that multi-objective optimization algorithms derived from a gradient descent-based single-objective optimization algorithm are PAC stable.
arXiv Detail & Related papers (2022-05-23T07:47:58Z)
- Neural Combinatorial Optimization: a New Player in the Field [69.23334811890919]
This paper presents a critical analysis on the incorporation of algorithms based on neural networks into the classical optimization framework.
A comprehensive study is carried out to analyse the fundamental aspects of such algorithms, including performance, transferability, computational cost, and generalization to larger-sized instances.
arXiv Detail & Related papers (2022-05-03T07:54:56Z)
- Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and stochastic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows these algorithms to match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z)
- ES-Based Jacobian Enables Faster Bilevel Optimization [53.675623215542515]
Bilevel optimization (BO) has arisen as a powerful tool for solving many modern machine learning problems.
Existing gradient-based methods require second-order derivative approximations via Jacobian- and/or Hessian-vector computations.
We propose a novel BO algorithm, which adopts an Evolution Strategies (ES) based method to approximate the response Jacobian matrix in the hypergradient of BO.
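The summary above names the mechanism (approximating the response Jacobian that appears in the bilevel hypergradient with Evolution Strategies); the snippet below is not the paper's algorithm, just a generic zeroth-order sketch of that idea, estimating dy*/dx from antithetic Gaussian perturbations, where inner_solver is a hypothetical placeholder for the lower-level solver.

```python
import numpy as np

def es_response_jacobian(inner_solver, x, sigma=0.01, num_samples=32, rng=None):
    """Zeroth-order estimate of the response Jacobian dy*(x)/dx.

    inner_solver : callable mapping x to (an approximation of) the lower-level
                   solution y*(x), e.g. a few inner gradient steps (placeholder).
    sigma        : perturbation scale; num_samples : number of antithetic pairs.
    """
    rng = np.random.default_rng() if rng is None else rng
    y0 = np.asarray(inner_solver(x))
    J = np.zeros((y0.size, x.size))
    for _ in range(num_samples):
        delta = rng.standard_normal(x.size)
        y_plus = np.asarray(inner_solver(x + sigma * delta))
        y_minus = np.asarray(inner_solver(x - sigma * delta))
        # Antithetic finite difference along a random direction; since
        # E[delta delta^T] = I, averaging these outer products recovers dy*/dx.
        J += np.outer((y_plus - y_minus) / (2.0 * sigma), delta)
    return J / num_samples
```

Multiplying such an estimate into the upper-level gradient yields an approximate hypergradient without any Jacobian- or Hessian-vector products from automatic differentiation.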
arXiv Detail & Related papers (2021-10-13T19:36:50Z)
- A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based.
The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM).
We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted S&C approach.
arXiv Detail & Related papers (2020-06-23T01:34:18Z)