Refined Analysis of FPL for Adversarial Markov Decision Processes
- URL: http://arxiv.org/abs/2008.09251v1
- Date: Fri, 21 Aug 2020 01:12:10 GMT
- Title: Refined Analysis of FPL for Adversarial Markov Decision Processes
- Authors: Yuanhao Wang and Kefan Dong
- Abstract summary: Follow-the-Perturbed-Leader (FPL) based algorithms have been proposed in previous literature.
We improve the analysis of FPL-based algorithms in both settings, matching the current best regret bounds using faster and simpler algorithms.
- Score: 9.188318506016897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the adversarial Markov Decision Process (MDP) problem, where the
rewards for the MDP can be adversarially chosen, and the transition function
can be either known or unknown. In both settings, Follow-the-Perturbed-Leader
(FPL) based algorithms have been proposed in previous literature. However, the
established regret bounds for FPL-based algorithms are worse than those of
algorithms based on mirror descent. We improve the analysis of FPL-based
algorithms in both settings, matching the current best regret bounds using
faster and simpler algorithms.
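Since the paper's contribution centers on the Follow-the-Perturbed-Leader template, a minimal sketch of the generic FPL rule may help fix ideas. This is not the algorithm analyzed in the paper (which runs FPL over the policies of an adversarial MDP); it is the simpler full-information experts setting, and the function name, perturbation scale eta, and exponential noise distribution are illustrative assumptions.

```python
import numpy as np

def fpl_experts(reward_matrix, eta=1.0, seed=0):
    """Generic Follow-the-Perturbed-Leader in the full-information experts setting.

    reward_matrix: (T, K) array; row t holds the adversarially chosen rewards of
    the K experts at round t, revealed only after the learner commits to a choice.
    eta: scale of the exponential perturbation added to the cumulative rewards.
    """
    rng = np.random.default_rng(seed)
    T, K = reward_matrix.shape
    cumulative = np.zeros(K)              # cumulative reward of each expert so far
    choices, total = [], 0.0
    for t in range(T):
        # Perturb the cumulative rewards, then follow the (perturbed) leader.
        perturbation = rng.exponential(scale=eta, size=K)
        choice = int(np.argmax(cumulative + perturbation))
        choices.append(choice)
        total += reward_matrix[t, choice]
        cumulative += reward_matrix[t]    # full-information feedback
    return choices, total
```

In the adversarial MDP setting the role of the experts is played by policies (or occupancy measures) and the cumulative rewards come from the observed reward functions, but the perturb-then-argmax structure is the same; the argmax then reduces to planning in a known or estimated MDP, which is what makes FPL-based methods comparatively fast and simple.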
Related papers
- Optimism in the Face of Ambiguity Principle for Multi-Armed Bandits [6.7310264583128445]
Follow-The-Regularized-Leader (FTRL) algorithms often enjoy optimal regret for adversarial as well as stochastic bandit problems.
We propose a new Follow-The-Perturbed-Leader (FTPL) algorithm that generates optimal policies for both adversarial and stochastic multi-armed bandits.
arXiv Detail & Related papers (2024-09-30T16:00:23Z)
- Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes [12.76843681997386]
Policy Optimization (PO) methods are among the most popular Reinforcement Learning (RL) algorithms in practice.
This paper proposes a PO-based algorithm with rate-optimal regret guarantees under the linear Markov Decision Process (MDP) model.
Our algorithm achieves this regret with improved dependence on the other parameters of the problem.
arXiv Detail & Related papers (2024-07-03T12:36:24Z)
- Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback [35.687473978249535]
We consider regret minimization for Adversarial Markov Decision Processes (AMDPs), where the loss functions change over time and are adversarially chosen.
While there has been a surge of studies on this problem using Online-Mirror-Descent (OMD) methods, very little is known about the Follow-the-Perturbed-Leader (FTPL) methods.
We develop the first no-regret algorithm for learning AMDPs in the infinite-horizon setting with bandit feedback and stochastic transitions.
arXiv Detail & Related papers (2022-05-26T15:55:50Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
- Regret Analysis in Deterministic Reinforcement Learning [78.31410227443102]
We study the problem of regret minimization, which is central to the analysis and design of optimal learning algorithms.
We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameter.
arXiv Detail & Related papers (2021-06-27T23:41:57Z)
- Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs).
A variant of Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals (a minimal sketch of the basic mirror descent update appears after this list).
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
arXiv Detail & Related papers (2021-02-27T19:28:39Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Accelerated Message Passing for Entropy-Regularized MAP Inference [89.15658822319928]
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning.
Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms.
We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient methods.
arXiv Detail & Related papers (2020-07-01T18:43:32Z)
- Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes [53.72166325215299]
We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs).
We present two algorithms: the first achieves minimax optimal regret guarantees for a rich class of factored structures.
The second enjoys better computational complexity with a slightly worse regret.
arXiv Detail & Related papers (2020-06-24T00:50:17Z)
- Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting [24.90164851620799]
We study reinforcement learning in non-episodic factored Markov decision processes (FMDPs).
We propose two near-optimal and oracle-efficient algorithms for FMDPs.
Our oracle-efficient algorithms outperform previously proposed near-optimal algorithms on computer network administration simulations.
arXiv Detail & Related papers (2020-02-06T15:19:53Z)
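For contrast with the FPL rule sketched above, here is a minimal sketch of the basic mirror descent update referenced in the Parallel Stochastic Mirror Descent entry: one entropic mirror descent (exponentiated gradient) step on the probability simplex, the building block that mirror-descent-based methods apply to policies or occupancy measures. It is not the parallel stochastic algorithm of that paper; the function name, toy objective, target vector, and step size are illustrative assumptions.

```python
import numpy as np

def entropic_md_step(x, grad, step_size):
    """One mirror descent step on the probability simplex with the entropy mirror map
    (exponentiated gradient): multiplicative update followed by renormalization."""
    y = x * np.exp(-step_size * grad)   # multiplicative (dual-space) update
    return y / y.sum()                  # Bregman projection back onto the simplex

# Toy use: minimize f(x) = 0.5 * ||x - target||^2 over the simplex; the gradient is x - target.
target = np.array([0.7, 0.2, 0.1])
x = np.ones(3) / 3
for _ in range(200):
    x = entropic_md_step(x, x - target, step_size=0.5)
print(np.round(x, 3))  # converges toward the target distribution
```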
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides (including all of the above) and is not responsible for any consequences of its use.