Refined Analysis of FPL for Adversarial Markov Decision Processes
- URL: http://arxiv.org/abs/2008.09251v1
- Date: Fri, 21 Aug 2020 01:12:10 GMT
- Title: Refined Analysis of FPL for Adversarial Markov Decision Processes
- Authors: Yuanhao Wang and Kefan Dong
- Abstract summary: Follow-the-Perturbed-Leader (FPL) based algorithms have been proposed in previous literature.
We improve the analysis of FPL-based algorithms in both settings, matching the current best regret bounds using faster and simpler algorithms.
- Score: 9.188318506016897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the adversarial Markov Decision Process (MDP) problem, where the
rewards for the MDP can be adversarially chosen, and the transition function
can be either known or unknown. In both settings, Follow-the-Perturbed-Leader
(FPL) based algorithms have been proposed in previous literature. However, the
established regret bounds for FPL-based algorithms are worse than those of
algorithms based on mirror descent. We improve the analysis of FPL-based
algorithms in both settings, matching the current best regret bounds using
faster and simpler algorithms.
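Since the paper's contribution centers on the Follow-the-Perturbed-Leader template, a minimal sketch of the generic FPL rule may help fix ideas. This is not the algorithm analyzed in the paper (which runs FPL over the policies of an adversarial MDP); it is the simpler full-information experts setting, and the function name, perturbation scale eta, and exponential noise distribution are illustrative assumptions.

```python
import numpy as np

def fpl_experts(reward_matrix, eta=1.0, seed=0):
    """Generic Follow-the-Perturbed-Leader in the full-information experts setting.

    reward_matrix: (T, K) array; row t holds the adversarially chosen rewards of
    the K experts at round t, revealed only after the learner commits to a choice.
    eta: scale of the exponential perturbation added to the cumulative rewards.
    """
    rng = np.random.default_rng(seed)
    T, K = reward_matrix.shape
    cumulative = np.zeros(K)              # cumulative reward of each expert so far
    choices, total = [], 0.0
    for t in range(T):
        # Perturb the cumulative rewards, then follow the (perturbed) leader.
        perturbation = rng.exponential(scale=eta, size=K)
        choice = int(np.argmax(cumulative + perturbation))
        choices.append(choice)
        total += reward_matrix[t, choice]
        cumulative += reward_matrix[t]    # full-information feedback
    return choices, total
```

In the adversarial MDP setting the role of the experts is played by policies (or occupancy measures) and the cumulative rewards come from the observed reward functions, but the perturb-then-argmax structure is the same; the argmax then reduces to planning in a known or estimated MDP, which is what makes FPL-based methods comparatively fast and simple.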
Related papers
- Optimism in the Face of Ambiguity Principle for Multi-Armed Bandits [6.7310264583128445]
Follow-The-Regularized-Leader (FTRL) algorithms often enjoy optimal regret for adversarial as well as stochastic bandit problems.
We propose a new Follow-The-Perturbed-Leader (FTPL) algorithm that generates optimal policies for both adversarial and stochastic multi-armed bandits.
arXiv Detail & Related papers (2024-09-30T16:00:23Z)
- Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes [12.76843681997386]
Policy Optimization (PO) methods are among the most popular Reinforcement Learning (RL) algorithms in practice.
This paper proposes a PO-based algorithm with rate-optimal regret guarantees under the linear Markov Decision Process (MDP) model.
Our algorithm achieves this regret with improved dependence on the other parameters of the problem.
arXiv Detail & Related papers (2024-07-03T12:36:24Z)
- Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback [35.687473978249535]
We consider regret minimization for Adversarial Markov Decision Processes (AMDPs), where the loss functions change over time and are adversarially chosen.
While there has been a surge of studies on this problem using Online-Mirror-Descent (OMD) methods, very little is known about the Follow-the-Perturbed-Leader (FTPL) methods.
We develop the first no-regret algorithm for learning AMDPs in the infinite-horizon setting with bandit feedback and stochastic transitions.
arXiv Detail & Related papers (2022-05-26T15:55:50Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
- Regret Analysis in Deterministic Reinforcement Learning [78.31410227443102]
We study the problem of regret minimization, which is central to the analysis and design of optimal learning algorithms.
We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameter.
arXiv Detail & Related papers (2021-06-27T23:41:57Z)
- Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs).
A variant of Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals (a minimal sketch of the basic mirror descent update appears after this list).
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
arXiv Detail & Related papers (2021-02-27T19:28:39Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Accelerated Message Passing for Entropy-Regularized MAP Inference [89.15658822319928]
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning.
Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms.
We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient methods.
arXiv Detail & Related papers (2020-07-01T18:43:32Z)
- Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes [53.72166325215299]
We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs).
We present two algorithms: the first achieves minimax optimal regret guarantees for a rich class of factored structures.
The second enjoys better computational complexity with a slightly worse regret.
arXiv Detail & Related papers (2020-06-24T00:50:17Z)
- Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting [24.90164851620799]
We study reinforcement learning in non-episodic factored Markov decision processes (FMDPs).
We propose two near-optimal and oracle-efficient algorithms for FMDPs.
Our oracle-efficient algorithms outperform previously proposed near-optimal algorithms on computer network administration simulations.
arXiv Detail & Related papers (2020-02-06T15:19:53Z)
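For contrast with the FPL rule sketched above, here is a minimal sketch of the basic mirror descent update referenced in the Parallel Stochastic Mirror Descent entry: one entropic mirror descent (exponentiated gradient) step on the probability simplex, the building block that mirror-descent-based methods apply to policies or occupancy measures. It is not the parallel stochastic algorithm of that paper; the function name, toy objective, target vector, and step size are illustrative assumptions.

```python
import numpy as np

def entropic_md_step(x, grad, step_size):
    """One mirror descent step on the probability simplex with the entropy mirror map
    (exponentiated gradient): multiplicative update followed by renormalization."""
    y = x * np.exp(-step_size * grad)   # multiplicative (dual-space) update
    return y / y.sum()                  # Bregman projection back onto the simplex

# Toy use: minimize f(x) = 0.5 * ||x - target||^2 over the simplex; the gradient is x - target.
target = np.array([0.7, 0.2, 0.1])
x = np.ones(3) / 3
for _ in range(200):
    x = entropic_md_step(x, x - target, step_size=0.5)
print(np.round(x, 3))  # converges toward the target distribution
```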
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides (including all of the above) and is not responsible for any consequences of its use.