Related papers: Regret Analysis in Deterministic Reinforcement Learning

URL: http://arxiv.org/abs/2106.14338v1
Date: Sun, 27 Jun 2021 23:41:57 GMT
Title: Regret Analysis in Deterministic Reinforcement Learning
Authors: Damianos Tranos and Alexandre Proutiere
Abstract summary: We study the problem of regret minimization, which is central to the analysis and design of optimal learning algorithms. We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameter.
Score: 78.31410227443102
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We consider Markov Decision Processes (MDPs) with deterministic transitions and study the problem of regret minimization, which is central to the analysis and design of optimal learning algorithms. We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameter (in contrast to previous minimax approaches) and thus, truly quantify the fundamental limit of performance achievable by any learning algorithm. Deterministic MDPs can be interpreted as graphs and analyzed in terms of their cycles, a fact which we leverage in order to identify a class of deterministic MDPs whose regret lower bound can be determined numerically. We further exemplify this result on a deterministic line search problem, and a deterministic MDP with state-dependent rewards, whose regret lower bounds we can state explicitly. These bounds share similarities with the known problem-specific bound of the multi-armed bandit problem and suggest that navigation on a deterministic MDP need not have an effect on the performance of a learning algorithm.
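
Illustration (not from the paper): the abstract notes that a deterministic MDP can be read as a graph and analyzed through its cycles. The minimal sketch below assumes a hypothetical toy MDP in which each edge carries a known mean reward, and finds the cycle with the highest average reward per step by brute-force enumeration. The names `TOY_EDGES` and `best_cycle_gain` are invented for this example, and the enumeration is only practical for very small graphs. It does not compute the paper's regret lower bounds.

```python
from typing import Dict, List, Tuple

# Hypothetical toy deterministic MDP: each state maps to a list of
# (next_state, mean_reward) pairs, one pair per available action.
Edges = Dict[int, List[Tuple[int, float]]]

TOY_EDGES: Edges = {
    0: [(1, 0.2), (0, 0.5)],   # move to state 1, or stay via a self-loop
    1: [(2, 0.9), (0, 0.1)],
    2: [(1, 0.8)],
}


def best_cycle_gain(edges: Edges) -> Tuple[float, List[int]]:
    """Return the largest mean reward over all simple cycles, with one such cycle."""
    best_gain = float("-inf")
    best_cycle: List[int] = []

    def dfs(start: int, node: int, path: List[int], total: float) -> None:
        nonlocal best_gain, best_cycle
        for nxt, reward in edges.get(node, []):
            if nxt == start:
                # Closing the cycle: average reward per step along it.
                gain = (total + reward) / len(path)
                if gain > best_gain:
                    best_gain, best_cycle = gain, list(path)
            elif nxt > start and nxt not in path:
                # Only visit states larger than the start, so each cycle
                # is enumerated once, from its smallest state.
                dfs(start, nxt, path + [nxt], total + reward)

    for state in edges:
        dfs(state, state, [state], 0.0)
    return best_gain, best_cycle


if __name__ == "__main__":
    gain, cycle = best_cycle_gain(TOY_EDGES)
    print(f"best cycle {cycle} with average reward {gain:.2f}")
```

In this toy graph the cycle through states 1 and 2 has the highest average reward. Assuming that cycle is reachable from every state, an agent that knew the rewards would move to it and loop there, and regret measures the reward lost while a learner is still discovering this.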

Related papers

Policy Testing in Markov Decision Processes [48.642181362172906]
We study the policy testing problem in discounted Markov decision processes (MDPs) under the fixed-confidence setting. The goal is to determine whether the value of a given policy exceeds a numerical threshold.
arXiv Detail & Related papers (2025-05-21T10:13:54Z)
Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives [16.101435842520473]
This paper studies the challenging yet important problem in POMDPs known as the (indefinite-horizon) Maximal Reachability Probability Problem. Inspired by the success of point-based methods developed for discounted problems, we study their extensions to MRPP. We present a novel algorithm that leverages the strengths of these techniques for efficient exploration of the belief space.
arXiv Detail & Related papers (2024-06-05T02:33:50Z)
Online POMDP Planning with Anytime Deterministic Guarantees [11.157761902108692]
Planning under uncertainty can be mathematically formalized using partially observable Markov decision processes (POMDPs). Finding an optimal plan for POMDPs can be computationally expensive and is feasible only for small tasks. We derive a deterministic relationship between a simplified solution that is easier to obtain and the theoretically optimal one.
arXiv Detail & Related papers (2023-10-03T04:40:38Z)
Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds. We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z)
Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning [50.44564503645015]
We provide improved gap-dependent regret bounds for reinforcement learning in finite episodic Markov decision processes. We prove tighter upper regret bounds for optimistic algorithms and accompany them with new information-theoretic lower bounds for a large class of MDPs.
arXiv Detail & Related papers (2021-07-02T20:36:05Z)
Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning. Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z)
Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection [7.685002911021767]
We introduce an algorithm that efficiently learns policies in non-stationary environments. It analyzes a possibly infinite stream of data and computes, in real-time, high-confidence change-point detection statistics. We show that (i) this algorithm minimizes the delay until unforeseen changes to a context are detected, thereby allowing for rapid responses.
arXiv Detail & Related papers (2021-05-20T01:57:52Z)
Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces. The proposed method explores local properties of policy behavior to identify unexpected decisions. We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
arXiv Detail & Related papers (2020-12-23T15:09:28Z)
Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model. The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.