Beyond Average Return in Markov Decision Processes
- URL: http://arxiv.org/abs/2310.20266v2
- Date: Mon, 19 Feb 2024 10:46:18 GMT
- Title: Beyond Average Return in Markov Decision Processes
- Authors: Alexandre Marthe (ENS de Lyon, UMPA-ENSL), Aurélien Garivier
(UMPA-ENSL (MC2)), Claire Vernade
- Abstract summary: We prove that only generalized means can be optimized exactly, even in the more general framework of Distributional Reinforcement Learning (DistRL).
We provide error bounds on the resulting estimators, and discuss the potential of this approach as well as its limitations.
- Score: 49.157108194438635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: What are the functionals of the reward that can be computed and optimized
exactly in Markov Decision Processes? In the finite-horizon, undiscounted
setting, Dynamic Programming (DP) can only handle these operations efficiently
for certain classes of statistics. We summarize the characterization of these
classes for policy evaluation, and give a new answer for the planning problem.
Interestingly, we prove that only generalized means can be optimized exactly,
even in the more general framework of Distributional Reinforcement Learning
(DistRL). DistRL does, however, allow other functionals to be evaluated
approximately. We provide error bounds on the resulting estimators, and
discuss the potential of this approach as well as its limitations. These
results contribute to advancing the theory of Markov Decision Processes by
examining overall characteristics of the return, and particularly
risk-conscious strategies.
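As a concrete illustration of the kind of statistic that finite-horizon, undiscounted DP can optimize exactly, the sketch below runs a backward recursion for the exponential (entropic) utility of the return, a classical example of a generalized mean. This is a minimal illustrative sketch and not code from the paper; the tabular arrays `P`, `R`, the risk parameter `beta`, and all names are assumptions.

```python
import numpy as np

# Minimal sketch (not from the paper): finite-horizon, undiscounted DP for
# the exponential (entropic) utility of the return, one classical instance
# of a generalized mean that the Bellman recursion optimizes exactly.
# Assumed toy model: P[a, s, s'] transition kernel, R[a, s] rewards.

def entropic_value_iteration(P, R, horizon, beta):
    """Optimal certainty equivalents (1/beta) * log E[exp(beta * return)]."""
    n_actions, n_states, _ = P.shape
    # W_h(s) stores E[exp(beta * return-to-go)] under the optimal policy.
    W = np.ones(n_states)
    policy = np.zeros((horizon, n_states), dtype=int)
    for h in reversed(range(horizon)):
        # Q_exp[a, s] = exp(beta * R[a, s]) * sum_{s'} P[a, s, s'] * W[s']
        Q_exp = np.exp(beta * R) * (P @ W)
        # exp is increasing for beta > 0 (maximize) and decreasing for beta < 0.
        greedy = Q_exp.argmax(axis=0) if beta > 0 else Q_exp.argmin(axis=0)
        policy[h] = greedy
        W = Q_exp[greedy, np.arange(n_states)]
    return np.log(W) / beta, policy
```

As beta tends to 0, the certainty equivalent recovers the ordinary expected return, which makes a convenient sanity check on a toy MDP; statistics outside the generalized-mean family, such as quantiles or CVaR, do not admit such an exact recursion, which is the limitation the abstract points to.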
Related papers
- On Policy Evaluation Algorithms in Distributional Reinforcement Learning [0.0]
We introduce a novel class of algorithms to efficiently approximate the unknown return distributions in policy evaluation problems from distributional reinforcement learning (DRL)
For a plain instance of our proposed class of algorithms we prove error bounds, both within Wasserstein and Kolmogorov–Smirnov distances.
For return distributions having probability density functions the algorithms yield approximations for these densities; error bounds are given within supremum norm.
arXiv Detail & Related papers (2024-07-19T10:06:01Z)
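The entry above is about approximating return distributions for policy evaluation; the sketch below shows one standard way to do this, a fixed-support categorical distributional Bellman update with projection. It is a generic, assumed construction (the support grid `z`, the tabular arrays `P_pi`, `R_pi`, and all names are illustrative), not the class of algorithms proposed in that paper.

```python
import numpy as np

# Hedged sketch: categorical distributional policy evaluation on a tabular
# Markov reward process (the policy is already folded into P_pi, R_pi).
# Not the algorithm class from the paper above; names and grid are assumed.

def categorical_evaluation(P_pi, R_pi, gamma, z, n_iters=200):
    """Approximate return distributions on a fixed, evenly spaced support z."""
    n_states, K = P_pi.shape[0], z.shape[0]
    dz = z[1] - z[0]
    probs = np.full((n_states, K), 1.0 / K)   # initial guess: uniform
    for _ in range(n_iters):
        new_probs = np.zeros_like(probs)
        for s in range(n_states):
            for s_next in range(n_states):
                w = P_pi[s, s_next]
                if w == 0.0:
                    continue
                # Push the support through the Bellman operator, then project
                # each shifted atom back onto the grid z (linear split).
                tz = np.clip(R_pi[s] + gamma * z, z[0], z[-1])
                b = np.clip((tz - z[0]) / dz, 0.0, K - 1.0)
                lo, hi = np.floor(b).astype(int), np.ceil(b).astype(int)
                lo_w = (hi - b) + (lo == hi)  # full mass to lo on exact hits
                np.add.at(new_probs[s], lo, w * probs[s_next] * lo_w)
                np.add.at(new_probs[s], hi, w * probs[s_next] * (b - lo))
        probs = new_probs
    return probs  # probs[s, k] approximates P(return from s is near z[k])
```

Error bounds of the kind mentioned above, in Wasserstein or Kolmogorov–Smirnov distance, quantify how far such fixed-support approximations are from the true return distributions.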
- Risk-sensitive Markov Decision Process and Learning under General Utility Functions [3.6260136172126667]
Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations.
We propose a modified value iteration algorithm that employs an epsilon-covering over the space of cumulative reward.
In the absence of a simulator, our algorithm, designed with an upper-confidence-bound exploration approach, identifies a near-optimal policy.
arXiv Detail & Related papers (2023-11-22T18:50:06Z)
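For the risk-sensitive entry above, the sketch below illustrates the general idea of covering the space of cumulative reward with an epsilon-grid and running backward induction over augmented (state, accumulated reward) pairs, applying a utility U only at the horizon. The grid construction, array shapes, and names are assumptions rather than that paper's exact algorithm.

```python
import numpy as np

# Hedged sketch: value iteration augmented with a discretized cumulative
# reward (an epsilon-covering of its range). U is a vectorized utility
# applied to the total reward at the horizon. Names are illustrative.

def utility_value_iteration(P, R, horizon, U, eps):
    """Backward induction over (state, discretized cumulative reward) pairs."""
    n_actions, n_states, _ = P.shape
    r_max = np.abs(R).max()
    grid = np.arange(-horizon * r_max, horizon * r_max + eps, eps)  # covering
    # At h = horizon the value is U(accumulated reward), for every state.
    V = np.tile(U(grid), (n_states, 1))
    for h in reversed(range(horizon)):
        V_new = np.empty_like(V)
        for s in range(n_states):
            for gi, g in enumerate(grid):
                q = np.empty(n_actions)
                for a in range(n_actions):
                    # Snap the new accumulated reward back onto the grid
                    # (approximately, via searchsorted).
                    gj = np.clip(np.searchsorted(grid, g + R[a, s]),
                                 0, len(grid) - 1)
                    q[a] = P[a, s] @ V[:, gj]
                V_new[s, gi] = q.max()
        V = V_new
    # Value of each state when starting with zero accumulated reward.
    return V[:, np.searchsorted(grid, 0.0)]
```

The grid size scales with the horizon and with 1/eps, which is the usual price of covering the cumulative-reward space.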
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs)
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
- GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z)
- Offline Policy Optimization with Eligible Actions [34.4530766779594]
Offline policy optimization could have a large impact on many real-world decision-making problems.
Importance sampling and its variants are a commonly used type of estimator in offline policy evaluation.
We propose an algorithm to avoid this overfitting through a new per-state-neighborhood normalization constraint.
arXiv Detail & Related papers (2022-07-01T19:18:15Z)
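The entry above mentions importance sampling as the standard estimator family for offline policy evaluation; the sketch below is the plain trajectory-wise importance-sampling estimator under assumed trajectory and policy interfaces, not that paper's eligible-actions method.

```python
import numpy as np

# Hedged sketch of plain (ordinary) importance sampling for off-policy
# evaluation. Trajectory format and policy interfaces are assumptions.

def importance_sampling_estimate(trajectories, pi_e, pi_b, gamma=1.0):
    """Estimate the evaluation policy's return from behavior-policy data.

    trajectories: list of [(state, action, reward), ...] collected under pi_b.
    pi_e(a, s), pi_b(a, s): action probabilities of the evaluation and
    behavior policies.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_e(a, s) / pi_b(a, s)   # cumulative likelihood ratio
            ret += (gamma ** t) * r
        estimates.append(weight * ret)          # per-trajectory IS estimate
    return float(np.mean(estimates))
```

Plain importance sampling is unbiased but can be very high-variance, and optimizing a policy directly against such estimates is prone to the overfitting that the per-state-neighborhood normalization constraint above is designed to avoid.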
- Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
arXiv Detail & Related papers (2020-12-23T15:09:28Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
- Kalman meets Bellman: Improving Policy Evaluation through Value Tracking [59.691919635037216]
Policy evaluation is a key process in Reinforcement Learning (RL)
We devise an optimization method, called Kalman Optimization for Value Approximation (KOVA)
KOVA minimizes a regularized objective function that concerns both parameter and noisy return uncertainties.
arXiv Detail & Related papers (2020-02-17T13:30:43Z)
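As a loose illustration of the Kalman-style value tracking in the entry above, the sketch below applies a standard Kalman filter update to linear value-function weights, treating an observed return as a noisy linear measurement. This is a generic construction with assumed noise parameters and names, not KOVA's actual regularized objective.

```python
import numpy as np

# Hedged sketch (not KOVA itself): track linear value weights theta with a
# standard Kalman filter; the observed return is a noisy measurement of
# phi(s)^T theta. Process/measurement noise levels q, r are assumptions.

def kalman_value_update(theta, P_cov, phi, target, q=1e-4, r=1.0):
    """One Kalman update from a single (feature vector, return target) pair."""
    P_pred = P_cov + q * np.eye(len(theta))   # random-walk drift on weights
    innovation = target - phi @ theta         # observed return minus prediction
    s = phi @ P_pred @ phi + r                # innovation variance (scalar)
    k = (P_pred @ phi) / s                    # Kalman gain
    theta_new = theta + k * innovation
    P_new = P_pred - np.outer(k, phi) @ P_pred
    return theta_new, P_new
```

The covariance matrix carries the return-uncertainty flavor emphasized in the entry above, though KOVA's objective and regularization differ from this generic update.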
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.