Minimizing the Outage Probability in a Markov Decision Process
- URL: http://arxiv.org/abs/2302.14714v1
- Date: Tue, 28 Feb 2023 16:26:23 GMT
- Title: Minimizing the Outage Probability in a Markov Decision Process
- Authors: Vincent Corlay and Jean-Christophe Sibel
- Abstract summary: We propose an algorithm that optimizes an alternative objective: the probability that the gain is greater than a given value.
The proposed algorithm can be seen as an extension of the value iteration algorithm.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Standard Markov decision process (MDP) and reinforcement learning algorithms
optimize the policy with respect to the expected gain. We propose an algorithm
that makes it possible to optimize an alternative objective: the probability that the
gain is greater than a given value. The algorithm can be seen as an extension
of the value iteration algorithm. We also show how the proposed algorithm could
be generalized to use neural networks, similarly to the deep Q-learning
extension of Q-learning.
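As a concrete illustration of the objective, the sketch below runs finite-horizon dynamic programming on an augmented state (state, remaining gain target), so that the computed value is a probability rather than an expected gain. It is a minimal sketch under assumed conventions (tabular MDP, discretized grid of targets, all function and variable names invented here), not the paper's exact recursion.

```python
import numpy as np

def outage_value_iteration(P, R, thresholds, horizon):
    """Finite-horizon DP on the augmented state (s, remaining gain target).

    P[a][s, s2] : transition probability, R[a][s, s2] : reward (assumed shapes).
    thresholds  : discretized grid of remaining-gain targets.
    Returns V with V[t, s, k] = best achievable Prob(gain collected from
    step t onward >= thresholds[k]) under the discretization.
    """
    nA, nS, nK = len(P), P[0].shape[0], len(thresholds)
    V = np.zeros((horizon + 1, nS, nK))
    # Terminal condition: success iff the remaining target is already met.
    V[horizon] = (thresholds <= 0.0).astype(float)[None, :]
    for t in range(horizon - 1, -1, -1):
        for s in range(nS):
            for k, c in enumerate(thresholds):
                best = 0.0
                for a in range(nA):
                    q = 0.0
                    for s2 in range(nS):
                        # Collecting R[a][s, s2] shrinks the remaining target;
                        # snap it to the nearest grid point (discretization error).
                        k2 = int(np.abs(thresholds - (c - R[a][s, s2])).argmin())
                        q += P[a][s, s2] * V[t + 1, s2, k2]
                    best = max(best, q)
                V[t, s, k] = best
    return V
```

The neural generalization mentioned in the abstract would, by analogy with deep Q-learning, replace the table V with a network taking (s, c) as input and regress it onto the same backup target.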
Related papers
- On Policy Evaluation Algorithms in Distributional Reinforcement Learning [0.0]
We introduce a novel class of algorithms to efficiently approximate the unknown return distributions in policy evaluation problems from distributional reinforcement learning (DRL).
For a plain instance of our proposed class of algorithms, we prove error bounds in both the Wasserstein and Kolmogorov-Smirnov distances.
For return distributions that have probability density functions, the algorithms yield approximations of these densities; error bounds are given in the supremum norm.
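To make "approximating the return distribution in policy evaluation" concrete, here is a generic categorical fixed-point iteration (a C51-style projection onto a uniform support grid). It is an illustrative baseline with invented names, not the algorithm class proposed in this paper.

```python
import numpy as np

def categorical_policy_evaluation(P_pi, R_pi, gamma, z, n_iters=100):
    """Iterate a distributional Bellman operator for a fixed policy.

    P_pi[s, s2] : state transitions under the fixed policy.
    R_pi[s, s2] : reward for the transition s -> s2.
    z           : uniformly spaced support atoms (at least two).
    Returns p with p[s, i] ~= Prob(return from s == z[i]).
    """
    nS, nZ = P_pi.shape[0], len(z)
    dz = z[1] - z[0]
    p = np.full((nS, nZ), 1.0 / nZ)           # start from uniform distributions
    for _ in range(n_iters):
        new_p = np.zeros_like(p)
        for s in range(nS):
            for s2 in range(nS):
                if P_pi[s, s2] == 0.0:
                    continue
                tz = np.clip(R_pi[s, s2] + gamma * z, z[0], z[-1])
                b = (tz - z[0]) / dz          # fractional index of each shifted atom
                lo = np.floor(b).astype(int)
                hi = np.ceil(b).astype(int)
                for i in range(nZ):           # spread mass onto neighboring atoms
                    w = P_pi[s, s2] * p[s2, i]
                    if lo[i] == hi[i]:
                        new_p[s, lo[i]] += w
                    else:
                        new_p[s, lo[i]] += w * (hi[i] - b[i])
                        new_p[s, hi[i]] += w * (b[i] - lo[i])
        p = new_p
    return p
```

Error bounds in the Wasserstein or Kolmogorov-Smirnov distance, as mentioned above, would quantify how far such an approximation is from the true return distribution.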
arXiv Detail & Related papers (2024-07-19T10:06:01Z)
- Deterministic Trajectory Optimization through Probabilistic Optimal Control [3.2771631221674333]
We propose two new algorithms for discrete-time deterministic finite-horizon nonlinear optimal control problems.
Both algorithms are inspired by a novel theoretical paradigm known as probabilistic optimal control.
We show that applying this algorithm yields a fixed-point iteration over probabilistic policies that converges to the deterministic optimal policy.
arXiv Detail & Related papers (2024-07-18T09:17:47Z)
- From Optimization to Control: Quasi Policy Iteration [3.4376560669160394]
We introduce a novel control algorithm coined quasi-policy iteration (QPI).
QPI is based on a novel approximation of the "Hessian" matrix in the policy iteration algorithm by exploiting two linear structural constraints specific to MDPs.
It exhibits an empirical convergence behavior similar to policy iteration with a very low sensitivity to the discount factor.
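For context, the sketch below is plain policy iteration on a finite MDP, the baseline that QPI approximates; the "Hessian" approximation that distinguishes QPI is not reproduced here, and all identifiers are invented for illustration.

```python
import numpy as np

def policy_iteration(P, R, gamma):
    """Plain policy iteration: exact evaluation, then greedy improvement.

    P[a][s, s2] : transition probabilities, R[a][s] : expected reward.
    """
    nA, nS = len(P), P[0].shape[0]
    pi = np.zeros(nS, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = np.array([P[pi[s]][s] for s in range(nS)])
        r_pi = np.array([R[pi[s]][s] for s in range(nS)])
        v = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily with respect to v.
        q = np.array([R[a] + gamma * P[a] @ v for a in range(nA)])  # (nA, nS)
        new_pi = q.argmax(axis=0)
        if np.array_equal(new_pi, pi):
            return pi, v
        pi = new_pi
```

The classical view of this iteration as a Newton-type method on the Bellman equation is what makes a "Hessian" approximation, as in QPI, meaningful.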
arXiv Detail & Related papers (2023-11-18T21:00:14Z)
- Stochastic Ratios Tracking Algorithm for Large Scale Machine Learning Problems [0.7614628596146599]
We propose a novel algorithm for adaptive step length selection in the classical SGD framework.
Under reasonable conditions, the algorithm produces step lengths in line with well-established theoretical requirements.
We show that the algorithm can generate step lengths comparable to the best step length obtained from manual tuning.
arXiv Detail & Related papers (2023-05-17T06:22:11Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, based on minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation [92.3161051419884]
We study reinforcement learning with linear function approximation.
Existing algorithms only have high-probability regret and/or Probably Approximately Correct (PAC) sample complexity guarantees.
We propose a new algorithm called FLUTE, which enjoys uniform-PAC convergence to the optimal policy with high probability.
arXiv Detail & Related papers (2021-06-22T08:48:56Z)
- A Probabilistically Motivated Learning Rate Adaptation for Stochastic Optimization [20.77923050735746]
We provide a probabilistic motivation, in terms of Gaussian inference, for popular first-order methods.
The inference allows us to relate the learning rate to a dimensionless quantity that can be automatically adapted during training.
The resulting meta-algorithm is shown to adapt learning rates in a robust manner across a large range of initial values.
arXiv Detail & Related papers (2021-02-22T10:26:31Z)
- Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms that obtain good generalization performance on other classical control tasks, gridworld-type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
- Towards Meta-Algorithm Selection [78.13985819417974]
Instance-specific algorithm selection (AS) deals with the automatic selection of an algorithm from a fixed set of candidates.
We show that meta-algorithm selection can indeed prove beneficial in some cases.
arXiv Detail & Related papers (2020-11-17T17:27:33Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
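For reference, the widely used squared Bellman error that such a convex loss replaces can be written as the following minimal sketch (generic tabular form with invented names; the paper's logistic loss itself is not reproduced here).

```python
import numpy as np

def squared_bellman_error(Q, s, a, r, s2, gamma):
    """Mean squared TD error over a batch of transitions (s, a, r, s2).

    Q : (nS, nA) table; s, a, s2 : integer index arrays; r : reward array.
    """
    td = r + gamma * Q[s2].max(axis=1) - Q[s, a]   # Bellman residual per sample
    return np.mean(td ** 2)
```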
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.