Related papers: Parameterized MDPs and Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework

URL: http://arxiv.org/abs/2006.09646v3
Date: Wed, 19 Jan 2022 06:43:23 GMT
Title: Parameterized MDPs and Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework
Authors: Amber Srivastava and Srinivasa M Salapaka
Abstract summary: We present a framework to address a class of sequential decision making problems. Our framework features learning the optimal control policy with robustness to noisy data.
Score: 2.741266294612776
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a framework to address a class of sequential decision making problems. Our framework features learning the optimal control policy with robustness to noisy data, determining the unknown state and action parameters, and performing sensitivity analysis with respect to problem parameters. We consider two broad categories of sequential decision making problems modelled as infinite horizon Markov Decision Processes (MDPs) with (and without) an absorbing state. The central idea underlying our framework is to quantify exploration in terms of the Shannon Entropy of the trajectories under the MDP and determine the stochastic policy that maximizes it while guaranteeing a low value of the expected cost along a trajectory. This resulting policy enhances the quality of exploration early on in the learning process, and consequently allows faster convergence rates and robust solutions even in the presence of noisy data as demonstrated in our comparisons to popular algorithms such as Q-learning, Double Q-learning and entropy regularized Soft Q-learning. The framework extends to the class of parameterized MDP and RL problems, where states and actions are parameter dependent, and the objective is to determine the optimal parameters along with the corresponding optimal policy. Here, the associated cost function can possibly be non-convex with multiple poor local minima. Simulation results applied to a 5G small cell network problem demonstrate successful determination of communication routes and the small cell locations. We also obtain sensitivity measures to problem parameters and robustness to noisy environment data.
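The trade-off the abstract describes, a stochastic policy that stays exploratory while keeping expected cost low, can be illustrated with a minimal sketch. It is not the authors' algorithm: the paper maximizes the Shannon entropy of whole trajectories, whereas the code below uses the simpler, standard entropy-regularized (soft) value iteration on a known finite MDP. The function name `soft_value_iteration`, the inverse-temperature parameter `beta`, and the two-state toy MDP are all assumptions made for illustration.

```python
import numpy as np

def soft_value_iteration(P, C, beta, gamma=0.9, iters=500):
    """Entropy-regularized value iteration for cost minimization.

    P: (S, A, S) array of transition probabilities (assumed known).
    C: (S, A) array of one-step costs.
    beta: inverse temperature; small beta favors exploration (high entropy),
          large beta approaches the greedy minimum-cost policy.
    """
    n_states = P.shape[0]
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = C + gamma * (P @ V)                    # (S, A) state-action costs
        m = Q.min(axis=1, keepdims=True)           # shift for numerical stability
        # Soft-min over actions: -(1/beta) * log sum_a exp(-beta * Q(s, a))
        V = m[:, 0] - np.log(np.exp(-beta * (Q - m)).sum(axis=1)) / beta
    Q = C + gamma * (P @ V)
    # Gibbs (softmax) policy: pi(a|s) proportional to exp(-beta * Q(s, a))
    pi = np.exp(-beta * (Q - Q.min(axis=1, keepdims=True)))
    pi /= pi.sum(axis=1, keepdims=True)
    return V, pi

# Hypothetical toy MDP: state 1 is absorbing with zero cost.
P = np.zeros((2, 2, 2))
P[0, 0] = [1.0, 0.0]   # action 0 in state 0: stay, cost 1 per step
P[0, 1] = [0.0, 1.0]   # action 1 in state 0: move to the absorbing state, cost 2
P[1, :] = [0.0, 1.0]   # both actions keep the agent in state 1
C = np.array([[1.0, 2.0],
              [0.0, 0.0]])

for beta in (1e-3, 1.0, 50.0):
    V, pi = soft_value_iteration(P, C, beta)
    print(f"beta={beta}: pi(.|s=0) = {pi[0]}")
```

At small `beta` the policy in state 0 is close to uniform; as `beta` grows it concentrates on the cheaper action of moving to the absorbing state. Gradually increasing `beta` during learning is one common way to move from exploration toward a low-cost policy.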

Related papers

Certifiably Robust Policies for Uncertain Parametric Environments [57.2416302384766]
We propose a framework based on parametric Markov decision processes (MDPs) with unknown distributions over parameters. We learn and analyse IMDPs for a set of unknown sample environments induced by parameters. We show that our approach produces tight bounds on a policy's performance with high confidence.
arXiv Detail & Related papers (2024-08-06T10:48:15Z)
Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains. Some SDMU problems are naturally modeled as Multistage Problems (MSPs), but the resulting optimizations are notoriously challenging from a computational standpoint. This paper introduces a novel approach, Two-Stage General Decision Rules (TS-GDR), to generalize the policy space beyond linear functions. The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks named Two-Stage Deep Decision Rules (TS-DDR).
arXiv Detail & Related papers (2024-05-23T18:19:47Z)
Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes [35.889129338603446]
Policy-based algorithms are among the most widely adopted techniques in model-free RL. They tend to struggle when asked to accomplish a series of heterogeneous tasks. We introduce a new formulation, known as meta-MDP, that can be used to solve any hyperparameter selection problem in RL.
arXiv Detail & Related papers (2023-06-13T12:58:12Z)
High-probability sample complexities for policy evaluation with linear function approximation [88.87036653258977]
We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms. We establish the first sample complexity bound with high-probability convergence guarantee that attains the optimal dependence on the tolerance level.
arXiv Detail & Related papers (2023-05-30T12:58:39Z)
Addressing the issue of stochastic environments and local decision-making in multi-objective reinforcement learning [0.0]
Multi-objective reinforcement learning (MORL) is a relatively new field which builds on conventional Reinforcement Learning (RL). This thesis focuses on what factors influence the frequency with which value-based MORL Q-learning algorithms learn the optimal policy for an environment.
arXiv Detail & Related papers (2022-11-16T04:56:42Z)
Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds. We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z)
Sample Complexity of Robust Reinforcement Learning with a Generative Model [0.0]
We propose a model-based reinforcement learning (RL) algorithm for learning an $\epsilon$-optimal robust policy. We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence. In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies.
arXiv Detail & Related papers (2021-12-02T18:55:51Z)
Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning. Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z)
Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning [83.66080019570461]
We propose two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty. We show that these metrics have higher correlations with normalized task solvability scores than a variety of alternatives. These metrics can also be used for fast and compute-efficient optimizations of key design parameters.
arXiv Detail & Related papers (2021-03-23T17:49:50Z)
Robust Reinforcement Learning with Wasserstein Constraint [49.86490922809473]
We show the existence of optimal robust policies, provide a sensitivity analysis for the perturbations, and then design a novel robust learning algorithm. The effectiveness of the proposed algorithm is verified in the Cart-Pole environment.
arXiv Detail & Related papers (2020-06-01T13:48:59Z)
Online Parameter Estimation for Safety-Critical Systems with Gaussian Processes [6.122161391301866]
We present a Bayesian optimization framework based on Gaussian processes (GPs) for online parameter estimation. It uses an efficient search strategy over a response surface in the parameter space for finding the global optima with minimal function evaluations. We demonstrate our technique on an actuated planar pendulum and safety-critical quadrotor in simulation with changing parameters.
arXiv Detail & Related papers (2020-02-18T20:38:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.