Parameterized MDPs and Reinforcement Learning Problems -- A Maximum
Entropy Principle Based Framework
- URL: http://arxiv.org/abs/2006.09646v3
- Date: Wed, 19 Jan 2022 06:43:23 GMT
- Title: Parameterized MDPs and Reinforcement Learning Problems -- A Maximum
Entropy Principle Based Framework
- Authors: Amber Srivastava and Srinivasa M Salapaka
- Abstract summary: We present a framework to address a class of sequential decision making problems.
Our framework features learning the optimal control policy with robustness to noisy data.
- Score: 2.741266294612776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a framework to address a class of sequential decision making
problems. Our framework features learning the optimal control policy with
robustness to noisy data, determining the unknown state and action parameters,
and performing sensitivity analysis with respect to problem parameters. We
consider two broad categories of sequential decision making problems modelled
as infinite horizon Markov Decision Processes (MDPs) with (and without) an
absorbing state. The central idea underlying our framework is to quantify
exploration in terms of the Shannon Entropy of the trajectories under the MDP
and determine the stochastic policy that maximizes it while guaranteeing a low
value of the expected cost along a trajectory. This resulting policy enhances
the quality of exploration early on in the learning process, and consequently
allows faster convergence rates and robust solutions even in the presence of
noisy data as demonstrated in our comparisons to popular algorithms such as
Q-learning, Double Q-learning and entropy regularized Soft Q-learning. The
framework extends to the class of parameterized MDP and RL problems, where
states and actions are parameter dependent, and the objective is to determine
the optimal parameters along with the corresponding optimal policy. Here, the
associated cost function can possibly be non-convex with multiple poor local
minima. Simulation results applied to a 5G small cell network problem
demonstrate successful determination of communication routes and the small cell
locations. We also obtain sensitivity measures to problem parameters and
robustness to noisy environment data.
Related papers
- Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU are naturally modeled as Multistage Problems (MSPs) but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach Two-Stage General Decision Rules (TS-GDR) to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks named Two-Stage Deep Decision Rules (TS-LDR)
arXiv Detail & Related papers (2024-05-23T18:19:47Z) - Stepsize Learning for Policy Gradient Methods in Contextual Markov
Decision Processes [35.889129338603446]
Policy-based algorithms are among the most widely adopted techniques in model-free RL.
They tend to struggle when asked to accomplish a series of heterogeneous tasks.
We introduce a new formulation, known as meta-MDP, that can be used to solve any hyper parameter selection problem in RL.
arXiv Detail & Related papers (2023-06-13T12:58:12Z) - High-probability sample complexities for policy evaluation with linear function approximation [88.87036653258977]
We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms.
We establish the first sample complexity bound with high-probability convergence guarantee that attains the optimal dependence on the tolerance level.
arXiv Detail & Related papers (2023-05-30T12:58:39Z) - Addressing the issue of stochastic environments and local
decision-making in multi-objective reinforcement learning [0.0]
Multi-objective reinforcement learning (MORL) is a relatively new field which builds on conventional Reinforcement Learning (RL)
This thesis focuses on what factors influence the frequency with which value-based MORL Q-learning algorithms learn the optimal policy for an environment.
arXiv Detail & Related papers (2022-11-16T04:56:42Z) - Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z) - Sample Complexity of Robust Reinforcement Learning with a Generative
Model [0.0]
We propose a model-based reinforcement learning (RL) algorithm for learning an $epsilon$-optimal robust policy.
We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence.
In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies.
arXiv Detail & Related papers (2021-12-02T18:55:51Z) - Learning MDPs from Features: Predict-Then-Optimize for Sequential
Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z) - Policy Information Capacity: Information-Theoretic Measure for Task
Complexity in Deep Reinforcement Learning [83.66080019570461]
We propose two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty.
We show that these metrics have higher correlations with normalized task solvability scores than a variety of alternatives.
These metrics can also be used for fast and compute-efficient optimizations of key design parameters.
arXiv Detail & Related papers (2021-03-23T17:49:50Z) - Robust Reinforcement Learning with Wasserstein Constraint [49.86490922809473]
We show the existence of optimal robust policies, provide a sensitivity analysis for the perturbations, and then design a novel robust learning algorithm.
The effectiveness of the proposed algorithm is verified in the Cart-Pole environment.
arXiv Detail & Related papers (2020-06-01T13:48:59Z) - Online Parameter Estimation for Safety-Critical Systems with Gaussian
Processes [6.122161391301866]
We present a Bayesian optimization framework based on Gaussian processes (GPs) for online parameter estimation.
It uses an efficient search strategy over a response surface in the parameter space for finding the global optima with minimal function evaluations.
We demonstrate our technique on an actuated planar pendulum and safety-critical quadrotor in simulation with changing parameters.
arXiv Detail & Related papers (2020-02-18T20:38:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.