A general Markov decision process formalism for action-state
entropy-regularized reward maximization
- URL: http://arxiv.org/abs/2302.01098v1
- Date: Thu, 2 Feb 2023 13:40:12 GMT
- Title: A general Markov decision process formalism for action-state
entropy-regularized reward maximization
- Authors: Dmytro Grytskyy, Jorge Ram\'irez-Ruiz, Rub\'en Moreno-Bote
- Abstract summary: Previous work has addressed different forms of action, state and action-state entropy regularization, pure exploration and space occupation.
These problems have become extremely relevant for regularization, generalization and learning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Previous work has separately addressed different forms of action, state and
action-state entropy regularization, pure exploration and space occupation.
These problems have become extremely relevant for regularization,
generalization, speeding up learning and providing robust solutions at
unprecedented levels. However, solutions of those problems are hectic, ranging
from convex and non-convex optimization, and unconstrained optimization to
constrained optimization. Here we provide a general dual function formalism
that transforms the constrained optimization problem into an unconstrained
convex one for any mixture of action and state entropies. The cases with pure
action entropy and pure state entropy are understood as limits of the mixture.
Related papers
- Double Duality: Variational Primal-Dual Policy Optimization for
Constrained Reinforcement Learning [132.7040981721302]
We study the Constrained Convex Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Design algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z) - Analyzing and Enhancing the Backward-Pass Convergence of Unrolled
Optimization [50.38518771642365]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
A central challenge in this setting is backpropagation through the solution of an optimization problem, which often lacks a closed form.
This paper provides theoretical insights into the backward pass of unrolled optimization, showing that it is equivalent to the solution of a linear system by a particular iterative method.
A system called Folded Optimization is proposed to construct more efficient backpropagation rules from unrolled solver implementations.
arXiv Detail & Related papers (2023-12-28T23:15:18Z) - Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z) - Algorithm for Constrained Markov Decision Process with Linear
Convergence [55.41644538483948]
An agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its costs.
A new dual approach is proposed with the integration of two ingredients: entropy regularized policy and Vaidya's dual.
The proposed approach is shown to converge (with linear rate) to the global optimum.
arXiv Detail & Related papers (2022-06-03T16:26:38Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
The problem of constrained decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new utilities-dual convex approach is proposed with novel integration of three ingredients: regularized policy, dual regularizer, and Nesterov's gradient descent dual.
This is the first demonstration that nonconcave CMDP problems can attain the lower bound of $mathcal O (1/epsilon)$ for all complexity optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - A Dual Approach to Constrained Markov Decision Processes with Entropy
Regularization [7.483040617090451]
We study entropy-regularized constrained Markov decision processes (CMDPs) under the soft-max parameterization.
Our theoretical analysis shows that its Lagrangian dual function is smooth and the Lagrangian duality gap can be decomposed into the primality gap and the constraint violation.
arXiv Detail & Related papers (2021-10-17T21:26:40Z) - Transient growth of accelerated first-order methods for strongly convex
optimization problems [1.6114012813668934]
In this paper, we examine the transient behavior of accelerated first-order optimization algorithms.
For quadratic optimization problems, we employ tools from linear systems theory to show that transient growth arises from the presence of non-normal dynamics.
For strongly convex smooth optimization problems, we utilize the theory of integral quadratic constraints to establish an upper bound on the magnitude of the transient response of Nesterov's accelerated method.
arXiv Detail & Related papers (2021-03-14T20:01:14Z) - Action Redundancy in Reinforcement Learning [54.291331971813364]
We show that transition entropy can be described by two terms; namely, model-dependent transition entropy and action redundancy.
Our results suggest that action redundancy is a fundamental problem in reinforcement learning.
arXiv Detail & Related papers (2021-02-22T19:47:26Z) - Modeling Linear Inequality Constraints in Quadratic Binary Optimization
for Variational Quantum Eigensolver [0.0]
This paper introduces the use of tailored variational forms for variational quantum eigensolver.
Four constraints that usually appear in several optimization problems are modeled.
The main advantage of the proposed methodology is that the number of parameters on the variational form remain constant.
arXiv Detail & Related papers (2020-07-26T23:36:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.