A Gradient-Aware Search Algorithm for Constrained Markov Decision
Processes
- URL: http://arxiv.org/abs/2005.03718v1
- Date: Thu, 7 May 2020 19:38:09 GMT
- Title: A Gradient-Aware Search Algorithm for Constrained Markov Decision
Processes
- Authors: Sami Khairy, Prasanna Balaprakash, Lin X. Cai
- Abstract summary: We first prove that the optimization objective in the dual linear program of a finite CMDP is a piece-wise linear convex function with respect to the Lagrange penalty multipliers.
We propose a novel two-level Gradient-Aware Search (GAS) algorithm which exploits the PWLC structure to find the optimal state-value function and Lagrange penalty multipliers of a finite CMDP.
- Score: 9.728259735794987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The canonical solution methodology for finite constrained Markov decision
processes (CMDPs), where the objective is to maximize the expected
infinite-horizon discounted rewards subject to the expected infinite-horizon
discounted costs constraints, is based on convex linear programming. In this
brief, we first prove that the optimization objective in the dual linear
program of a finite CMDP is a piece-wise linear convex function (PWLC) with
respect to the Lagrange penalty multipliers. Next, we propose a novel two-level
Gradient-Aware Search (GAS) algorithm which exploits the PWLC structure to find
the optimal state-value function and Lagrange penalty multipliers of a finite
CMDP. The proposed algorithm is applied in two stochastic control problems with
constraints: robot navigation in a grid world and solar-powered unmanned aerial
vehicle (UAV)-based wireless network management. We empirically compare the
convergence performance of the proposed GAS algorithm with binary search (BS),
Lagrangian primal-dual optimization (PDO), and Linear Programming (LP).
Compared with benchmark algorithms, it is shown that the proposed GAS algorithm
converges to the optimal solution faster, does not require hyper-parameter
tuning, and is not sensitive to initialization of the Lagrange penalty
multiplier.
Related papers
- A Learned Proximal Alternating Minimization Algorithm and Its Induced Network for a Class of Two-block Nonconvex and Nonsmooth Optimization [4.975853671529418]
This work proposes a general learned alternating minimization algorithm, LPAM, for solving learnable two-block nonsmooth problems.
The proposed LPAM-net is parameter-efficient and has favourable performance in comparison with some state-of-the-art methods.
arXiv Detail & Related papers (2024-11-10T02:02:32Z) - Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU are naturally modeled as Multistage Problems (MSPs) but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach Two-Stage General Decision Rules (TS-GDR) to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks named Two-Stage Deep Decision Rules (TS-LDR)
arXiv Detail & Related papers (2024-05-23T18:19:47Z) - Floorplanning of VLSI by Mixed-Variable Optimization [42.82770651937298]
This paper proposes memetic algorithms to solve mixed-variable floorplanning problems.
Proposed algorithms are superior to some celebrated B*-tree based floorplanning algorithms.
arXiv Detail & Related papers (2024-01-27T06:34:16Z) - On Linear Convergence of PI Consensus Algorithm under the Restricted Secant Inequality [5.35599092568615]
This paper considers solving distributed optimization problems in peer-to-peer multi-agent networks.
By using the proportional-integral (PI) control strategy, various algorithms with fixed stepsize have been developed.
arXiv Detail & Related papers (2023-09-30T15:54:52Z) - Accelerating Cutting-Plane Algorithms via Reinforcement Learning
Surrogates [49.84541884653309]
A current standard approach to solving convex discrete optimization problems is the use of cutting-plane algorithms.
Despite the existence of a number of general-purpose cut-generating algorithms, large-scale discrete optimization problems continue to suffer from intractability.
We propose a method for accelerating cutting-plane algorithms via reinforcement learning.
arXiv Detail & Related papers (2023-07-17T20:11:56Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit between first-order algorithms for constrained optimization and non-smooth systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of sparse variables.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithm that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z) - Functional Constrained Optimization for Risk Aversion and Sparsity
Control [7.561780884831967]
Risk and sparsity requirements need to be enforced simultaneously in many applications, e.g., in portfolio optimization, assortment planning, and radiation planning.
We propose a Level Conditional Gradient (LCG) method, which generates a convex or sparse trajectory for these challenges.
We show that the method achieves a level single-set projection of the optimal value an inner conditional approximation (CGO) for solving mini-max sub gradient.
arXiv Detail & Related papers (2022-10-11T02:51:51Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
The problem of constrained decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new utilities-dual convex approach is proposed with novel integration of three ingredients: regularized policy, dual regularizer, and Nesterov's gradient descent dual.
This is the first demonstration that nonconcave CMDP problems can attain the lower bound of $mathcal O (1/epsilon)$ for all complexity optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex
Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z) - A Stochastic Composite Augmented Lagrangian Method For Reinforcement
Learning [9.204659134755795]
We consider the linear programming (LP) formulation for deep reinforcement learning.
The augmented Lagrangian method suffers the double-sampling obstacle in solving the LP.
A deep parameterized augment Lagrangian method is proposed.
arXiv Detail & Related papers (2021-05-20T13:08:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.