A Gradient-Aware Search Algorithm for Constrained Markov Decision
Processes
- URL: http://arxiv.org/abs/2005.03718v1
- Date: Thu, 7 May 2020 19:38:09 GMT
- Title: A Gradient-Aware Search Algorithm for Constrained Markov Decision
Processes
- Authors: Sami Khairy, Prasanna Balaprakash, Lin X. Cai
- Abstract summary: We first prove that the optimization objective in the dual linear program of a finite CMDP is a piece-wise linear convex function with respect to the Lagrange penalty multipliers.
We propose a novel two-level Gradient-Aware Search (GAS) algorithm which exploits the PWLC structure to find the optimal state-value function and Lagrange penalty multipliers of a finite CMDP.
- Score: 9.728259735794987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The canonical solution methodology for finite constrained Markov decision
processes (CMDPs), where the objective is to maximize the expected
infinite-horizon discounted rewards subject to the expected infinite-horizon
discounted cost constraints, is based on convex linear programming. In this
brief, we first prove that the optimization objective in the dual linear
program of a finite CMDP is a piece-wise linear convex function (PWLC) with
respect to the Lagrange penalty multipliers. Next, we propose a novel two-level
Gradient-Aware Search (GAS) algorithm which exploits the PWLC structure to find
the optimal state-value function and Lagrange penalty multipliers of a finite
CMDP. The proposed algorithm is applied in two stochastic control problems with
constraints: robot navigation in a grid world and solar-powered unmanned aerial
vehicle (UAV)-based wireless network management. We empirically compare the
convergence performance of the proposed GAS algorithm with binary search (BS),
Lagrangian primal-dual optimization (PDO), and Linear Programming (LP).
Compared with benchmark algorithms, it is shown that the proposed GAS algorithm
converges to the optimal solution faster, does not require hyper-parameter
tuning, and is not sensitive to initialization of the Lagrange penalty
multiplier.
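To make the two-level structure concrete, here is a minimal sketch for a tabular CMDP with a single cost constraint: the inner level solves the unconstrained MDP induced by a fixed multiplier (value iteration plus exact policy evaluation), and the outer level searches the multiplier by intersecting the two supporting lines of the piece-wise linear convex dual. The array shapes, helper names, bracketing logic, and termination rule are illustrative assumptions, not the paper's exact implementation.

    import numpy as np

    def greedy_policy(P, r_lam, gamma, iters=1000):
        """Inner level: value iteration for the unconstrained MDP with
        reward r - lam * c; P has shape (nS, nA, nS), r_lam (nS, nA)."""
        V = np.zeros(P.shape[0])
        for _ in range(iters):
            Q = r_lam + gamma * np.einsum('san,n->sa', P, V)
            V = Q.max(axis=1)
        return Q.argmax(axis=1)

    def dual_point(P, r, c, c_max, gamma, lam, s0):
        """Dual value d(lam) and a subgradient at lam. The dual
        d(lam) = max_pi [V_r(pi) - lam * (V_c(pi) - c_max)] is piece-wise
        linear convex in lam, with slope c_max - V_c(pi_lam)."""
        pi = greedy_policy(P, r - lam * c, gamma)
        nS = P.shape[0]
        A = np.eye(nS) - gamma * P[np.arange(nS), pi]
        v_r = np.linalg.solve(A, r[np.arange(nS), pi])[s0]
        v_c = np.linalg.solve(A, c[np.arange(nS), pi])[s0]
        return v_r - lam * (v_c - c_max), c_max - v_c

    def gas_multiplier(P, r, c, c_max, gamma, s0, lam_hi=100.0, tol=1e-6):
        """Outer level: shrink a bracket on the PWLC dual by jumping to
        the intersection of the supporting lines at its endpoints."""
        lam_l, lam_r = 0.0, lam_hi
        d_l, g_l = dual_point(P, r, c, c_max, gamma, lam_l, s0)
        d_r, g_r = dual_point(P, r, c, c_max, gamma, lam_r, s0)
        if g_l >= 0:                  # constraint already met at lam = 0
            return lam_l
        if g_r < 0:                   # lam_hi too small to bracket
            return lam_r
        lam_m = lam_r
        for _ in range(60):           # bounded loop for a sketch
            lam_m = (d_r - d_l + g_l * lam_l - g_r * lam_r) / (g_l - g_r)
            lower = d_l + g_l * (lam_m - lam_l)   # intersection height
            if min(d_l, d_r) - lower < tol:       # optimality gap closed
                break
            d_m, g_m = dual_point(P, r, c, c_max, gamma, lam_m, s0)
            if g_m < 0:               # still infeasible: move left end
                lam_l, d_l, g_l = lam_m, d_m, g_m
            else:                     # feasible: move right end
                lam_r, d_r, g_r = lam_m, d_m, g_m
        return lam_m

Because the dual is a pointwise maximum of finitely many affine functions of lam (one per deterministic policy), the supporting-line intersection both lower-bounds the dual minimum and pinpoints the next multiplier to evaluate, which is what removes the hyper-parameter tuning that a plain bisection or primal-dual ascent would need.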
Related papers
- Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU problems are naturally modeled as Multistage Problems (MSPs), but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach Two-Stage General Decision Rules (TS-GDR) to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks, named Two-Stage Deep Decision Rules (TS-DDR).
arXiv Detail & Related papers (2024-05-23T18:19:47Z)
- Floorplanning of VLSI by Mixed-Variable Optimization [42.82770651937298]
This paper proposes memetic algorithms to solve mixed-variable floorplanning problems.
The proposed algorithms are superior to some celebrated B*-tree based floorplanning algorithms.
arXiv Detail & Related papers (2024-01-27T06:34:16Z)
- On Linear Convergence of PI Consensus Algorithm under the Restricted Secant Inequality [5.35599092568615]
This paper considers solving distributed optimization problems in peer-to-peer multi-agent networks.
By using the proportional-integral (PI) control strategy, various algorithms with fixed stepsize have been developed.
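For intuition, a textbook PI consensus flow adds a proportional action on the network disagreement and an integral state that accumulates it. The discretization below is an illustrative sketch only: the ring graph, the quadratic local costs f_i(x) = 0.5 * (x - b_i)^2, and the fixed stepsize eta are our assumptions, not the paper's setting.

    import numpy as np

    n = 5
    b = np.arange(n, dtype=float)    # local targets; optimum is b.mean()
    L = 2.0 * np.eye(n)              # graph Laplacian of a ring
    for i in range(n):
        L[i, (i - 1) % n] = L[i, (i + 1) % n] = -1.0

    x = np.zeros(n)     # primal states, one per agent
    v = np.zeros(n)     # integral states accumulating disagreement
    eta = 0.1           # fixed stepsize, as in the fixed-stepsize analyses
    for _ in range(2000):
        grad = x - b                             # local gradients of f_i
        x = x - eta * (grad + L @ x + L @ v)     # gradient + proportional term
        v = v + eta * (L @ x)                    # integral action
    print(x, b.mean())  # all entries approach the global minimizer

At a fixed point, L @ x = 0 forces consensus, and summing the x-update across agents forces the average gradient to zero; this is where linear convergence analyses under conditions such as the restricted secant inequality take over.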
arXiv Detail & Related papers (2023-09-30T15:54:52Z)
- Accelerating Cutting-Plane Algorithms via Reinforcement Learning Surrogates [49.84541884653309]
A current standard approach to solving convex discrete optimization problems is the use of cutting-plane algorithms.
Despite the existence of a number of general-purpose cut-generating algorithms, large-scale discrete optimization problems continue to suffer from intractability.
We propose a method for accelerating cutting-plane algorithms via reinforcement learning.
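As background for what the learned surrogate accelerates, a generic cutting-plane loop alternates between solving a relaxation and calling a separation oracle for violated constraints. The toy instance below (a box-constrained LP with lazily added pairwise cuts) and its most-violated-cut rule are illustrative stand-ins; the paper's RL surrogate, not shown, guides the cut-generation step.

    import numpy as np
    from scipy.optimize import linprog

    n = 4
    c = -np.ones(n)                  # maximize sum(x) == minimize -sum(x)
    cuts_A, cuts_b = [], []          # lazily generated cuts x_i + x_j <= 4

    for it in range(20):
        res = linprog(c,
                      A_ub=np.array(cuts_A) if cuts_A else None,
                      b_ub=np.array(cuts_b) if cuts_b else None,
                      bounds=[(0, 3)] * n, method="highs")
        x = res.x
        i, j = np.argsort(-x)[:2]    # separation oracle: most violated pair
        if x[i] + x[j] <= 4 + 1e-9:
            break                    # relaxation solution is feasible: done
        row = np.zeros(n)
        row[i] = row[j] = 1.0
        cuts_A.append(row)
        cuts_b.append(4.0)

    print(x, -res.fun)               # expect x = [2, 2, 2, 2], value 8

Each added cut tightens the relaxation; the intractability the paper targets comes from how many such rounds large-scale discrete problems need, which is exactly the quantity a learned cut-selection policy tries to reduce.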
arXiv Detail & Related papers (2023-07-17T20:11:56Z)
- Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit the connection between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
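Concretely, in this line of work a feasible set defined by inequality constraints $g_i(x) \le 0$ is enforced through conditions on the velocity rather than on the position. One common form (our paraphrase, with a relaxation constant $\alpha > 0$) is

    \dot{x}(t) \in \{ v : \nabla g_i(x)^\top v \le -\alpha\, g_i(x), \ i = 1, \dots, m \},

so that any infeasibility decays at a controlled rate instead of being removed by a projection, which is what keeps the resulting discretizations cheap per iteration.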
arXiv Detail & Related papers (2023-02-01T08:50:48Z)
- Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithmic subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
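For reference, the underlying problem and its standard entropic regularization (background formulations, not the paper's specific extragradient scheme) are

    \min_{P \in \Pi(\mu,\nu)} \langle C, P \rangle
    \qquad\text{and}\qquad
    \min_{P \in \Pi(\mu,\nu)} \langle C, P \rangle - \eta H(P),
    \qquad
    \Pi(\mu,\nu) = \{ P \ge 0 : P\mathbf{1} = \mu, \; P^\top \mathbf{1} = \nu \},

where $C$ is the ground cost matrix, $H$ is the entropy of the coupling, and $\eta > 0$ trades approximation error for better conditioning; $\varepsilon$ additive accuracy means returning a coupling whose cost is within $\varepsilon$ of the optimum.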
arXiv Detail & Related papers (2023-01-30T15:46:39Z)
- Functional Constrained Optimization for Risk Aversion and Sparsity Control [7.561780884831967]
Risk and sparsity requirements need to be enforced simultaneously in many applications, e.g., in portfolio optimization, assortment planning, and radiation planning.
We propose a Level Conditional Gradient (LCG) method, a projection-free approach to these challenges.
The method tracks a level-set approximation of the optimal value and solves each inner subproblem with a conditional gradient oracle (CGO).
arXiv Detail & Related papers (2022-10-11T02:51:51Z)
- Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new primal-dual approach is proposed with a novel integration of three ingredients: a regularized policy optimizer, a dual variable regularizer, and Nesterov's accelerated gradient descent on the dual.
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for convex optimization subject to convex constraints.
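Schematically, the three ingredients can be read as a regularized Lagrangian saddle-point problem of the following flavor (an illustrative rendering with entropy weight $\tau > 0$ and dual regularizer weight $\eta > 0$; the paper's exact formulation and constants may differ):

    \max_{\pi} \min_{\lambda \ge 0} \; V_r(\pi) + \tau \mathcal{H}(\pi) + \lambda^\top \big( V_c(\pi) - b \big) - \frac{\eta}{2} \|\lambda\|^2,

with the dual variable updated by Nesterov's accelerated gradient method and the regularization weights tied to the target accuracy $\epsilon$.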
arXiv Detail & Related papers (2021-10-20T02:57:21Z)
- Zeroth and First Order Stochastic Frank-Wolfe Algorithms for Constrained Optimization [13.170519806372075]
Problems of convex optimization with two sets of constraints arise frequently in the context of semidefinite programming.
Since projection onto the first set of constraints is difficult, it becomes necessary to explore projection-free algorithms.
The efficacy of the proposed algorithms is tested on relevant applications of sparse matrix estimation, clustering via semidefinite relaxation, and the uniform sparsest cut problem.
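As a reminder of why projection-free methods matter here, a classical Frank-Wolfe step needs only a linear minimization oracle rather than a projection. Below is a minimal deterministic sketch over the probability simplex; the paper's zeroth- and first-order stochastic oracles and the handling of the second constraint set are not shown.

    import numpy as np

    def frank_wolfe_simplex(grad_f, x0, steps=200):
        """Classical Frank-Wolfe over the probability simplex: the linear
        minimization oracle reduces to picking one coordinate."""
        x = x0.copy()
        for k in range(steps):
            g = grad_f(x)
            s = np.zeros_like(x)
            s[np.argmin(g)] = 1.0          # LMO: vertex minimizing <g, s>
            x += 2.0 / (k + 2) * (s - x)   # standard 2/(k+2) step size
        return x

    # Toy use: project y onto the simplex by minimizing ||x - y||^2,
    # solved projection-free; y is an arbitrary illustrative target.
    y = np.array([0.5, -0.2, 0.9, 0.1])
    x = frank_wolfe_simplex(lambda x: 2.0 * (x - y), np.ones(4) / 4)
    print(x, x.sum())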
arXiv Detail & Related papers (2021-07-14T08:01:30Z)
- Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We establish lower complexity bounds for this setting and design two optimal algorithms that attain them.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z)
- A Stochastic Composite Augmented Lagrangian Method For Reinforcement Learning [9.204659134755795]
We consider the linear programming (LP) formulation for deep reinforcement learning.
The augmented Lagrangian method suffers from the double-sampling obstacle in solving the LP.
A deep parameterized augmented Lagrangian method is proposed.
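For context, the LP formulation in question is typically the occupancy-measure form of the discounted MDP (a standard statement; the paper's composite augmented Lagrangian treatment is not reproduced here):

    \max_{\mu \ge 0} \; \sum_{s,a} \mu(s,a)\, r(s,a)
    \quad \text{s.t.} \quad
    \sum_{a} \mu(s',a) = (1-\gamma)\, \rho(s') + \gamma \sum_{s,a} P(s' \mid s,a)\, \mu(s,a) \quad \forall s',

where $\rho$ is the initial state distribution. The double-sampling obstacle arises because augmented Lagrangian terms square the expected constraint residuals, and an unbiased gradient of a squared expectation needs two independent samples of the same transition.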
arXiv Detail & Related papers (2021-05-20T13:08:06Z)