Neural-Progressive Hedging: Enforcing Constraints in Reinforcement
Learning with Stochastic Programming
- URL: http://arxiv.org/abs/2202.13436v1
- Date: Sun, 27 Feb 2022 19:39:19 GMT
- Title: Neural-Progressive Hedging: Enforcing Constraints in Reinforcement
Learning with Stochastic Programming
- Authors: Supriyo Ghosh, Laura Wynter, Shiau Hong Lim and Duc Thien Nguyen
- Abstract summary: We propose a framework, neural-progressive hedging (NP), that leverages stochastic programming during the online phase of executing a reinforcement learning (RL) policy.
The goal is to ensure feasibility with respect to constraints and risk-based objectives such as conditional value-at-risk (CVaR).
We show that the NP framework produces policies that outperform deep RL and other baseline approaches.
- Score: 8.942831966541231
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a framework, called neural-progressive hedging (NP), that
leverages stochastic programming during the online phase of executing a
reinforcement learning (RL) policy. The goal is to ensure feasibility with
respect to constraints and risk-based objectives such as conditional
value-at-risk (CVaR) during the execution of the policy, using probabilistic
models of the state transitions to guide policy adjustments. The framework is
particularly amenable to the class of sequential resource allocation problems
since feasibility with respect to typical resource constraints cannot be
enforced in a scalable manner. The NP framework provides an alternative that
adds modest overhead during the online phase. Experimental results demonstrate
the efficacy of the NP framework on two continuous real-world tasks: (i) the
portfolio optimization problem with liquidity constraints for financial
planning, characterized by non-stationary state distributions; and (ii) the
dynamic repositioning problem in bike sharing systems, that embodies the class
of supply-demand matching problems. We show that the NP framework produces
policies that are better than deep RL and other baseline approaches, adapting
to non-stationarity, whilst satisfying structural constraints and accommodating
risk measures in the resulting policies. Additional benefits of the NP
framework are ease of implementation and better explainability of the policies.
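To make the mechanism concrete, the following is a minimal, hypothetical sketch of the online adjustment idea: start from the action proposed by a pre-trained RL policy and run a progressive-hedging-style consensus loop over sampled transition scenarios so that the executed action satisfies a simple resource constraint (non-negative allocations summing to a budget). The helper names (project_simplex, np_adjust), the toy quadratic tracking objective, and all parameter values are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an NP-style online adjustment of an RL action.
import numpy as np

def project_simplex(x, budget=1.0):
    """Euclidean projection onto {a >= 0, sum(a) = budget}."""
    u = np.sort(x)[::-1]
    css = np.cumsum(u) - budget
    rho = np.nonzero(u * np.arange(1, len(x) + 1) > css)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(x - theta, 0.0)

def np_adjust(a_rl, scenario_grads, budget=1.0, rho=1.0, iters=50, lr=0.1):
    """Progressive-hedging-style consensus over sampled scenarios.

    a_rl           : action proposed by the RL policy, shape (d,)
    scenario_grads : list of callables, gradient of the per-scenario cost
    Returns a feasible action close to a_rl with good scenario-average cost.
    """
    S, d = len(scenario_grads), len(a_rl)
    a = np.tile(a_rl, (S, 1))          # per-scenario copies of the action
    w = np.zeros((S, d))               # dual (non-anticipativity) multipliers
    for _ in range(iters):
        a_bar = a.mean(axis=0)         # consensus action
        for s in range(S):
            # gradient step on scenario cost + augmented-Lagrangian penalty
            g = scenario_grads[s](a[s]) + w[s] + rho * (a[s] - a_bar)
            a[s] = project_simplex(a[s] - lr * g, budget)
        a_bar = a.mean(axis=0)
        w += rho * (a - a_bar)         # multiplier update toward consensus
    return project_simplex(a.mean(axis=0), budget)

# Toy usage: 4-asset allocation, stay close to the RL action under each scenario.
rng = np.random.default_rng(0)
a_rl = np.array([0.4, 0.3, 0.2, 0.1])
returns = [rng.normal(0.05, 0.1, 4) for _ in range(8)]  # sampled return scenarios
grads = [(lambda r: (lambda a: -r + 2.0 * (a - a_rl)))(r) for r in returns]
print(np_adjust(a_rl, grads))
```

The per-scenario subproblems stay small and independent, which is consistent with the modest online overhead described in the abstract.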
Related papers
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
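As a point of reference for this class of methods, here is a generic, hypothetical sketch of a primal-dual (Lagrangian) constrained policy-gradient step: ascend on reward, descend on cost weighted by a multiplier, and update the multiplier on the observed constraint violation. It illustrates the family of updates such algorithms build on, not C-PG's actual procedure; the tabular softmax policy, REINFORCE-style estimator, and step sizes are assumptions.

```python
# Generic primal-dual constrained policy-gradient sketch (illustrative only).
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def constrained_pg_step(theta, lam, episodes, cost_limit, lr_theta=0.05, lr_lam=0.1):
    """One primal-dual update from a batch of episodes.

    theta    : logits of a softmax policy, shape (n_states, n_actions)
    lam      : Lagrange multiplier (>= 0) for the cost constraint
    episodes : list of (states, actions, rewards, costs) trajectories
    """
    grad = np.zeros_like(theta)
    avg_cost = 0.0
    for states, actions, rewards, costs in episodes:
        R, C = sum(rewards), sum(costs)
        avg_cost += C / len(episodes)
        for s, a in zip(states, actions):
            # REINFORCE score-function gradient of log pi(a|s)
            score = -softmax(theta[s])
            score[a] += 1.0
            grad[s] += (R - lam * C) * score / len(episodes)
    theta = theta + lr_theta * grad                          # primal ascent on the Lagrangian
    lam = max(0.0, lam + lr_lam * (avg_cost - cost_limit))   # dual ascent on the violation
    return theta, lam
```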
arXiv Detail & Related papers (2024-07-15T14:54:57Z) - Constraint-Generation Policy Optimization (CGPO): Nonlinear Programming
for Policy Optimization in Mixed Discrete-Continuous MDPs [23.87856533426793]
CGPO provides bounded policy error guarantees over an infinite range of initial states for many DC-MDPs with expressive nonlinear dynamics.
CGPO can generate worst-case state trajectories to diagnose policy deficiencies and provide counterfactual explanations of optimal actions.
We experimentally demonstrate the applicability of CGPO in diverse domains, including inventory control and the management of a system of water reservoirs.
arXiv Detail & Related papers (2024-01-20T07:12:57Z) - Compositional Policy Learning in Stochastic Control Systems with Formal
Guarantees [0.0]
Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks.
We propose a novel method for learning a composition of neural network policies in stochastic environments.
A formal certificate guarantees that a specification over the policy's behavior is satisfied with the desired probability.
arXiv Detail & Related papers (2023-12-03T17:04:18Z) - Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood
Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive settings such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - Stability Verification of Neural Network Controllers using Mixed-Integer
Programming [5.811502603310248]
We propose a framework for the stability verification of mixed-integer linear programming (MILP) representable control policies.
The proposed framework is sufficiently general to accommodate a broad range of candidate policies.
We present an open-source toolbox in Python based on the proposed framework.
arXiv Detail & Related papers (2022-06-27T15:34:39Z) - COptiDICE: Offline Constrained Reinforcement Learning via Stationary
Distribution Correction Estimation [73.17078343706909]
In the offline constrained reinforcement learning (RL) problem, the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset.
We present an offline constrained RL algorithm that optimizes the policy in the space of stationary distributions.
Our algorithm, COptiDICE, directly estimates the stationary distribution corrections of the optimal policy with respect to returns, while constraining the cost upper bound, with the goal of yielding a cost-conservative policy for actual constraint satisfaction.
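In general terms, methods of this kind optimize over stationary-distribution correction ratios rather than over policies directly; a paraphrased form of the constrained problem (not COptiDICE's exact regularized objective) is:

```latex
\max_{w \ge 0}\ \mathbb{E}_{(s,a)\sim d_D}\!\big[w(s,a)\,r(s,a)\big]
\quad \text{s.t.} \quad
\mathbb{E}_{(s,a)\sim d_D}\!\big[w(s,a)\,c(s,a)\big] \le \hat{c},
\qquad w \text{ satisfies the Bellman flow constraints,}
```

where w(s,a) = d_pi(s,a)/d_D(s,a) corrects the dataset distribution d_D toward the policy's stationary distribution d_pi, \hat{c} is the cost budget, and a policy can then be extracted from the estimated w.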
arXiv Detail & Related papers (2022-04-19T15:55:47Z) - A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z) - Supported Policy Optimization for Offline Reinforcement Learning [74.1011309005488]
Policy constraint methods for offline reinforcement learning (RL) typically utilize parameterization or regularization.
Regularization methods reduce the divergence between the learned policy and the behavior policy.
This paper presents Supported Policy OpTimization (SPOT), which is directly derived from the theoretical formalization of the density-based support constraint.
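The following is a small, hypothetical illustration of the density-based support-constraint idea (not SPOT's exact objective or density model): penalize policy actions whose estimated behavior-policy density falls below a threshold, so the learned policy stays within the support of the dataset. The diagonal-Gaussian density model and all names are stand-in assumptions.

```python
# Illustrative density-based support penalty for offline RL (not SPOT's implementation).
import numpy as np

class GaussianBehaviorDensity:
    """Crude stand-in density model: one diagonal Gaussian over dataset actions."""
    def fit(self, actions):
        self.mu = actions.mean(axis=0)
        self.var = actions.var(axis=0) + 1e-6
        return self

    def log_prob(self, a):
        return -0.5 * np.sum((a - self.mu) ** 2 / self.var
                             + np.log(2 * np.pi * self.var), axis=-1)

def support_penalty(policy_actions, density, log_eps=-5.0, weight=1.0):
    """Hinge penalty added to the actor loss: only actions whose estimated
    behavior log-density drops below log_eps are penalized."""
    lp = density.log_prob(policy_actions)
    return weight * np.maximum(log_eps - lp, 0.0).mean()

# Usage: fit the density to dataset actions, add the penalty to the actor loss.
dataset_actions = np.random.default_rng(0).normal(0.0, 1.0, size=(1000, 3))
density = GaussianBehaviorDensity().fit(dataset_actions)
proposed = np.array([[0.1, -0.2, 0.3], [5.0, 5.0, 5.0]])  # second action is out-of-support
print(support_penalty(proposed, density))
```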
arXiv Detail & Related papers (2022-02-13T07:38:36Z) - DNN-based Policies for Stochastic AC OPF [7.551130027327462]
Stochastic optimal power flow (SOPF) formulations provide a mechanism to handle uncertainties by computing dispatch decisions and control policies that maintain feasibility under uncertainty.
We put forth a deep neural network (DNN)-based policy that predicts the generator dispatch decisions in response to uncertainty.
The advantages of the DNN policy over simpler policies and their efficacy in enforcing safety limits and producing near optimal solutions are demonstrated.
arXiv Detail & Related papers (2021-12-04T22:26:27Z) - An Offline Risk-aware Policy Selection Method for Bayesian Markov
Decision Processes [0.0]
Exploitation vs Caution (EvC) is a paradigm that elegantly incorporates model uncertainty while abiding by the Bayesian formalism.
We validate EvC against state-of-the-art approaches in different discrete, yet simple, environments offering a fair variety of MDP classes.
In the tested scenarios EvC manages to select robust policies and hence stands out as a useful tool for practitioners.
arXiv Detail & Related papers (2021-05-27T20:12:20Z)
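A small, hypothetical sketch of the risk-aware selection idea in this setting: sample plausible MDPs from the Bayesian posterior, evaluate each candidate policy on every sample, and choose the policy that maximizes a conservative statistic of its value distribution (here a CVaR-like mean of the worst alpha-fraction). The value matrix and alpha are illustrative assumptions, not EvC's exact criterion.

```python
# Illustrative risk-aware policy selection under Bayesian model uncertainty.
import numpy as np

def select_policy(values, alpha=0.2):
    """values[i, j] = estimated return of policy i on the j-th posterior MDP sample."""
    k = max(1, int(np.ceil(alpha * values.shape[1])))
    worst_k = np.sort(values, axis=1)[:, :k]      # lowest-k returns per policy
    scores = worst_k.mean(axis=1)                 # CVaR-like conservative score
    return int(np.argmax(scores)), scores

# Usage: three candidate policies evaluated on 100 MDPs drawn from the posterior.
rng = np.random.default_rng(1)
values = np.stack([rng.normal(1.0, 0.1, 100),    # modest but reliable
                   rng.normal(1.3, 1.0, 100),    # higher mean, high risk
                   rng.normal(0.8, 0.05, 100)])
best, scores = select_policy(values)
print(best, scores)
```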