Neural-Progressive Hedging: Enforcing Constraints in Reinforcement
Learning with Stochastic Programming
- URL: http://arxiv.org/abs/2202.13436v1
- Date: Sun, 27 Feb 2022 19:39:19 GMT
- Title: Neural-Progressive Hedging: Enforcing Constraints in Reinforcement
Learning with Stochastic Programming
- Authors: Supriyo Ghosh, Laura Wynter, Shiau Hong Lim and Duc Thien Nguyen
- Abstract summary: We propose a framework, neural-progressive hedging (NP), that leverages stochastic programming during the online phase of executing a reinforcement learning (RL) policy.
The goal is to ensure feasibility with respect to constraints and risk-based objectives such as conditional value-at-risk (CVaR).
We show that the NP framework produces policies that outperform deep RL and other baseline approaches.
- Score: 8.942831966541231
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a framework, called neural-progressive hedging (NP), that
leverages stochastic programming during the online phase of executing a
reinforcement learning (RL) policy. The goal is to ensure feasibility with
respect to constraints and risk-based objectives such as conditional
value-at-risk (CVaR) during the execution of the policy, using probabilistic
models of the state transitions to guide policy adjustments. The framework is
particularly amenable to the class of sequential resource allocation problems
since feasibility with respect to typical resource constraints cannot be
enforced in a scalable manner. The NP framework provides an alternative that
adds modest overhead during the online phase. Experimental results demonstrate
the efficacy of the NP framework on two continuous real-world tasks: (i) the
portfolio optimization problem with liquidity constraints for financial
planning, characterized by non-stationary state distributions; and (ii) the
dynamic repositioning problem in bike sharing systems, that embodies the class
of supply-demand matching problems. We show that the NP framework produces
policies that are better than deep RL and other baseline approaches, adapting
to non-stationarity, whilst satisfying structural constraints and accommodating
risk measures in the resulting policies. Additional benefits of the NP
framework are ease of implementation and better explainability of the policies.
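To make the mechanism concrete, the following is a minimal, hypothetical sketch of the online adjustment idea: start from the action proposed by a pre-trained RL policy and run a progressive-hedging-style consensus loop over sampled transition scenarios so that the executed action satisfies a simple resource constraint (non-negative allocations summing to a budget). The helper names (project_simplex, np_adjust), the toy quadratic tracking objective, and all parameter values are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an NP-style online adjustment of an RL action.
import numpy as np

def project_simplex(x, budget=1.0):
    """Euclidean projection onto {a >= 0, sum(a) = budget}."""
    u = np.sort(x)[::-1]
    css = np.cumsum(u) - budget
    rho = np.nonzero(u * np.arange(1, len(x) + 1) > css)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(x - theta, 0.0)

def np_adjust(a_rl, scenario_grads, budget=1.0, rho=1.0, iters=50, lr=0.1):
    """Progressive-hedging-style consensus over sampled scenarios.

    a_rl           : action proposed by the RL policy, shape (d,)
    scenario_grads : list of callables, gradient of the per-scenario cost
    Returns a feasible action close to a_rl with good scenario-average cost.
    """
    S, d = len(scenario_grads), len(a_rl)
    a = np.tile(a_rl, (S, 1))          # per-scenario copies of the action
    w = np.zeros((S, d))               # dual (non-anticipativity) multipliers
    for _ in range(iters):
        a_bar = a.mean(axis=0)         # consensus action
        for s in range(S):
            # gradient step on scenario cost + augmented-Lagrangian penalty
            g = scenario_grads[s](a[s]) + w[s] + rho * (a[s] - a_bar)
            a[s] = project_simplex(a[s] - lr * g, budget)
        a_bar = a.mean(axis=0)
        w += rho * (a - a_bar)         # multiplier update toward consensus
    return project_simplex(a.mean(axis=0), budget)

# Toy usage: 4-asset allocation, stay close to the RL action under each scenario.
rng = np.random.default_rng(0)
a_rl = np.array([0.4, 0.3, 0.2, 0.1])
returns = [rng.normal(0.05, 0.1, 4) for _ in range(8)]  # sampled return scenarios
grads = [(lambda r: (lambda a: -r + 2.0 * (a - a_rl)))(r) for r in returns]
print(np_adjust(a_rl, grads))
```

The per-scenario subproblems stay small and independent, which is consistent with the modest online overhead described in the abstract.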
Related papers
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
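As a point of reference for this class of methods, here is a generic, hypothetical sketch of a primal-dual (Lagrangian) constrained policy-gradient step: ascend on reward, descend on cost weighted by a multiplier, and update the multiplier on the observed constraint violation. It illustrates the family of updates such algorithms build on, not C-PG's actual procedure; the tabular softmax policy, REINFORCE-style estimator, and step sizes are assumptions.

```python
# Generic primal-dual constrained policy-gradient sketch (illustrative only).
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def constrained_pg_step(theta, lam, episodes, cost_limit, lr_theta=0.05, lr_lam=0.1):
    """One primal-dual update from a batch of episodes.

    theta    : logits of a softmax policy, shape (n_states, n_actions)
    lam      : Lagrange multiplier (>= 0) for the cost constraint
    episodes : list of (states, actions, rewards, costs) trajectories
    """
    grad = np.zeros_like(theta)
    avg_cost = 0.0
    for states, actions, rewards, costs in episodes:
        R, C = sum(rewards), sum(costs)
        avg_cost += C / len(episodes)
        for s, a in zip(states, actions):
            # REINFORCE score-function gradient of log pi(a|s)
            score = -softmax(theta[s])
            score[a] += 1.0
            grad[s] += (R - lam * C) * score / len(episodes)
    theta = theta + lr_theta * grad                          # primal ascent on the Lagrangian
    lam = max(0.0, lam + lr_lam * (avg_cost - cost_limit))   # dual ascent on the violation
    return theta, lam
```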
arXiv Detail & Related papers (2024-07-15T14:54:57Z) - Constraint-Generation Policy Optimization (CGPO): Nonlinear Programming
for Policy Optimization in Mixed Discrete-Continuous MDPs [23.87856533426793]
CGPO provides bounded policy error guarantees over an infinite range of initial states for many DC-MDPs with expressive nonlinear dynamics.
CGPO can generate worst-case state trajectories to diagnose policy deficiencies and provide counterfactual explanations of optimal actions.
We experimentally demonstrate the applicability of CGPO in diverse domains, including inventory control and the management of a system of water reservoirs.
arXiv Detail & Related papers (2024-01-20T07:12:57Z) - Compositional Policy Learning in Stochastic Control Systems with Formal
Guarantees [0.0]
Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks.
We propose a novel method for learning a composition of neural network policies in stochastic environments.
A formal certificate guarantees that a specification over the policy's behavior is satisfied with the desired probability.
arXiv Detail & Related papers (2023-12-03T17:04:18Z) - Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood
Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive settings such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - Stability Verification of Neural Network Controllers using Mixed-Integer
Programming [5.811502603310248]
We propose a framework for the stability verification of mixed-integer linear programming (MILP) representable control policies.
The proposed framework is sufficiently general to accommodate a broad range of candidate policies.
We present an open-source toolbox in Python based on the proposed framework.
arXiv Detail & Related papers (2022-06-27T15:34:39Z) - COptiDICE: Offline Constrained Reinforcement Learning via Stationary
Distribution Correction Estimation [73.17078343706909]
In the offline constrained reinforcement learning (RL) problem, the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset.
We present an offline constrained RL algorithm that optimizes the policy in the space of stationary distributions.
Our algorithm, COptiDICE, directly estimates the stationary distribution corrections of the optimal policy with respect to returns, while constraining the cost upper bound, with the goal of yielding a cost-conservative policy for actual constraint satisfaction.
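In general terms, methods of this kind optimize over stationary-distribution correction ratios rather than over policies directly; a paraphrased form of the constrained problem (not COptiDICE's exact regularized objective) is:

```latex
\max_{w \ge 0}\ \mathbb{E}_{(s,a)\sim d_D}\!\big[w(s,a)\,r(s,a)\big]
\quad \text{s.t.} \quad
\mathbb{E}_{(s,a)\sim d_D}\!\big[w(s,a)\,c(s,a)\big] \le \hat{c},
\qquad w \text{ satisfies the Bellman flow constraints,}
```

where w(s,a) = d_pi(s,a)/d_D(s,a) corrects the dataset distribution d_D toward the policy's stationary distribution d_pi, \hat{c} is the cost budget, and a policy can then be extracted from the estimated w.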
arXiv Detail & Related papers (2022-04-19T15:55:47Z) - A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z) - Supported Policy Optimization for Offline Reinforcement Learning [74.1011309005488]
Policy constraint methods for offline reinforcement learning (RL) typically utilize parameterization or regularization.
Regularization methods reduce the divergence between the learned policy and the behavior policy.
This paper presents Supported Policy OpTimization (SPOT), which is directly derived from the theoretical formalization of the density-based support constraint.
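The following is a small, hypothetical illustration of the density-based support-constraint idea (not SPOT's exact objective or density model): penalize policy actions whose estimated behavior-policy density falls below a threshold, so the learned policy stays within the support of the dataset. The diagonal-Gaussian density model and all names are stand-in assumptions.

```python
# Illustrative density-based support penalty for offline RL (not SPOT's implementation).
import numpy as np

class GaussianBehaviorDensity:
    """Crude stand-in density model: one diagonal Gaussian over dataset actions."""
    def fit(self, actions):
        self.mu = actions.mean(axis=0)
        self.var = actions.var(axis=0) + 1e-6
        return self

    def log_prob(self, a):
        return -0.5 * np.sum((a - self.mu) ** 2 / self.var
                             + np.log(2 * np.pi * self.var), axis=-1)

def support_penalty(policy_actions, density, log_eps=-5.0, weight=1.0):
    """Hinge penalty added to the actor loss: only actions whose estimated
    behavior log-density drops below log_eps are penalized."""
    lp = density.log_prob(policy_actions)
    return weight * np.maximum(log_eps - lp, 0.0).mean()

# Usage: fit the density to dataset actions, add the penalty to the actor loss.
dataset_actions = np.random.default_rng(0).normal(0.0, 1.0, size=(1000, 3))
density = GaussianBehaviorDensity().fit(dataset_actions)
proposed = np.array([[0.1, -0.2, 0.3], [5.0, 5.0, 5.0]])  # second action is out-of-support
print(support_penalty(proposed, density))
```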
arXiv Detail & Related papers (2022-02-13T07:38:36Z) - DNN-based Policies for Stochastic AC OPF [7.551130027327462]
Stochastic optimal power flow (SOPF) formulations provide a mechanism to handle uncertainties by computing dispatch decisions and control policies that maintain feasibility under uncertainty.
We put forth a deep neural network (DNN)-based policy that predicts the generator dispatch decisions in response to uncertainty.
The advantages of the DNN policy over simpler policies and their efficacy in enforcing safety limits and producing near optimal solutions are demonstrated.
arXiv Detail & Related papers (2021-12-04T22:26:27Z) - An Offline Risk-aware Policy Selection Method for Bayesian Markov
Decision Processes [0.0]
Exploitation vs Caution (EvC) is a paradigm that elegantly incorporates model uncertainty while abiding by the Bayesian formalism.
We validate EvC against state-of-the-art approaches in different discrete, yet simple, environments offering a fair variety of MDP classes.
In the tested scenarios EvC manages to select robust policies and hence stands out as a useful tool for practitioners.
arXiv Detail & Related papers (2021-05-27T20:12:20Z)
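A small, hypothetical sketch of the risk-aware selection idea in this setting: sample plausible MDPs from the Bayesian posterior, evaluate each candidate policy on every sample, and choose the policy that maximizes a conservative statistic of its value distribution (here a CVaR-like mean of the worst alpha-fraction). The value matrix and alpha are illustrative assumptions, not EvC's exact criterion.

```python
# Illustrative risk-aware policy selection under Bayesian model uncertainty.
import numpy as np

def select_policy(values, alpha=0.2):
    """values[i, j] = estimated return of policy i on the j-th posterior MDP sample."""
    k = max(1, int(np.ceil(alpha * values.shape[1])))
    worst_k = np.sort(values, axis=1)[:, :k]      # lowest-k returns per policy
    scores = worst_k.mean(axis=1)                 # CVaR-like conservative score
    return int(np.argmax(scores)), scores

# Usage: three candidate policies evaluated on 100 MDPs drawn from the posterior.
rng = np.random.default_rng(1)
values = np.stack([rng.normal(1.0, 0.1, 100),    # modest but reliable
                   rng.normal(1.3, 1.0, 100),    # higher mean, high risk
                   rng.normal(0.8, 0.05, 100)])
best, scores = select_policy(values)
print(best, scores)
```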