A Dynamic Penalty Function Approach for Constraints-Handling in
Reinforcement Learning
- URL: http://arxiv.org/abs/2012.11790v2
- Date: Wed, 31 Mar 2021 06:00:15 GMT
- Title: A Dynamic Penalty Function Approach for Constraints-Handling in
Reinforcement Learning
- Authors: Haeun Yoo, Victor M. Zavala, Jay H. Lee
- Abstract summary: This study focuses on using reinforcement learning (RL) to solve constrained optimal control problems.
While training neural networks to learn the value (or Q) function, one can run into computational issues caused by the sharp change in the function value at the constraint boundary.
This difficulty during training can lead to convergence problems and ultimately to poor closed-loop performance.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Reinforcement learning (RL) is attracting attention as an effective way to
solve sequential optimization problems that involve high dimensional
state/action space and stochastic uncertainties. Many such problems involve
constraints expressed as inequalities. This study focuses on using RL
to solve constrained optimal control problems. Most RL application studies have
dealt with inequality constraints by adding soft penalty terms for violating
the constraints to the reward function. However, while training neural networks
to learn the value (or Q) function, one can run into computational issues
caused by the sharp change in the function value at the constraint boundary due
to the large penalty imposed. This difficulty during training can lead to
convergence problems and ultimately to poor closed-loop performance. To
address this issue, this study proposes a dynamic penalty (DP) approach where
the penalty factor is gradually and systematically increased during training as
the iteration episodes proceed. We first examine the ability of a neural
network to represent a value function when uniform, linear, or DP penalty
functions are added to prevent constraint violation. An agent trained with the
Deep Q Network (DQN) algorithm and the DP approach was then compared with agents
trained using constant penalty functions on a simple vehicle control problem. Results
show that the proposed approach can improve the neural network approximation
accuracy and provide faster convergence when close to a solution.
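For intuition, here is a minimal sketch of the dynamic penalty idea: the penalty factor applied to constraint violations in the shaped reward grows with the training episode. The schedule (geometric growth with a cap), the parameter names, and the constraint function g are illustrative assumptions, not the authors' exact implementation.

```python
def dynamic_penalty_factor(episode, rho_init=1.0, rho_max=100.0, growth=1.05):
    """Penalty factor that grows geometrically with the training episode (capped)."""
    return min(rho_init * growth ** episode, rho_max)

def penalized_reward(reward, state, episode, g):
    """Subtract a penalty proportional to the constraint violation max(0, g(state)).

    Early in training the penalty is mild, so the value/Q function the network
    must fit stays smooth; as episodes proceed the penalty sharpens near the
    constraint boundary, pushing the policy toward feasibility.
    """
    rho = dynamic_penalty_factor(episode)
    violation = max(0.0, g(state))
    return reward - rho * violation

# Example: a hypothetical box constraint x <= 1 on a scalar state, at episode 50.
g = lambda x: x - 1.0
print(penalized_reward(reward=1.0, state=1.2, episode=50, g=g))
```

In such a setup, the shaped reward would replace the raw reward when computing the DQN targets.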
Related papers
- A Penalty-Based Guardrail Algorithm for Non-Decreasing Optimization with Inequality Constraints [1.5498250598583487]
Traditional mathematical programming solvers require long computational times to solve constrained minimization problems.
We propose a penalty-based guardrail algorithm (PGA) to efficiently solve them.
arXiv Detail & Related papers (2024-05-03T10:37:34Z) - Constrained Reinforcement Learning with Smoothed Log Barrier Function [27.216122901635018]
We propose a new constrained RL method called CSAC-LB (Constrained Soft Actor-Critic with Log Barrier Function).
It achieves competitive performance without any pre-training by applying a linear smoothed log barrier function to an additional safety critic.
We show that with CSAC-LB, we achieve state-of-the-art performance on several constrained control tasks with different levels of difficulty.
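As a rough illustration of the smoothing idea (the exact form used by CSAC-LB may differ), below is a common linearly extended log barrier: it behaves like a log barrier on the feasible side and continues linearly for violated constraints, so its value and gradient stay finite everywhere.

```python
import math

def linear_smoothed_log_barrier(g, t=10.0):
    """Log barrier -log(-g)/t for g <= -1/t^2, extended linearly beyond that point.

    g <= 0 denotes a satisfied constraint. Unlike the exact log barrier, this
    function is finite and differentiable for violated constraints, so it can
    be applied to the output of a safety critic without blowing up.
    """
    if g <= -1.0 / t**2:
        return -math.log(-g) / t
    return t * g - math.log(1.0 / t**2) / t + 1.0 / t

print(linear_smoothed_log_barrier(-0.5))  # well inside the feasible region
print(linear_smoothed_log_barrier(0.2))   # violated constraint, finite penalty
```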
arXiv Detail & Related papers (2024-03-21T16:02:52Z) - Robust Stochastically-Descending Unrolled Networks [85.6993263983062]
Deep unrolling is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network.
However, the convergence guarantees and generalizability of unrolled networks remain open theoretical problems.
We numerically assess unrolled architectures trained under the proposed constraints in two different applications.
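A minimal deep-unrolling sketch in PyTorch (an assumed toy setting, not the architectures studied in the paper): a few gradient-descent iterations for a least-squares problem are unrolled into layers with learnable step sizes.

```python
import torch
import torch.nn as nn

class UnrolledGD(nn.Module):
    """Unroll K gradient steps for min_x ||A x - y||^2 into K trainable layers."""
    def __init__(self, num_layers=5):
        super().__init__()
        # One learnable step size per unrolled iteration (i.e., per layer).
        self.steps = nn.Parameter(0.1 * torch.ones(num_layers))

    def forward(self, A, y):
        x = torch.zeros(A.shape[1])
        for alpha in self.steps:
            grad = A.T @ (A @ x - y)   # gradient of the data-fidelity term
            x = x - alpha * grad       # one truncated iteration = one layer
        return x

A, y = torch.randn(8, 4), torch.randn(8)
x_hat = UnrolledGD()(A, y)
```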
arXiv Detail & Related papers (2023-12-25T18:51:23Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems.
However, PINNs can be trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs and improve the stability of the training process.
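To make the distinction from explicit SGD concrete, here is a generic sketch of one implicit (proximal) gradient step, solved approximately by an inner loop; the toy loss, step sizes, and inner solver are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def implicit_sgd_step(theta, grad_loss, lr=0.1, inner_iters=50, inner_lr=0.02):
    """One implicit SGD step: theta_new = argmin_t L(t) + ||t - theta||^2 / (2 * lr).

    Unlike the explicit update theta - lr * grad_loss(theta), the gradient is
    evaluated at the *new* iterate, which damps oscillations on stiff losses
    such as PINN residuals with high-frequency features.
    """
    t = theta.copy()
    for _ in range(inner_iters):
        prox_grad = grad_loss(t) + (t - theta) / lr  # gradient of the proximal objective
        t = t - inner_lr * prox_grad
    return t

# Toy ill-conditioned quadratic standing in for a stiff PINN loss.
H = np.diag([1.0, 10.0])
theta = implicit_sgd_step(np.array([1.0, 1.0]), grad_loss=lambda t: H @ t)
```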
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast the inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses the apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z) - Adaptive Self-supervision Algorithms for Physics-informed Neural
Networks [59.822151945132525]
Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function.
We study the impact of the location of the collocation points on the trainability of these models.
We propose a novel adaptive collocation scheme which progressively allocates more collocation points to areas where the model is making higher errors.
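A simple residual-weighted resampling routine conveys the idea (a sketch with assumed names; the paper's actual scheme may allocate points differently): candidate points are drawn uniformly and kept with probability proportional to the PDE residual magnitude, so training concentrates where the model errs most.

```python
import numpy as np

def adaptive_collocation(residual_fn, domain=(0.0, 1.0), n_candidates=1000, n_points=100, rng=None):
    """Resample collocation points in proportion to the PDE residual magnitude."""
    if rng is None:
        rng = np.random.default_rng(0)
    candidates = rng.uniform(*domain, size=n_candidates)
    weights = np.abs(residual_fn(candidates))
    probs = weights / weights.sum()
    idx = rng.choice(n_candidates, size=n_points, replace=False, p=probs)
    return candidates[idx]

# Toy residual that is large near x = 0.8 (e.g., a sharp feature in the PDE solution).
residual = lambda x: np.exp(-200.0 * (x - 0.8) ** 2) + 1e-3
points = adaptive_collocation(residual)
```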
arXiv Detail & Related papers (2022-07-08T18:17:06Z) - Deep Unsupervised Learning for Generalized Assignment Problems: A
Case-Study of User-Association in Wireless Networks [11.42707683459227]
We propose a novel deep unsupervised learning (DUL) approach to solve the generalized assignment problems (GAP) in a time-efficient manner.
In particular, the approach trains a deep neural network (DNN) using a customized loss function.
Numerical results demonstrate that the proposed DUL approach provides near-optimal results with significantly lower time-complexity.
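A minimal PyTorch sketch of the unsupervised training idea (the network, utility model, and penalty weight are hypothetical stand-ins, not the paper's design): the DNN outputs a soft assignment, and the customized loss rewards utility while penalizing capacity violations, so no labeled optimal assignments are needed.

```python
import torch
import torch.nn as nn

# Toy user-association setting: 6 users, 3 cells, soft capacity of 2 users per cell.
n_users, n_cells, penalty = 6, 3, 10.0
net = nn.Sequential(nn.Linear(n_cells, 32), nn.ReLU(), nn.Linear(32, n_cells))

def customized_loss(features, capacity):
    assign = torch.softmax(net(features), dim=1)   # soft user-to-cell assignment
    utility = (assign * features).sum()            # assumed per-link utility = features
    load = assign.sum(dim=0)                       # expected load per cell
    violation = torch.relu(load - capacity).sum()  # soft capacity constraint
    return -utility + penalty * violation

features = torch.rand(n_users, n_cells)            # e.g. channel gains (hypothetical)
capacity = torch.full((n_cells,), 2.0)
loss = customized_loss(features, capacity)
loss.backward()
```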
arXiv Detail & Related papers (2021-03-26T16:07:02Z) - Exact Asymptotics for Linear Quadratic Adaptive Control [6.287145010885044]
We study the simplest non-bandit reinforcement learning problem: linear quadratic adaptive control (LQAC).
We derive expressions for the regret, estimation error, and prediction error of a stepwise-updating LQAC algorithm.
In simulations on both stable and unstable systems, we find that our theory also describes the algorithm's finite-sample behavior remarkably well.
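For context, the core step of a stepwise-updating adaptive LQ controller can be sketched as follows (an illustrative certainty-equivalence sketch with assumed data shapes, not the paper's exact algorithm): estimate (A, B) by least squares from logged transitions, then solve the Riccati equation for the feedback gain.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def certainty_equivalent_gain(X, U, Xnext, Q, R):
    """Least-squares estimate of (A, B) from transitions, then the LQ gain for u = -K x."""
    Z = np.hstack([X, U])                              # regressors [x_t, u_t], one row per step
    theta, *_ = np.linalg.lstsq(Z, Xnext, rcond=None)  # fits x_{t+1} ~ [A B] [x_t; u_t]
    n = X.shape[1]
    A_hat, B_hat = theta[:n].T, theta[n:].T
    P = solve_discrete_are(A_hat, B_hat, Q, R)
    return np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)

# Toy usage: a double integrator excited by random inputs plus small process noise.
rng = np.random.default_rng(0)
A, B = np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[0.0], [0.1]])
X, U = rng.normal(size=(200, 2)), rng.normal(size=(200, 1))
Xnext = X @ A.T + U @ B.T + 0.01 * rng.normal(size=(200, 2))
K = certainty_equivalent_gain(X, U, Xnext, np.eye(2), np.eye(1))
```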
arXiv Detail & Related papers (2020-11-02T22:43:30Z) - Chance-Constrained Control with Lexicographic Deep Reinforcement
Learning [77.34726150561087]
This paper proposes a lexicographic Deep Reinforcement Learning (DeepRL)-based approach to chance-constrained Markov Decision Processes.
A lexicographic version of the well-known DeepRL algorithm DQN is also proposed and validated via simulations.
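The lexicographic selection rule can be sketched as follows (a hypothetical sketch assuming two separately trained Q-networks, one for the chance constraint and one for the reward; the slack threshold is an assumption): restrict to actions that are near-maximal on the constraint Q-values, then maximize the reward Q-value among them.

```python
import numpy as np

def lexicographic_action(q_constraint, q_reward, slack=0.05):
    """Pick an action lexicographically: safety first, then reward.

    Keep actions whose constraint Q-value is within `slack` of the best
    (i.e., near-maximally safe), then maximize the reward Q-value among them.
    """
    best_safe = q_constraint.max()
    feasible = np.where(q_constraint >= best_safe - slack)[0]
    return feasible[np.argmax(q_reward[feasible])]

q_c = np.array([0.9, 0.88, 0.4, 0.91])  # e.g. estimated constraint-satisfaction values
q_r = np.array([1.0, 3.0, 9.0, 2.0])
print(lexicographic_action(q_c, q_r))   # prints 1: safest-enough action with highest reward
```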
arXiv Detail & Related papers (2020-10-19T13:09:14Z) - Combining Deep Learning and Optimization for Security-Constrained
Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of automatic primary response (APR) within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z) - Unsupervised Deep Learning for Optimizing Wireless Systems with
Instantaneous and Statistic Constraints [29.823814915538463]
We establish a unified framework for using unsupervised deep learning to solve wireless optimization problems with both instantaneous and statistic constraints.
We show that unsupervised learning outperforms supervised learning in terms of violation probability and approximation accuracy of the optimal policy.
arXiv Detail & Related papers (2020-05-30T13:37:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.