FlowPG: Action-constrained Policy Gradient with Normalizing Flows
- URL: http://arxiv.org/abs/2402.05149v1
- Date: Wed, 7 Feb 2024 11:11:46 GMT
- Title: FlowPG: Action-constrained Policy Gradient with Normalizing Flows
- Authors: Janaka Chathuranga Brahmanage, Jiajing Ling, Akshat Kumar
- Abstract summary: Action-constrained reinforcement learning (ACRL) is a popular approach for solving safety-critical and resource-allocation related decision-making problems.
A major challenge in ACRL is to ensure that the agent takes a valid action satisfying the constraints at each step.
- Score: 14.98383953401637
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Action-constrained reinforcement learning (ACRL) is a popular approach for
solving safety-critical and resource-allocation related decision making
problems. A major challenge in ACRL is to ensure that the agent takes a valid
action satisfying the constraints at each RL step. The commonly used approach of
adding a projection layer on top of the policy network requires solving an
optimization program, which can result in longer training time, slow convergence,
and the zero-gradient problem. To address this, we first use a normalizing flow
model to
learn an invertible, differentiable mapping between the feasible action space
and the support of a simple distribution on a latent variable, such as
Gaussian. Second, learning the flow model requires sampling from the feasible
action space, which is also challenging. We develop multiple methods, based on
Hamiltonian Monte-Carlo and probabilistic sentential decision diagrams, for such
action sampling under both convex and non-convex constraints. Third, we integrate the
learned normalizing flow with the DDPG algorithm. By design, a well-trained
normalizing flow will transform policy output into a valid action without
requiring an optimization solver. Empirically, our approach results in
significantly fewer constraint violations (up to an order of magnitude for
several instances) and is multiple times faster on a variety of continuous
control tasks.
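As a concrete illustration of the action-selection pathway described above, the sketch below composes a DDPG-style actor with an invertible, differentiable map whose image is the feasible action set, so no projection or optimization solver is needed at inference time. This is a minimal sketch, not the paper's implementation: the map is a hand-crafted analytic bijection onto an L2 ball that stands in for a trained normalizing flow, and the actor, the norm bound `R_MAX`, and the toy state are hypothetical placeholders.

```python
import numpy as np

# Minimal sketch of FlowPG-style action selection: the actor outputs a latent
# point z, and an invertible, differentiable map g sends z into the feasible
# action set, so no projection/optimization step is needed at inference time.
# Here g is a hand-crafted bijection from R^d onto the L2 ball ||a|| < R_MAX,
# standing in for a normalizing flow trained on samples of feasible actions.

R_MAX = 1.0  # assumed action-norm constraint ||a|| < R_MAX (hypothetical)

def flow_forward(z: np.ndarray) -> np.ndarray:
    """Bijective map R^d -> open ball of radius R_MAX (stand-in for the flow)."""
    return R_MAX * z / (1.0 + np.linalg.norm(z))

def flow_inverse(a: np.ndarray) -> np.ndarray:
    """Inverse map, useful when evaluating flow likelihoods during training."""
    return a / (R_MAX - np.linalg.norm(a))

def actor(state: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Toy deterministic DDPG-style actor producing a latent z (placeholder)."""
    return np.tanh(weights @ state)

rng = np.random.default_rng(0)
state = rng.normal(size=4)
weights = rng.normal(size=(2, 4))

z = actor(state, weights)                # unconstrained latent from the policy
a = flow_forward(z)                      # valid action, ||a|| < R_MAX by construction
assert np.linalg.norm(a) < R_MAX
assert np.allclose(flow_inverse(a), z)   # invertibility check
```

In the paper's setting, the analytic map above would be replaced by a flow trained on feasible-action samples (obtained via Hamiltonian Monte-Carlo or probabilistic sentential decision diagrams), which is what allows general convex and non-convex constraints to be handled.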
Related papers
- A Simulation-Free Deep Learning Approach to Stochastic Optimal Control [12.699529713351287]
We propose a simulation-free algorithm for the solution of generic problems in stochastic optimal control (SOC).
Unlike existing methods, our approach does not require the solution of an adjoint problem.
arXiv Detail & Related papers (2024-10-07T16:16:53Z)
- Learning Constrained Optimization with Deep Augmented Lagrangian Methods [54.22290715244502]
A machine learning (ML) model is trained to emulate a constrained optimization solver.
This paper proposes an alternative approach, in which the ML model is trained to predict dual solution estimates directly.
It enables an end-to-end training scheme in which the dual objective is used as a loss function and solution estimates are driven toward primal feasibility, emulating a Dual Ascent method.
arXiv Detail & Related papers (2024-03-06T04:43:22Z)
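To make the Dual Ascent emulation described in the entry above concrete, here is a minimal classical dual-ascent sketch for an equality-constrained quadratic program. The referenced paper trains an ML model to predict the dual variables directly, with the dual objective as the loss; this sketch uses the iterative update instead, and `A`, `b`, and the step size are hypothetical.

```python
import numpy as np

# Minimal dual-ascent sketch for  min 0.5*||x||^2  s.t.  A x = b.
# The classical iterative dual update below plays the role that the learned
# dual predictor plays in the referenced paper.

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 5))
b = rng.normal(size=3)
lam = np.zeros(3)                              # dual estimate
alpha = 1.0 / np.linalg.eigvalsh(A @ A.T).max()  # safe dual step size

for _ in range(500):
    x = -A.T @ lam                     # primal minimizer of the Lagrangian
    lam = lam + alpha * (A @ x - b)    # dual ascent step toward feasibility

print("primal residual:", np.linalg.norm(A @ x - b))
```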
- Generative Modelling of Stochastic Actions with Arbitrary Constraints in Reinforcement Learning [25.342811509665097]
Many problems in Reinforcement Learning (RL) seek an optimal policy with large discrete multidimensional yet unordered action spaces.
A challenge in this setting is that the underlying action space is categorical (discrete and unordered) and large.
In this work, we address these challenges by applying a (state) conditional normalizing flow to compactly represent the policy.
arXiv Detail & Related papers (2023-11-26T15:57:20Z)
- Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
- Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm [4.128216503196621]
We propose an On-policy Model-based Safe Deep RL algorithm in which we learn the transition dynamics of the environment in an online manner.
We show that our algorithm is more sample efficient and results in lower cumulative hazard violations as compared to constrained model-free approaches.
arXiv Detail & Related papers (2022-10-14T06:53:02Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Deep Learning Approximation of Diffeomorphisms via Linear-Control Systems [91.3755431537592]
We consider a control system of the form $\dot{x} = \sum_{i=1}^{l} F_i(x)\, u_i$, with linear dependence in the controls.
We use the corresponding flow to approximate the action of a diffeomorphism on a compact ensemble of points.
arXiv Detail & Related papers (2021-10-24T08:57:46Z)
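A minimal sketch of the idea in the entry above: integrate the linear-control system's flow to transport a compact ensemble of points. The vector fields, piecewise-constant controls, and ensemble below are hypothetical placeholders; in the paper the controls are chosen so that the resulting flow approximates a target diffeomorphism.

```python
import numpy as np

# Euler-integrate the flow of  x_dot = u1 * F1(x) + u2 * F2(x)  on an ensemble.

def F1(x):                                   # constant field (placeholder)
    return np.stack([np.ones_like(x[:, 0]), np.zeros_like(x[:, 1])], axis=1)

def F2(x):                                   # rotation field (placeholder)
    return np.stack([-x[:, 1], x[:, 0]], axis=1)

def flow(points, controls, dt=0.01):
    """Transport the ensemble under piecewise-constant controls (u1, u2)."""
    x = points.copy()
    for u1, u2 in controls:
        x = x + dt * (u1 * F1(x) + u2 * F2(x))
    return x

rng = np.random.default_rng(0)
ensemble = rng.normal(size=(100, 2))         # compact ensemble of points
controls = [(0.5, 1.0)] * 200                # fixed controls (would be learned)
mapped = flow(ensemble, controls)
print(mapped[:3])
```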
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement-learning-based zeroth-order algorithm (ZO-RL) that learns the sampling policy for generating perturbations in ZO optimization, instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient estimates by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
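For context on the entry above, the sketch below shows a standard two-point zeroth-order (ZO) gradient estimator with random Gaussian perturbation directions; ZO-RL would instead draw those directions from a learned sampling policy to reduce variance. The objective, dimensions, and step sizes are hypothetical.

```python
import numpy as np

# Two-point zeroth-order gradient estimator with random directions.

def f(x):                                   # black-box objective (placeholder)
    return np.sum((x - 1.0) ** 2)

def zo_gradient(f, x, num_dirs=20, mu=1e-4, rng=None):
    """Average of two-point finite-difference estimates along random directions."""
    if rng is None:
        rng = np.random.default_rng()
    grad = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.normal(size=x.size)         # in ZO-RL this would come from a policy
        grad += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return grad / num_dirs

rng = np.random.default_rng(0)
x = np.zeros(5)
for _ in range(200):
    x -= 0.05 * zo_gradient(f, x, rng=rng)  # plain ZO gradient descent
print("distance to optimum:", np.linalg.norm(x - 1.0))
```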
- Escaping from Zero Gradient: Revisiting Action-Constrained Reinforcement Learning via Frank-Wolfe Policy Optimization [5.072893872296332]
Action-constrained reinforcement learning (RL) is a widely-used approach in various real-world applications.
We propose a learning algorithm that decouples the action constraints from the policy parameter update.
We show that the proposed algorithm significantly outperforms the benchmark methods on a variety of control tasks.
arXiv Detail & Related papers (2021-02-22T14:28:03Z)
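To illustrate the style of constraint handling in the entry above, here is a generic Frank-Wolfe iteration over an L2-ball action set: each step solves a linear minimization over the feasible set (closed form for a ball) instead of projecting. The objective below is a hypothetical stand-in for a critic's action-value landscape, not the paper's algorithm.

```python
import numpy as np

# Frank-Wolfe over the ball ||a|| <= R: linear minimization oracle + convex step.

R = 1.0
target = np.array([0.8, -0.3, 0.5])           # hypothetical "best" action

def grad_f(a):                                # gradient of 0.5*||a - target||^2
    return a - target

a = np.zeros(3)                               # feasible starting action
for k in range(100):
    g = grad_f(a)
    s = -R * g / (np.linalg.norm(g) + 1e-12)  # LMO over the ball (closed form)
    gamma = 2.0 / (k + 2)                     # standard Frank-Wolfe step size
    a = a + gamma * (s - a)                   # convex combination stays feasible

print("action:", a, "norm:", np.linalg.norm(a))
```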
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)