Related papers: Efficient Action-Constrained Reinforcement Learning via Acceptance-Rejection Method and Augmented MDPs

Efficient Action-Constrained Reinforcement Learning via Acceptance-Rejection Method and Augmented MDPs

URL: http://arxiv.org/abs/2503.12932v1
Date: Mon, 17 Mar 2025 08:41:43 GMT
Title: Efficient Action-Constrained Reinforcement Learning via Acceptance-Rejection Method and Augmented MDPs
Authors: Wei Hung, Shao-Hua Sun, Ping-Chun Hsieh,
Abstract summary: Action-constrained reinforcement learning (ACRL) is a generic framework for learning control policies with zero action constraint violation.<n>We propose a generic and computationally efficient framework that can adapt a standard unconstrained RL method to ACRL.<n>We show that the proposed framework enjoys faster training progress, better constraint satisfaction, and a lower action inference time simultaneously than the state-of-the-art ACRL methods.
Score: 13.443196224057658
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Action-constrained reinforcement learning (ACRL) is a generic framework for learning control policies with zero action constraint violation, which is required by various safety-critical and resource-constrained applications. The existing ACRL methods can typically achieve favorable constraint satisfaction but at the cost of either high computational burden incurred by the quadratic programs (QP) or increased architectural complexity due to the use of sophisticated generative models. In this paper, we propose a generic and computationally efficient framework that can adapt a standard unconstrained RL method to ACRL through two modifications: (i) To enforce the action constraints, we leverage the classic acceptance-rejection method, where we treat the unconstrained policy as the proposal distribution and derive a modified policy with feasible actions. (ii) To improve the acceptance rate of the proposal distribution, we construct an augmented two-objective Markov decision process (MDP), which include additional self-loop state transitions and a penalty signal for the rejected actions. This augmented MDP incentives the learned policy to stay close to the feasible action sets. Through extensive experiments in both robot control and resource allocation domains, we demonstrate that the proposed framework enjoys faster training progress, better constraint satisfaction, and a lower action inference time simultaneously than the state-of-the-art ACRL methods. We have made the source code publicly available to encourage further research in this direction.

Related papers

Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs [26.165537937650413]
Training agents to operate under strict constraints during deployment presents significant challenges.<n>We propose a curriculum learning strategy that gradually tightens constraints during training, enabling the agent to incrementally master the deployment requirements.
arXiv Detail & Related papers (2025-11-04T16:14:56Z)
Leveraging Constraint Violation Signals For Action-Constrained Reinforcement Learning [13.332006760984122]
Action-Constrained Reinforcement Learning (ACRL) employs a projection layer after the policy network to correct the action.<n>Recent methods were proposed to train generative models to learn a differentiable mapping between latent variables and feasible actions.
arXiv Detail & Related papers (2025-02-08T12:58:26Z)
Behavior-Regularized Diffusion Policy Optimization for Offline Reinforcement Learning [22.333460316347264]
We introduce BDPO, a principled behavior-regularized RL framework tailored for diffusion-based policies.<n>We develop an efficient two-time-scale actor-critic RL algorithm that produces the optimal policy while respecting the behavior constraint.
arXiv Detail & Related papers (2025-02-07T09:30:35Z)
Probabilistic Satisfaction of Temporal Logic Constraints in Reinforcement Learning via Adaptive Policy-Switching [0.0]
Constrained Reinforcement Learning (CRL) is a subset of machine learning that introduces constraints into the traditional reinforcement learning (RL) framework.<n>We propose a novel framework that relies on switching between pure learning (reward) and constraint satisfaction.
arXiv Detail & Related papers (2024-10-10T15:19:45Z)
Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-ite convergence guarantees under (weak) gradient domination assumptions. We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization [55.97310586039358]
Diffusion models have garnered widespread attention in Reinforcement Learning (RL) for their powerful expressiveness and multimodality. We propose a novel model-free diffusion-based online RL algorithm, Q-weighted Variational Policy Optimization (QVPO) Specifically, we introduce the Q-weighted variational loss, which can be proved to be a tight lower bound of the policy objective in online RL under certain conditions. We also develop an efficient behavior policy to enhance sample efficiency by reducing the variance of the diffusion policy during online interactions.
arXiv Detail & Related papers (2024-05-25T10:45:46Z)
A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints [0.0]
We use a generic primal-dual framework for value-based and actor-critic reinforcement learning methods.<n>The obtained dual formulations turn out to be especially useful for imposing additional constraints on the learned policy.<n>A practical algorithm is derived that supports various combinations of policy constraints that are automatically handled throughout training.
arXiv Detail & Related papers (2024-04-25T09:50:57Z)
Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset. We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy. We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
arXiv Detail & Related papers (2022-08-12T09:54:11Z)
Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality [141.89413461337324]
Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL) We propose a theoretical formulation for deployment-efficient RL (DE-RL) from an "optimization with constraints" perspective.
arXiv Detail & Related papers (2022-02-14T01:31:46Z)
Reinforcement Learning With Sparse-Executing Actions via Sparsity Regularization [15.945378631406024]
Reinforcement learning (RL) has demonstrated impressive performance in decision-making tasks like embodied control, autonomous driving and financial trading. In many decision-making tasks, the agents often encounter the problem of executing actions under limited budgets. This paper formalizes the problem as a Sparse Action Markov Decision Process (SA-MDP), in which specific actions in the action space can only be executed for a limited time. We propose a policy optimization algorithm, Action Sparsity REgularization (ASRE), which adaptively handles each action with a distinct preference.
arXiv Detail & Related papers (2021-05-18T16:50:42Z)
State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning with Rewards [88.30521204048551]
A common formulation of constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds. We show a simple example in which the desired optimal policy cannot be induced by any weighted linear combination of rewards. This work addresses this shortcoming by augmenting the state with Lagrange multipliers and reinterpreting primal-dual methods.
arXiv Detail & Related papers (2021-02-23T21:07:35Z)
Escaping from Zero Gradient: Revisiting Action-Constrained Reinforcement Learning via Frank-Wolfe Policy Optimization [5.072893872296332]
Action-constrained reinforcement learning (RL) is a widely-used approach in various real-world applications. We propose a learning algorithm that decouples the action constraints from the policy parameter update. We show that the proposed algorithm significantly outperforms the benchmark methods on a variety of control tasks.
arXiv Detail & Related papers (2021-02-22T14:28:03Z)
Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained policy optimization (CPPO) We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.