Action Mapping for Reinforcement Learning in Continuous Environments with Constraints
- URL: http://arxiv.org/abs/2412.04327v1
- Date: Thu, 05 Dec 2024 16:42:45 GMT
- Title: Action Mapping for Reinforcement Learning in Continuous Environments with Constraints
- Authors: Mirco Theile, Lukas Dirnberger, Raphael Trumpp, Marco Caccamo, Alberto L. Sangiovanni-Vincentelli
- Abstract summary: We propose a novel DRL training strategy utilizing action mapping to streamline the learning process. We demonstrate through experiments that action mapping significantly improves training performance in constrained environments.
- Score: 4.521631014571241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning (DRL) has had success across various domains, but applying it to environments with constraints remains challenging due to poor sample efficiency and slow convergence. Recent literature explored incorporating model knowledge to mitigate these problems, particularly through the use of models that assess the feasibility of proposed actions. However, integrating feasibility models efficiently into DRL pipelines in environments with continuous action spaces is non-trivial. We propose a novel DRL training strategy utilizing action mapping that leverages feasibility models to streamline the learning process. By decoupling the learning of feasible actions from policy optimization, action mapping allows DRL agents to focus on selecting the optimal action from a reduced feasible action set. We demonstrate through experiments that action mapping significantly improves training performance in constrained environments with continuous action spaces, especially with imperfect feasibility models.
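As a rough illustration of the idea, the sketch below maps a latent policy output onto a state-dependent feasible interval in a toy 1D steering task. The feasibility model here is hand-coded, and the names `feasible_interval` and `map_action` are illustrative placeholders rather than the paper's implementation.

```python
import numpy as np

# Toy feasibility model: given the state (here, an obstacle bearing in [-1, 1]),
# return the interval of steering angles that avoids a collision. In the paper's
# setting this model may itself be learned and imperfect.
def feasible_interval(obstacle_bearing: float) -> tuple[float, float]:
    if obstacle_bearing >= 0.0:                        # obstacle to the right -> steer left
        return (-1.0, min(obstacle_bearing - 0.2, 1.0))
    return (max(obstacle_bearing + 0.2, -1.0), 1.0)    # obstacle to the left -> steer right

# Action mapping: the agent outputs a latent action z in [0, 1]; the mapping
# stretches z over the feasible interval, so every policy output is feasible and
# the agent only has to learn *which* feasible action is optimal.
def map_action(z: float, obstacle_bearing: float) -> float:
    lo, hi = feasible_interval(obstacle_bearing)
    return lo + np.clip(z, 0.0, 1.0) * (hi - lo)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for _ in range(3):
        bearing = rng.uniform(-1.0, 1.0)   # stand-in for the environment state
        z = rng.uniform(0.0, 1.0)          # raw policy output
        lo, hi = feasible_interval(bearing)
        print(f"bearing={bearing:+.2f}  z={z:.2f}  "
              f"mapped action={map_action(z, bearing):+.2f}  feasible=[{lo:+.2f}, {hi:+.2f}]")
```

The sketch only shows the decoupling of "which actions are feasible" from "which feasible action is optimal"; the paper addresses continuous, higher-dimensional action spaces and imperfect, learned feasibility models.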
Related papers
- Fast Adaptation with Behavioral Foundation Models [82.34700481726951]
Unsupervised zero-shot reinforcement learning has emerged as a powerful paradigm for pretraining behavioral foundation models.
Despite promising results, zero-shot policies are often suboptimal due to errors induced by the unsupervised training process.
We propose fast adaptation strategies that search in the low-dimensional task-embedding space of the pre-trained BFM to rapidly improve the performance of its zero-shot policies.
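As a loose illustration of such adaptation, the sketch below runs a cross-entropy-method search over a low-dimensional task embedding while keeping the pretrained policy frozen; `evaluate_return` stands in for rolling out the frozen behavioral foundation model conditioned on the embedding, and both it and the embedding size are assumptions rather than details from the cited paper.

```python
import numpy as np

EMBED_DIM = 8  # assumed dimensionality of the task-embedding space

def evaluate_return(z: np.ndarray) -> float:
    """Placeholder for rolling out the frozen BFM conditioned on embedding z
    and measuring episodic return; here a synthetic quadratic objective."""
    target = np.linspace(-0.5, 0.5, EMBED_DIM)
    return -float(np.sum((z - target) ** 2))

def cem_adapt(iterations: int = 20, pop: int = 64, elite_frac: float = 0.1) -> np.ndarray:
    """Cross-entropy-method search in the embedding space: improve on the
    zero-shot policy without touching the pretrained network weights."""
    mean, std = np.zeros(EMBED_DIM), np.ones(EMBED_DIM)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iterations):
        candidates = mean + std * np.random.randn(pop, EMBED_DIM)
        returns = np.array([evaluate_return(c) for c in candidates])
        elites = candidates[np.argsort(returns)[-n_elite:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean

if __name__ == "__main__":
    best_z = cem_adapt()
    print("adapted embedding return:", evaluate_return(best_z))
```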
arXiv Detail & Related papers (2025-04-10T16:14:17Z)
- Look Before Leap: Look-Ahead Planning with Uncertainty in Reinforcement Learning [4.902161835372679]
We propose a novel framework for uncertainty-aware policy optimization with model-based exploratory planning.
In the policy optimization phase, we leverage an uncertainty-driven exploratory policy to actively collect diverse training samples.
Our approach offers flexibility and applicability to tasks with varying state/action spaces and reward structures.
arXiv Detail & Related papers (2025-03-26T01:07:35Z)
- Distilling Reinforcement Learning Algorithms for In-Context Model-Based Planning [39.53836535326121]
We propose Distillation for In-Context Planning (DICP), an in-context model-based RL framework where Transformers simultaneously learn environment dynamics and improve policy in-context.
Our results show that DICP achieves state-of-the-art performance while requiring significantly fewer environment interactions than baselines.
arXiv Detail & Related papers (2025-02-26T10:16:57Z)
- Leveraging Constraint Violation Signals For Action-Constrained Reinforcement Learning [13.332006760984122]
Action-Constrained Reinforcement Learning (ACRL) employs a projection layer after the policy network to correct the action.
Recent methods were proposed to train generative models to learn a differentiable mapping between latent variables and feasible actions.
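A minimal sketch of the projection idea follows, assuming a toy feasible set given by a box intersected with an L2-norm ball; real ACRL projection layers typically solve a small optimization problem (or use a learned latent-to-feasible mapping), so this closed-form correction is only illustrative.

```python
import numpy as np

def project_action(raw_action: np.ndarray,
                   low: np.ndarray, high: np.ndarray,
                   max_norm: float) -> np.ndarray:
    """Correct a possibly infeasible policy output against a toy feasible set:
    a box [low, high] intersected with an L2-norm ball. The two-step
    clip-then-rescale below is a simple feasibility correction, not the exact
    Euclidean projection onto the intersection in general."""
    a = np.clip(raw_action, low, high)   # enforce the box constraint
    norm = np.linalg.norm(a)
    if norm > max_norm:                  # enforce the norm constraint
        a = a * (max_norm / norm)
    return a

if __name__ == "__main__":
    low, high = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
    raw = np.array([1.8, -0.9])          # infeasible raw policy output
    print(project_action(raw, low, high, max_norm=1.0))
```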
arXiv Detail & Related papers (2025-02-08T12:58:26Z)
- Adversarial Moment-Matching Distillation of Large Language Models [3.9160947065896803]
Knowledge distillation (KD) has been shown to be highly effective in guiding a student model with a larger teacher model.
We propose an adversarial training algorithm to jointly estimate the moment-matching distance and optimize the student policy to minimize it.
Results from both task-agnostic instruction-following experiments and task-specific experiments demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-06-05T05:27:29Z)
- REACT: Revealing Evolutionary Action Consequence Trajectories for Interpretable Reinforcement Learning [7.889696505137217]
We propose Revealing Evolutionary Action Consequence Trajectories (REACT) to enhance the interpretability of Reinforcement Learning (RL).
In contrast to the prevalent practice of evaluating RL models based on the optimal behavior learned during training, we posit that considering a range of edge-case trajectories provides a more comprehensive understanding of their inherent behavior.
Our results highlight its effectiveness in revealing nuanced aspects of RL models' behavior beyond optimal performance, thereby contributing to improved interpretability.
arXiv Detail & Related papers (2024-04-04T10:56:30Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
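To make the notion of a learned discretization concrete, here is a toy sketch that fits a k-means codebook to dataset actions and quantizes new actions to the nearest code; the cited paper learns an adaptive, data-driven quantization rather than plain k-means, so treat this purely as a stand-in for the general idea.

```python
import numpy as np

def learn_codebook(actions: np.ndarray, n_codes: int = 8,
                   iters: int = 50, seed: int = 0) -> np.ndarray:
    """Fit a small k-means codebook to continuous dataset actions so that
    downstream offline RL can operate over a discrete action set."""
    rng = np.random.default_rng(seed)
    codes = actions[rng.choice(len(actions), n_codes, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((actions[:, None, :] - codes[None]) ** 2).sum(-1), axis=1)
        for k in range(n_codes):
            members = actions[assign == k]
            if len(members) > 0:
                codes[k] = members.mean(axis=0)
    return codes

def quantize(action: np.ndarray, codes: np.ndarray) -> int:
    """Map a continuous action to the index of its nearest code."""
    return int(np.argmin(((codes - action) ** 2).sum(-1)))

if __name__ == "__main__":
    dataset_actions = np.random.default_rng(1).normal(size=(1000, 2))
    codebook = learn_codebook(dataset_actions)
    print("code index:", quantize(np.array([0.3, -0.7]), codebook))
```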
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Hybrid Reinforcement Learning for Optimizing Pump Sustainability in Real-World Water Distribution Networks [55.591662978280894]
This article addresses the pump-scheduling optimization problem to enhance real-time control of real-world water distribution networks (WDNs).
Our primary objectives are to adhere to physical operational constraints while reducing energy consumption and operational costs.
Traditional optimization techniques, such as evolution-based and genetic algorithms, often fall short due to their lack of convergence guarantees.
arXiv Detail & Related papers (2023-10-13T21:26:16Z)
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
- Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset.
We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy.
We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
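For intuition on how a diffusion model can serve as a policy, the following is a bare-bones DDPM-style reverse sampler that denoises an action conditioned on the state; the noise predictor is an untrained stand-in, and none of the hyperparameters or training details (e.g. the Q-learning term) come from the Diffusion-QL paper.

```python
import numpy as np

T = 20                               # number of diffusion steps (illustrative)
betas = np.linspace(1e-4, 0.2, T)    # noise schedule (illustrative)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(5, 2))   # untrained stand-in weights

def eps_model(a_t: np.ndarray, state: np.ndarray, t: int) -> np.ndarray:
    """Stand-in noise predictor eps_theta(a_t, s, t); in Diffusion-QL this is a
    neural network trained with a denoising loss plus a value-based term."""
    feats = np.concatenate([a_t, state, [t / T]])
    return feats @ W

def sample_action(state: np.ndarray) -> np.ndarray:
    """Reverse diffusion: start from Gaussian noise and denoise step by step,
    conditioned on the state, to obtain an action."""
    a = rng.normal(size=2)
    for t in reversed(range(T)):
        eps = eps_model(a, state, t)
        a = (a - (betas[t] / np.sqrt(1.0 - alpha_bars[t])) * eps) / np.sqrt(alphas[t])
        if t > 0:
            a += np.sqrt(betas[t]) * rng.normal(size=2)
    return np.clip(a, -1.0, 1.0)     # keep the final action in a bounded range

if __name__ == "__main__":
    print("sampled action:", sample_action(np.array([0.5, -0.2])))
```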
arXiv Detail & Related papers (2022-08-12T09:54:11Z)
- Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z)
- Model-Based Offline Planning with Trajectory Pruning [15.841609263723575]
Offline reinforcement learning (RL) enables learning policies from pre-collected datasets without environment interaction.
We propose a new lightweight model-based offline planning framework, namely MOPP, which tackles the dilemma between the restrictions of offline learning and high-performance planning.
Experimental results show that MOPP provides competitive performance compared with existing model-based offline planning and RL approaches.
arXiv Detail & Related papers (2021-05-16T05:00:54Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
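The underlying variance-reduction idea can be illustrated with a standard critic-baseline policy gradient for a categorical policy; this generic estimator is not the specific construction of the cited paper, which additionally exploits correlated actions.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def policy_gradient_with_critic(logits: np.ndarray, q_values: np.ndarray,
                                sampled_action: int) -> np.ndarray:
    """REINFORCE-style gradient w.r.t. the logits of a categorical policy,
    with critic-estimated action values used as the learning signal and their
    policy-weighted mean used as a variance-reducing baseline."""
    probs = softmax(logits)
    baseline = float(probs @ q_values)       # V(s) under the critic
    advantage = q_values[sampled_action] - baseline
    grad_log_pi = -probs.copy()
    grad_log_pi[sampled_action] += 1.0       # d log pi(a|s) / d logits
    return advantage * grad_log_pi

if __name__ == "__main__":
    logits = np.array([0.2, -0.1, 0.4])
    q_values = np.array([1.0, 0.3, 0.8])     # critic estimates for each action
    print(policy_gradient_with_critic(logits, q_values, sampled_action=0))
```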
arXiv Detail & Related papers (2020-02-10T04:23:09Z)