HyAR: Addressing Discrete-Continuous Action Reinforcement Learning via
Hybrid Action Representation
- URL: http://arxiv.org/abs/2109.05490v1
- Date: Sun, 12 Sep 2021 11:26:27 GMT
- Authors: Boyan Li, Hongyao Tang, Yan Zheng, Jianye Hao, Pengyi Li, Zhen Wang,
Zhaopeng Meng, Li Wang
- Abstract summary: Most previous Reinforcement Learning (RL) work demonstrates success only in controlling with either a discrete or a continuous action space.
We propose Hybrid Action Representation (HyAR) to learn a compact and decodable latent representation space for the original hybrid action space.
We evaluate HyAR in a variety of environments with discrete-continuous action space.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Discrete-continuous hybrid action spaces are a natural setting in many
practical problems, such as robot control and game AI. However, most previous
Reinforcement Learning (RL) work demonstrates success only in controlling
with either a discrete or a continuous action space, and seldom takes the
hybrid action space into account. One naive way to address hybrid-action RL is to
convert the hybrid action space into a unified homogeneous action space by
discretization or continualization, so that conventional RL algorithms can be
applied. However, this ignores the underlying structure of the hybrid action space,
induces scalability issues and additional approximation
difficulties, and thus leads to degraded results. In this paper, we propose
Hybrid Action Representation (HyAR) to learn a compact and decodable latent
representation space for the original hybrid action space. HyAR constructs the
latent space and embeds the dependence between the discrete action and its
continuous parameter via an embedding table and a conditional Variational
Auto-Encoder (VAE). To further improve effectiveness, the action representation is
trained to be semantically smooth through unsupervised environmental dynamics
prediction. Finally, the agent learns its policy with conventional DRL
algorithms in the learned representation space and interacts with the
environment by decoding the hybrid action embeddings to the original action
space. We evaluate HyAR in a variety of environments with discrete-continuous
action space. The results demonstrate the superiority of HyAR when compared
with previous baselines, especially for high-dimensional action spaces.
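The decoding step described in the abstract — the policy acts in the latent space, and the latent action is mapped back to a hybrid (discrete, continuous) action — can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the dimensions are hypothetical, and a fixed random linear map stands in for the trained conditional VAE decoder.

```python
import numpy as np

# Hypothetical sizes (not from the paper): K discrete actions, continuous
# parameters of size P, embedding dim D, continuous latent dim Z.
K, P, D, Z = 4, 3, 8, 6
rng = np.random.default_rng(0)

# Embedding table for discrete actions (learned in HyAR; random here).
E = rng.normal(size=(K, D))

# Stand-in for the conditional VAE decoder mapping (z_x, e_k) -> x_k.
W = rng.normal(size=(Z + D, P))

def decode(latent):
    """Decode a latent hybrid action into (discrete k, continuous x_k).

    latent[:D] is the policy's query into the embedding table;
    latent[D:] is the latent continuous parameter z_x.
    """
    e, z_x = latent[:D], latent[D:]
    # Discrete part: nearest-neighbour lookup in the embedding table.
    k = int(np.argmin(np.linalg.norm(E - e, axis=1)))
    # Continuous part: decoder conditioned on the selected embedding.
    x_k = np.tanh(np.concatenate([z_x, E[k]]) @ W)
    return k, x_k

# A policy would output a (D + Z)-dim latent action; sample one here.
latent_action = rng.normal(size=D + Z)
k, x = decode(latent_action)
```

The nearest-neighbour lookup is what makes the latent space "decodable": any point the policy emits maps to a valid discrete action, so a conventional continuous-control algorithm (e.g. TD3) can be run unchanged in the latent space.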
Related papers
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
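The core quantization idea — learn a small set of representative actions from the offline dataset and replace each continuous action with its nearest representative, so discrete-action methods apply — can be sketched with plain k-means. This is a simplified stand-in for the paper's learned adaptive scheme, with hypothetical sizes:

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans_quantize(actions, n_bins=8, iters=20):
    """Quantize continuous actions via k-means (a simplified stand-in
    for the paper's learned adaptive quantization)."""
    # Initialize centroids from randomly chosen dataset actions.
    centroids = actions[rng.choice(len(actions), n_bins, replace=False)]
    for _ in range(iters):
        # Assign each action to its nearest centroid.
        d = np.linalg.norm(actions[:, None] - centroids[None], axis=-1)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its assigned actions.
        for k in range(n_bins):
            if (labels == k).any():
                centroids[k] = actions[labels == k].mean(axis=0)
    return centroids, labels

# Toy offline dataset of 2-D continuous actions.
dataset_actions = rng.normal(size=(500, 2))
centroids, labels = kmeans_quantize(dataset_actions)
# A discrete-action offline RL method would now train over the n_bins
# labels; at execution time a bin decodes back to its centroid action.
```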
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- AI planning in the imagination: High-level planning on learned abstract search spaces
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z)
- Adaptive Discretization using Voronoi Trees for Continuous POMDPs
We propose a new sampling-based online POMDP solver, called Adaptive Discretization using Voronoi Trees (ADVT).
It uses Monte Carlo Tree Search in combination with an adaptive discretization of the action space as well as optimistic optimization to efficiently sample high-dimensional continuous action spaces.
ADVT scales substantially better to high-dimensional continuous action spaces, compared to state-of-the-art methods.
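The Voronoi-partitioning idea behind ADVT — refine a continuous action cell by splitting it around sampled representative actions — can be sketched in isolation. This is a hypothetical, simplified illustration of a single split (no tree search or optimistic optimization):

```python
import numpy as np

rng = np.random.default_rng(3)

def split_cell(reps, actions):
    """Split candidate actions into two Voronoi cells induced by two
    representative actions (a simplified stand-in for ADVT's
    hierarchical Voronoi partitioning)."""
    a, b = reps  # two representatives sampled inside the cell
    d_a = np.linalg.norm(actions - a, axis=1)
    d_b = np.linalg.norm(actions - b, axis=1)
    return actions[d_a <= d_b], actions[d_a > d_b]

# A 2-D continuous action cell [0, 1]^2 with sampled candidate actions.
candidates = rng.uniform(size=(200, 2))
reps = rng.uniform(size=(2, 2))
left, right = split_cell(reps, candidates)
# A solver would recurse on whichever cell looks most promising,
# adaptively refining resolution where the value landscape demands it.
```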
arXiv Detail & Related papers (2023-02-21T04:47:34Z)
- Generative Slate Recommendation with Reinforcement Learning
Reinforcement learning algorithms can be used to optimize user engagement in recommender systems.
However, RL approaches are intractable in the slate recommendation scenario.
In that setting, an action corresponds to a slate that may contain any combination of items.
In this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder.
We are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates.
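The encode/decode round trip this relies on — map a slate of item ids into a continuous latent space the policy can act in, then decode back to valid items — can be sketched as below. This is a hypothetical simplification: fixed random item embeddings and per-position nearest-neighbour lookup stand in for the trained variational auto-encoder.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes: catalogue of N items, slates of length S, latent dim Z.
N, S, Z = 100, 5, 4
item_emb = rng.normal(size=(N, Z))  # stands in for learned item embeddings

def encode(slate):
    """Encode a slate (array of item ids) as per-position latent vectors."""
    return item_emb[slate]                        # shape (S, Z)

def decode(latent_slate):
    """Decode latent vectors back to item ids via nearest neighbours
    (a stand-in for the trained VAE decoder)."""
    d = np.linalg.norm(latent_slate[:, None] - item_emb[None], axis=-1)
    return d.argmin(axis=1)                       # shape (S,)

slate = rng.choice(N, size=S, replace=False)
recovered = decode(encode(slate))
# Decoding always yields valid item ids, so a continuous-action RL
# policy can act in the latent space without combinatorial enumeration.
```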
arXiv Detail & Related papers (2023-01-20T15:28:09Z)
- Adaptive Discretization using Voronoi Trees for Continuous-Action POMDPs
We propose a new sampling-based online POMDP solver, called Adaptive Discretization using Voronoi Trees (ADVT).
ADVT uses Monte Carlo Tree Search in combination with an adaptive discretization of the action space as well as optimistic optimization.
Experiments on simulations of four types of benchmark problems indicate that ADVT outperforms state-of-the-art methods and scales substantially better to high-dimensional continuous action spaces.
arXiv Detail & Related papers (2022-09-13T05:04:49Z)
- Deep Multi-Agent Reinforcement Learning with Hybrid Action Spaces based on Maximum Entropy
We propose Deep Multi-Agent Hybrid Soft Actor-Critic (MAHSAC) to handle multi-agent problems with hybrid action spaces.
This algorithm follows the centralized training with decentralized execution (CTDE) paradigm and extends the Soft Actor-Critic (SAC) algorithm to handle hybrid action spaces.
Our experiments run on a simple multi-agent particle world with continuous observations and a discrete action space, along with some basic simulated physics.
arXiv Detail & Related papers (2022-06-10T13:52:59Z) - OSCAR: Data-Driven Operational Space Control for Adaptive and Robust
Robot Manipulation [50.59541802645156]
Operational Space Control (OSC) has been used as an effective task-space controller for manipulation.
We propose OSC for Adaptation and Robustness (OSCAR), a data-driven variant of OSC that compensates for modeling errors.
We evaluate our method on a variety of simulated manipulation problems, and find substantial improvements over an array of controller baselines.
arXiv Detail & Related papers (2021-10-02T01:21:38Z) - Generalising Discrete Action Spaces with Conditional Action Trees [0.0]
We introduce Conditional Action Trees with two main objectives.
We show several proof-of-concept experiments ranging from environments with discrete action spaces to those with large action spaces commonly found in RTS-style games.
arXiv Detail & Related papers (2021-04-15T08:10:18Z) - ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for
Mobile Manipulation [99.2543521972137]
ReLMoGen is a framework that combines a learned policy to predict subgoals and a motion generator to plan and execute the motion needed to reach these subgoals.
Our method is benchmarked on a diverse set of seven robotics tasks in photo-realistic simulation environments.
ReLMoGen shows outstanding transferability between different motion generators at test time, indicating a great potential to transfer to real robots.
arXiv Detail & Related papers (2020-08-18T08:05:15Z) - Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z) - Continuous-Discrete Reinforcement Learning for Hybrid Control in
Robotics [21.823173895315605]
We propose to treat hybrid problems in their 'native' form by solving them with hybrid reinforcement learning.
In our experiments, we first demonstrate that the proposed approach efficiently solves such hybrid reinforcement learning problems.
We then show, both in simulation and on robotic hardware, the benefits of removing possibly imperfect expert-designed heuristics.
arXiv Detail & Related papers (2020-01-02T14:19:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.