HyAR: Addressing Discrete-Continuous Action Reinforcement Learning via
Hybrid Action Representation
- URL: http://arxiv.org/abs/2109.05490v1
- Date: Sun, 12 Sep 2021 11:26:27 GMT
- Authors: Boyan Li, Hongyao Tang, Yan Zheng, Jianye Hao, Pengyi Li, Zhen Wang,
Zhaopeng Meng, Li Wang
- Abstract summary: Most previous Reinforcement Learning (RL) work demonstrates success only in controlling with either a discrete or a continuous action space.
We propose Hybrid Action Representation (HyAR) to learn a compact and decodable latent representation space for the original hybrid action space.
We evaluate HyAR in a variety of environments with discrete-continuous action space.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Discrete-continuous hybrid action spaces are a natural setting in many
practical problems, such as robot control and game AI. However, most previous
Reinforcement Learning (RL) work demonstrates success only in controlling
with either a discrete or a continuous action space, and seldom takes the
hybrid action space into account. One naive way to address hybrid-action RL is to
convert the hybrid action space into a unified homogeneous action space by
discretization or continualization, so that conventional RL algorithms can be
applied. However, this ignores the underlying structure of the hybrid action space,
induces scalability issues and additional approximation
difficulties, and thus leads to degraded results. In this paper, we propose
Hybrid Action Representation (HyAR) to learn a compact and decodable latent
representation space for the original hybrid action space. HyAR constructs the
latent space and embeds the dependence between the discrete action and its
continuous parameter via an embedding table and a conditional Variational
Auto-Encoder (VAE). To further improve effectiveness, the action representation is
trained to be semantically smooth through unsupervised environmental dynamics
prediction. Finally, the agent learns its policy with conventional DRL
algorithms in the learned representation space and interacts with the
environment by decoding the hybrid action embeddings to the original action
space. We evaluate HyAR in a variety of environments with discrete-continuous
action space. The results demonstrate the superiority of HyAR when compared
with previous baselines, especially for high-dimensional action spaces.
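The decoding step described in the abstract — the policy acts in the latent space, and the latent action is mapped back to a hybrid (discrete, continuous) action — can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the dimensions are hypothetical, and a fixed random linear map stands in for the trained conditional VAE decoder.

```python
import numpy as np

# Hypothetical sizes (not from the paper): K discrete actions, continuous
# parameters of size P, embedding dim D, continuous latent dim Z.
K, P, D, Z = 4, 3, 8, 6
rng = np.random.default_rng(0)

# Embedding table for discrete actions (learned in HyAR; random here).
E = rng.normal(size=(K, D))

# Stand-in for the conditional VAE decoder mapping (z_x, e_k) -> x_k.
W = rng.normal(size=(Z + D, P))

def decode(latent):
    """Decode a latent hybrid action into (discrete k, continuous x_k).

    latent[:D] is the policy's query into the embedding table;
    latent[D:] is the latent continuous parameter z_x.
    """
    e, z_x = latent[:D], latent[D:]
    # Discrete part: nearest-neighbour lookup in the embedding table.
    k = int(np.argmin(np.linalg.norm(E - e, axis=1)))
    # Continuous part: decoder conditioned on the selected embedding.
    x_k = np.tanh(np.concatenate([z_x, E[k]]) @ W)
    return k, x_k

# A policy would output a (D + Z)-dim latent action; sample one here.
latent_action = rng.normal(size=D + Z)
k, x = decode(latent_action)
```

The nearest-neighbour lookup is what makes the latent space "decodable": any point the policy emits maps to a valid discrete action, so a conventional continuous-control algorithm (e.g. TD3) can be run unchanged in the latent space.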
Related papers
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
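The core quantization idea — learn a small set of representative actions from the offline dataset and replace each continuous action with its nearest representative, so discrete-action methods apply — can be sketched with plain k-means. This is a simplified stand-in for the paper's learned adaptive scheme, with hypothetical sizes:

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans_quantize(actions, n_bins=8, iters=20):
    """Quantize continuous actions via k-means (a simplified stand-in
    for the paper's learned adaptive quantization)."""
    # Initialize centroids from randomly chosen dataset actions.
    centroids = actions[rng.choice(len(actions), n_bins, replace=False)]
    for _ in range(iters):
        # Assign each action to its nearest centroid.
        d = np.linalg.norm(actions[:, None] - centroids[None], axis=-1)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its assigned actions.
        for k in range(n_bins):
            if (labels == k).any():
                centroids[k] = actions[labels == k].mean(axis=0)
    return centroids, labels

# Toy offline dataset of 2-D continuous actions.
dataset_actions = rng.normal(size=(500, 2))
centroids, labels = kmeans_quantize(dataset_actions)
# A discrete-action offline RL method would now train over the n_bins
# labels; at execution time a bin decodes back to its centroid action.
```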
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- AI planning in the imagination: High-level planning on learned abstract search spaces
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z)
- Adaptive Discretization using Voronoi Trees for Continuous POMDPs
We propose a new sampling-based online POMDP solver, called Adaptive Discretization using Voronoi Trees (ADVT).
It uses Monte Carlo Tree Search in combination with an adaptive discretization of the action space as well as optimistic optimization to efficiently sample high-dimensional continuous action spaces.
ADVT scales substantially better to high-dimensional continuous action spaces, compared to state-of-the-art methods.
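The Voronoi-partitioning idea behind ADVT — refine a continuous action cell by splitting it around sampled representative actions — can be sketched in isolation. This is a hypothetical, simplified illustration of a single split (no tree search or optimistic optimization):

```python
import numpy as np

rng = np.random.default_rng(3)

def split_cell(reps, actions):
    """Split candidate actions into two Voronoi cells induced by two
    representative actions (a simplified stand-in for ADVT's
    hierarchical Voronoi partitioning)."""
    a, b = reps  # two representatives sampled inside the cell
    d_a = np.linalg.norm(actions - a, axis=1)
    d_b = np.linalg.norm(actions - b, axis=1)
    return actions[d_a <= d_b], actions[d_a > d_b]

# A 2-D continuous action cell [0, 1]^2 with sampled candidate actions.
candidates = rng.uniform(size=(200, 2))
reps = rng.uniform(size=(2, 2))
left, right = split_cell(reps, candidates)
# A solver would recurse on whichever cell looks most promising,
# adaptively refining resolution where the value landscape demands it.
```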
arXiv Detail & Related papers (2023-02-21T04:47:34Z)
- Generative Slate Recommendation with Reinforcement Learning
Reinforcement learning algorithms can be used to optimize user engagement in recommender systems.
However, RL approaches are intractable in the slate recommendation scenario.
In that setting, an action corresponds to a slate that may contain any combination of items.
In this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder.
We are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates.
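The encode/decode round trip this relies on — map a slate of item ids into a continuous latent space the policy can act in, then decode back to valid items — can be sketched as below. This is a hypothetical simplification: fixed random item embeddings and per-position nearest-neighbour lookup stand in for the trained variational auto-encoder.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes: catalogue of N items, slates of length S, latent dim Z.
N, S, Z = 100, 5, 4
item_emb = rng.normal(size=(N, Z))  # stands in for learned item embeddings

def encode(slate):
    """Encode a slate (array of item ids) as per-position latent vectors."""
    return item_emb[slate]                        # shape (S, Z)

def decode(latent_slate):
    """Decode latent vectors back to item ids via nearest neighbours
    (a stand-in for the trained VAE decoder)."""
    d = np.linalg.norm(latent_slate[:, None] - item_emb[None], axis=-1)
    return d.argmin(axis=1)                       # shape (S,)

slate = rng.choice(N, size=S, replace=False)
recovered = decode(encode(slate))
# Decoding always yields valid item ids, so a continuous-action RL
# policy can act in the latent space without combinatorial enumeration.
```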
arXiv Detail & Related papers (2023-01-20T15:28:09Z)
- Adaptive Discretization using Voronoi Trees for Continuous-Action POMDPs
We propose a new sampling-based online POMDP solver, called Adaptive Discretization using Voronoi Trees (ADVT).
ADVT uses Monte Carlo Tree Search in combination with an adaptive discretization of the action space as well as optimistic optimization.
Experiments on simulations of four types of benchmark problems indicate that ADVT outperforms state-of-the-art methods and scales substantially better to high-dimensional continuous action spaces.
arXiv Detail & Related papers (2022-09-13T05:04:49Z)
- Deep Multi-Agent Reinforcement Learning with Hybrid Action Spaces based on Maximum Entropy
We propose Deep Multi-Agent Hybrid Soft Actor-Critic (MAHSAC) to handle multi-agent problems with hybrid action spaces.
This algorithm follows the centralized training with decentralized execution (CTDE) paradigm and extends the Soft Actor-Critic (SAC) algorithm to handle hybrid action spaces.
Our experiments run on a simple multi-agent particle world with continuous observations and a discrete action space, along with some basic simulated physics.
arXiv Detail & Related papers (2022-06-10T13:52:59Z) - OSCAR: Data-Driven Operational Space Control for Adaptive and Robust
Robot Manipulation [50.59541802645156]
Operational Space Control (OSC) has been used as an effective task-space controller for manipulation.
We propose OSC for Adaptation and Robustness (OSCAR), a data-driven variant of OSC that compensates for modeling errors.
We evaluate our method on a variety of simulated manipulation problems, and find substantial improvements over an array of controller baselines.
arXiv Detail & Related papers (2021-10-02T01:21:38Z) - Generalising Discrete Action Spaces with Conditional Action Trees [0.0]
We introduce Conditional Action Trees with two main objectives.
We show several proof-of-concept experiments ranging from environments with discrete action spaces to those with large action spaces commonly found in RTS-style games.
arXiv Detail & Related papers (2021-04-15T08:10:18Z) - ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for
Mobile Manipulation [99.2543521972137]
ReLMoGen is a framework that combines a learned policy to predict subgoals and a motion generator to plan and execute the motion needed to reach these subgoals.
Our method is benchmarked on a diverse set of seven robotics tasks in photo-realistic simulation environments.
ReLMoGen shows outstanding transferability between different motion generators at test time, indicating a great potential to transfer to real robots.
arXiv Detail & Related papers (2020-08-18T08:05:15Z) - Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z) - Continuous-Discrete Reinforcement Learning for Hybrid Control in
Robotics [21.823173895315605]
We propose to treat hybrid problems in their 'native' form by solving them with hybrid reinforcement learning.
In our experiments, we first demonstrate that the proposed approach efficiently solves such hybrid reinforcement learning problems.
We then show, both in simulation and on robotic hardware, the benefits of removing possibly imperfect expert-designed heuristics.
arXiv Detail & Related papers (2020-01-02T14:19:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.