Continuous Control with Action Quantization from Demonstrations
- URL: http://arxiv.org/abs/2110.10149v1
- Date: Tue, 19 Oct 2021 17:59:04 GMT
- Title: Continuous Control with Action Quantization from Demonstrations
- Authors: Robert Dadashi, Léonard Hussenot, Damien Vincent, Sertan Girgin,
Anton Raichuk, Matthieu Geist, Olivier Pietquin
- Abstract summary: In Reinforcement Learning (RL), discrete actions, as opposed to continuous actions, result in less complex exploration problems.
We propose a novel method: Action Quantization from Demonstrations (AQuaDem) to learn a discretization of continuous action spaces.
We evaluate the proposed method on three different setups: RL with demonstrations, RL with play data --demonstrations of a human playing in an environment but not solving any specific task-- and Imitation Learning.
- Score: 35.44893918778709
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Reinforcement Learning (RL), discrete actions, as opposed to continuous
actions, result in less complex exploration problems and the immediate
computation of the maximum of the action-value function which is central to
dynamic programming-based methods. In this paper, we propose a novel method:
Action Quantization from Demonstrations (AQuaDem) to learn a discretization of
continuous action spaces by leveraging the priors of demonstrations. This
dramatically reduces the exploration problem, since the actions faced by the
agent not only are in a finite number but also are plausible in light of the
demonstrator's behavior. By discretizing the action space we can apply any
discrete action deep RL algorithm to the continuous control problem. We
evaluate the proposed method on three different setups: RL with demonstrations,
RL with play data --demonstrations of a human playing in an environment but not
solving any specific task-- and Imitation Learning. For all three setups, we
only consider human data, which is more challenging than synthetic data. We
found that AQuaDem consistently outperforms state-of-the-art continuous control
methods, both in terms of performance and sample efficiency. We provide
visualizations and videos on the paper's website:
https://google-research.github.io/aquadem.
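The abstract's point about the "immediate computation of the maximum of the action-value function" can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `greedy_continuous_action`, the candidate actions, and the Q-values below are all hypothetical, and the sketch simply assumes that some learned quantization has already produced K candidate continuous actions for the current state.

```python
import numpy as np


def greedy_continuous_action(candidate_actions, q_values):
    """Return the index and value of the candidate action with the highest Q-value.

    candidate_actions: shape (K, action_dim); hypothetical output of a learned
        action quantization for the current state.
    q_values: shape (K,); hypothetical Q-value estimate for each candidate.
    """
    candidate_actions = np.asarray(candidate_actions, dtype=float)
    q_values = np.asarray(q_values, dtype=float)
    if candidate_actions.shape[0] != q_values.shape[0]:
        raise ValueError("need exactly one Q-value per candidate action")
    # Over a finite set the maximum is exact: a single argmax, with no inner
    # optimization over the continuous action space.
    best = int(np.argmax(q_values))
    return best, candidate_actions[best]


# Made-up example: K = 3 candidates in a 2-dimensional action space.
candidates = np.array([[-1.0, 0.0], [0.0, 0.5], [1.0, -0.5]])
q = np.array([0.1, 0.7, 0.3])
index, action = greedy_continuous_action(candidates, q)
```

Because the agent only ever chooses among the K candidates, any discrete-action algorithm can act in the continuous environment by mapping the selected index back to its continuous action.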
Related papers
- Provably Efficient Action-Manipulation Attack Against Continuous Reinforcement Learning [49.48615590763914]
We propose a black-box attack algorithm named LCBT, which uses the Monte Carlo tree search method for efficient action searching and manipulation.
We conduct our proposed attack methods on three aggressive algorithms: DDPG, PPO, and TD3 in continuous settings, which show a promising attack performance.
arXiv Detail & Related papers (2024-11-20T08:20:29Z) - Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning [62.3886343725955]
We introduce a novel RL algorithm that learns a critic network that outputs Q-values over a sequence of actions.
By explicitly training the value functions to learn the consequence of executing a series of current and future actions, our algorithm allows for learning useful value functions from noisy trajectories.
arXiv Detail & Related papers (2024-11-19T01:23:52Z) - Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z) - A Dual Approach to Imitation Learning from Observations with Offline Datasets [19.856363985916644]
Demonstrations are an effective alternative to task specification for learning agents in settings where designing a reward function is difficult.
We derive DILO, an algorithm that can leverage arbitrary suboptimal data to learn imitating policies without requiring expert actions.
arXiv Detail & Related papers (2024-06-13T04:39:42Z) - Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z) - Latent Exploration for Reinforcement Learning [87.42776741119653]
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment.
We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network.
arXiv Detail & Related papers (2023-05-31T17:40:43Z) - A Memory-Related Multi-Task Method Based on Task-Agnostic Exploration [26.17597857264231]
In contrast to imitation learning, there is no expert data, only the data collected through environmental exploration.
Since the action sequence to solve the new task may be the combination of trajectory segments of multiple training tasks, the test task and the solving strategy do not exist directly in the training data.
We propose a Memory-related Multi-task Method (M3) to address this problem.
arXiv Detail & Related papers (2022-09-09T03:02:49Z) - Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration [9.017416068706579]
A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback.
We develop an algorithm that exploits the offline demonstration data generated by a sub-optimal behavior policy.
We demonstrate the superior performance of our algorithm over state-of-the-art approaches.
arXiv Detail & Related papers (2022-02-09T18:45:40Z) - Learning Memory-Dependent Continuous Control from Demonstrations [13.063093054280948]
This paper builds on the idea of replaying demonstrations for memory-dependent continuous control.
Experiments involving several memory-crucial continuous control tasks reveal significantly reduced interactions with the environment.
The algorithm also shows better sample efficiency and learning capabilities than a baseline reinforcement learning algorithm for memory-based control from demonstrations.
arXiv Detail & Related papers (2021-02-18T08:13:42Z) - Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions for the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.