Continuous Control with Action Quantization from Demonstrations
- URL: http://arxiv.org/abs/2110.10149v1
- Date: Tue, 19 Oct 2021 17:59:04 GMT
- Title: Continuous Control with Action Quantization from Demonstrations
- Authors: Robert Dadashi, Léonard Hussenot, Damien Vincent, Sertan Girgin,
Anton Raichuk, Matthieu Geist, Olivier Pietquin
- Abstract summary: In Reinforcement Learning (RL), discrete actions, as opposed to continuous actions, result in less complex exploration problems.
We propose a novel method: Action Quantization from Demonstrations (AQuaDem) to learn a discretization of continuous action spaces.
We evaluate the proposed method on three different setups: RL with demonstrations, RL with play data (demonstrations of a human playing in an environment but not solving any specific task), and Imitation Learning.
- Score: 35.44893918778709
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Reinforcement Learning (RL), discrete actions, as opposed to continuous
actions, result in less complex exploration problems and the immediate
computation of the maximum of the action-value function which is central to
dynamic programming-based methods. In this paper, we propose a novel method:
Action Quantization from Demonstrations (AQuaDem) to learn a discretization of
continuous action spaces by leveraging the priors of demonstrations. This
dramatically reduces the exploration problem, since the actions faced by the
agent not only are in a finite number but also are plausible in light of the
demonstrator's behavior. By discretizing the action space we can apply any
discrete action deep RL algorithm to the continuous control problem. We
evaluate the proposed method on three different setups: RL with demonstrations,
RL with play data (demonstrations of a human playing in an environment but not
solving any specific task), and Imitation Learning. For all three setups, we
only consider human data, which is more challenging than synthetic data. We
found that AQuaDem consistently outperforms state-of-the-art continuous control
methods, both in terms of performance and sample efficiency. We provide
visualizations and videos in the paper's website:
https://google-research.github.io/aquadem.
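The abstract gives the gist of the approach: learn a small set of state-conditioned candidate actions from demonstrations, then let a discrete-action agent choose among those candidates. The sketch below is not the authors' code; it illustrates one plausible way the idea could look in PyTorch, with a network that outputs K candidate actions per state, trained with a soft-minimum distance loss on demonstration (state, action) pairs. The architecture, temperature, exact loss form, and the random tensors standing in for demonstration data are all illustrative assumptions; see the paper and website for the actual method.

```python
# Illustrative sketch of the AQuaDem idea (not the authors' implementation):
# learn K state-conditioned action candidates from demonstrations, then treat
# the candidate index as a discrete action for any discrete-action RL agent.
import torch
import torch.nn as nn

class ActionCandidates(nn.Module):
    """Maps a state to K candidate continuous actions."""
    def __init__(self, state_dim, action_dim, num_candidates=10, hidden=256):
        super().__init__()
        self.num_candidates = num_candidates
        self.action_dim = action_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_candidates * action_dim),
        )

    def forward(self, state):
        # Returns candidates of shape (batch, K, action_dim).
        out = self.net(state)
        return out.view(-1, self.num_candidates, self.action_dim)


def soft_min_loss(candidates, demo_actions, temperature=0.1):
    """Soft-minimum of squared distances between the demonstrated action and
    the K candidates: at least one candidate should explain each demonstration
    while the others remain free to cover other modes of the behavior."""
    # (batch, K): squared distance of each candidate to the demo action.
    sq_dist = ((candidates - demo_actions.unsqueeze(1)) ** 2).sum(-1)
    # Soft-min via logsumexp of negative scaled distances (assumed loss form).
    return -torch.logsumexp(-sq_dist / temperature, dim=1).mean()


# Training loop sketch over (state, action) demonstration pairs.
state_dim, action_dim = 11, 3              # placeholder dimensions
model = ActionCandidates(state_dim, action_dim, num_candidates=10)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

for step in range(1000):
    states = torch.randn(64, state_dim)     # stand-in for demonstration states
    actions = torch.randn(64, action_dim)   # stand-in for demonstration actions
    loss = soft_min_loss(model(states), actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# At RL time, a discrete-action agent (e.g. a DQN-style learner) picks an
# index k in {0, ..., K-1}; the executed continuous action is the k-th
# candidate produced by the trained network for the current state.
```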
Related papers
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z)
- A Dual Approach to Imitation Learning from Observations with Offline Datasets [19.856363985916644]
Demonstrations are an effective alternative to task specification for learning agents in settings where designing a reward function is difficult.
We derive DILO, an algorithm that can leverage arbitrary suboptimal data to learn imitating policies without requiring expert actions.
arXiv Detail & Related papers (2024-06-13T04:39:42Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z)
- Latent Exploration for Reinforcement Learning [87.42776741119653]
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment.
We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network.
arXiv Detail & Related papers (2023-05-31T17:40:43Z)
- MoDem: Accelerating Visual Model-Based Reinforcement Learning with Demonstrations [36.44386146801296]
Poor sample efficiency continues to be the primary challenge for deployment of deep Reinforcement Learning (RL) algorithms for real-world applications.
We find that leveraging just a handful of demonstrations can dramatically improve the sample-efficiency of model-based RL.
We empirically study three complex visuo-motor control domains and find that our method is 150%-250% more successful in completing sparse reward tasks.
arXiv Detail & Related papers (2022-12-12T04:28:50Z)
- A Memory-Related Multi-Task Method Based on Task-Agnostic Exploration [26.17597857264231]
In contrast to imitation learning, there is no expert data; only data collected through environmental exploration is available.
Since the action sequence to solve the new task may be the combination of trajectory segments of multiple training tasks, the test task and the solving strategy do not exist directly in the training data.
We propose a Memory-related Multi-task Method (M3) to address this problem.
arXiv Detail & Related papers (2022-09-09T03:02:49Z)
- Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration [9.017416068706579]
A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback.
We develop an algorithm that exploits the offline demonstration data generated by a sub-optimal behavior policy.
We demonstrate the superior performance of our algorithm over state-of-the-art approaches.
arXiv Detail & Related papers (2022-02-09T18:45:40Z)
- Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation using robot learning from demonstration techniques is that human demonstrations follow a distribution with multiple modes for one task query.
Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories.
We propose a motion generation model with extrapolation ability to overcome this problem.
arXiv Detail & Related papers (2021-02-24T09:07:52Z)
- Learning Memory-Dependent Continuous Control from Demonstrations [13.063093054280948]
This paper builds on the idea of replaying demonstrations for memory-dependent continuous control.
Experiments on several memory-crucial continuous control tasks reveal significantly reduced interactions with the environment.
The algorithm also shows better sample efficiency and learning capabilities than a baseline reinforcement learning algorithm for memory-based control from demonstrations.
arXiv Detail & Related papers (2021-02-18T08:13:42Z)
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all other solutions in the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.