Continuous Control with Action Quantization from Demonstrations
- URL: http://arxiv.org/abs/2110.10149v1
- Date: Tue, 19 Oct 2021 17:59:04 GMT
- Title: Continuous Control with Action Quantization from Demonstrations
- Authors: Robert Dadashi, Léonard Hussenot, Damien Vincent, Sertan Girgin,
Anton Raichuk, Matthieu Geist, Olivier Pietquin
- Abstract summary: In Reinforcement Learning (RL), discrete actions, as opposed to continuous actions, result in less complex exploration problems.
We propose a novel method: Action Quantization from Demonstrations (AQuaDem) to learn a discretization of continuous action spaces.
We evaluate the proposed method on three different setups: RL with demonstrations, RL with play data (demonstrations of a human playing in an environment but not solving any specific task), and Imitation Learning.
- Score: 35.44893918778709
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Reinforcement Learning (RL), discrete actions, as opposed to continuous
actions, result in less complex exploration problems and the immediate
computation of the maximum of the action-value function which is central to
dynamic programming-based methods. In this paper, we propose a novel method:
Action Quantization from Demonstrations (AQuaDem) to learn a discretization of
continuous action spaces by leveraging the priors of demonstrations. This
dramatically reduces the exploration problem, since the actions faced by the
agent not only are in a finite number but also are plausible in light of the
demonstrator's behavior. By discretizing the action space we can apply any
discrete action deep RL algorithm to the continuous control problem. We
evaluate the proposed method on three different setups: RL with demonstrations,
RL with play data (demonstrations of a human playing in an environment but not
solving any specific task), and Imitation Learning. For all three setups, we
only consider human data, which is more challenging than synthetic data. We
found that AQuaDem consistently outperforms state-of-the-art continuous control
methods, both in terms of performance and sample efficiency. We provide
visualizations and videos in the paper's website:
https://google-research.github.io/aquadem.
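The abstract gives the gist of the approach: learn a small set of state-conditioned candidate actions from demonstrations, then let a discrete-action agent choose among those candidates. The sketch below is not the authors' code; it illustrates one plausible way the idea could look in PyTorch, with a network that outputs K candidate actions per state, trained with a soft-minimum distance loss on demonstration (state, action) pairs. The architecture, temperature, exact loss form, and the random tensors standing in for demonstration data are all illustrative assumptions; see the paper and website for the actual method.

```python
# Illustrative sketch of the AQuaDem idea (not the authors' implementation):
# learn K state-conditioned action candidates from demonstrations, then treat
# the candidate index as a discrete action for any discrete-action RL agent.
import torch
import torch.nn as nn

class ActionCandidates(nn.Module):
    """Maps a state to K candidate continuous actions."""
    def __init__(self, state_dim, action_dim, num_candidates=10, hidden=256):
        super().__init__()
        self.num_candidates = num_candidates
        self.action_dim = action_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_candidates * action_dim),
        )

    def forward(self, state):
        # Returns candidates of shape (batch, K, action_dim).
        out = self.net(state)
        return out.view(-1, self.num_candidates, self.action_dim)


def soft_min_loss(candidates, demo_actions, temperature=0.1):
    """Soft-minimum of squared distances between the demonstrated action and
    the K candidates: at least one candidate should explain each demonstration
    while the others remain free to cover other modes of the behavior."""
    # (batch, K): squared distance of each candidate to the demo action.
    sq_dist = ((candidates - demo_actions.unsqueeze(1)) ** 2).sum(-1)
    # Soft-min via logsumexp of negative scaled distances (assumed loss form).
    return -torch.logsumexp(-sq_dist / temperature, dim=1).mean()


# Training loop sketch over (state, action) demonstration pairs.
state_dim, action_dim = 11, 3              # placeholder dimensions
model = ActionCandidates(state_dim, action_dim, num_candidates=10)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

for step in range(1000):
    states = torch.randn(64, state_dim)     # stand-in for demonstration states
    actions = torch.randn(64, action_dim)   # stand-in for demonstration actions
    loss = soft_min_loss(model(states), actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# At RL time, a discrete-action agent (e.g. a DQN-style learner) picks an
# index k in {0, ..., K-1}; the executed continuous action is the k-th
# candidate produced by the trained network for the current state.
```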
Related papers
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z)
- A Dual Approach to Imitation Learning from Observations with Offline Datasets [19.856363985916644]
Demonstrations are an effective alternative to task specification for learning agents in settings where designing a reward function is difficult.
We derive DILO, an algorithm that can leverage arbitrary suboptimal data to learn imitating policies without requiring expert actions.
arXiv Detail & Related papers (2024-06-13T04:39:42Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z)
- Latent Exploration for Reinforcement Learning [87.42776741119653]
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment.
We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network.
arXiv Detail & Related papers (2023-05-31T17:40:43Z)
- MoDem: Accelerating Visual Model-Based Reinforcement Learning with Demonstrations [36.44386146801296]
Poor sample efficiency continues to be the primary challenge for deployment of deep Reinforcement Learning (RL) algorithms for real-world applications.
We find that leveraging just a handful of demonstrations can dramatically improve the sample-efficiency of model-based RL.
We empirically study three complex visuo-motor control domains and find that our method is 150%-250% more successful in completing sparse reward tasks.
arXiv Detail & Related papers (2022-12-12T04:28:50Z)
- A Memory-Related Multi-Task Method Based on Task-Agnostic Exploration [26.17597857264231]
In contrast to imitation learning, there is no expert data; only data collected through environmental exploration is available.
Since the action sequence to solve the new task may be the combination of trajectory segments of multiple training tasks, the test task and the solving strategy do not exist directly in the training data.
We propose a Memory-related Multi-task Method (M3) to address this problem.
arXiv Detail & Related papers (2022-09-09T03:02:49Z)
- Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration [9.017416068706579]
A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback.
We develop an algorithm that exploits the offline demonstration data generated by a sub-optimal behavior policy.
We demonstrate the superior performance of our algorithm over state-of-the-art approaches.
arXiv Detail & Related papers (2022-02-09T18:45:40Z)
- Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation using robot learning from demonstration techniques is that human demonstrations follow a distribution with multiple modes for one task query.
Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories.
We propose a motion generation model with extrapolation ability to overcome this problem.
arXiv Detail & Related papers (2021-02-24T09:07:52Z)
- Learning Memory-Dependent Continuous Control from Demonstrations [13.063093054280948]
This paper builds on the idea of replaying demonstrations for memory-dependent continuous control.
Experiments on several memory-crucial continuous control tasks reveal significantly reduced interactions with the environment.
The algorithm also shows better sample efficiency and learning capabilities than a baseline reinforcement learning algorithm for memory-based control from demonstrations.
arXiv Detail & Related papers (2021-02-18T08:13:42Z)
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all other solutions in the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.