Learning Memory-Dependent Continuous Control from Demonstrations
- URL: http://arxiv.org/abs/2102.09208v1
- Date: Thu, 18 Feb 2021 08:13:42 GMT
- Title: Learning Memory-Dependent Continuous Control from Demonstrations
- Authors: Siqing Hou, Dongqi Han, Jun Tani
- Abstract summary: This paper builds on the idea of replaying demonstrations for memory-dependent continuous control.
Experiments involving several memory-crucial continuous control tasks reveal significantly reduced interactions with the environment.
The algorithm also shows better sample efficiency and learning capabilities than a baseline reinforcement learning algorithm for memory-based control from demonstrations.
- Score: 13.063093054280948
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficient exploration has presented a long-standing challenge in
reinforcement learning, especially when rewards are sparse. A developmental
system can overcome this difficulty by learning from both demonstrations and
self-exploration. However, existing methods are not applicable to most
real-world robotic controlling problems because they assume that environments
follow Markov decision processes (MDP); thus, they do not extend to partially
observable environments where historical observations are necessary for
decision making. This paper builds on the idea of replaying demonstrations for
memory-dependent continuous control, by proposing a novel algorithm, Recurrent
Actor-Critic with Demonstration and Experience Replay (READER). Experiments
involving several memory-crucial continuous control tasks reveal that our method
significantly reduces interactions with the environment using a reasonably
small number of demonstration samples. The algorithm also shows better sample
efficiency and learning capabilities than a baseline reinforcement learning
algorithm for memory-based control from demonstrations.
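The core idea the abstract describes, replaying demonstrations alongside self-collected experience for a recurrent (history-dependent) policy, can be sketched as a buffer that stores whole episodes and mixes the two sources in each training batch. This is an illustrative sketch only, not the authors' READER implementation; the class name `MixedReplay` and the `demo_ratio` parameter are assumptions introduced here.

```python
import random

class MixedReplay:
    """Illustrative mixed demonstration/experience replay buffer
    (hypothetical API, not the paper's code).

    Stores whole episodes rather than single transitions so that a
    recurrent policy can be trained on observation histories, as
    partially observable tasks require.
    """

    def __init__(self, demo_episodes, capacity=1000, demo_ratio=0.25, seed=0):
        self.demos = list(demo_episodes)   # fixed demonstration set
        self.exp = []                      # agent's self-collected episodes
        self.capacity = capacity
        self.demo_ratio = demo_ratio
        self.rng = random.Random(seed)

    def add(self, episode):
        # FIFO experience buffer with bounded capacity
        self.exp.append(episode)
        if len(self.exp) > self.capacity:
            self.exp.pop(0)

    def sample(self, batch_size):
        # A fixed fraction of each batch comes from demonstrations,
        # the rest from self-collected experience (demos only, early on).
        n_demo = max(1, int(batch_size * self.demo_ratio))
        batch = [self.rng.choice(self.demos) for _ in range(n_demo)]
        pool = self.exp if self.exp else self.demos
        batch += [self.rng.choice(pool) for _ in range(batch_size - n_demo)]
        return batch
```

Sampling episodes (not transitions) is what lets the recurrent actor-critic unroll over history; the fixed demonstration fraction is one simple way to keep the small demo set influential throughout training.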
Related papers
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z) - Sequential Action-Induced Invariant Representation for Reinforcement Learning [1.2046159151610263]
How to accurately learn task-relevant state representations from high-dimensional observations with visual distractions is a challenging problem in visual reinforcement learning.
We propose a Sequential Action-induced invariant Representation (SAR) method, in which the encoder is optimized by an auxiliary learner to only preserve the components that follow the control signals of sequential actions.
arXiv Detail & Related papers (2023-09-22T05:31:55Z) - Accelerating exploration and representation learning with offline pre-training [52.6912479800592]
We show that exploration and representation learning can be improved by separately learning two different models from a single offline dataset.
We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward can significantly improve the sample efficiency on the challenging NetHack benchmark.
arXiv Detail & Related papers (2023-03-31T18:03:30Z) - Continuous Episodic Control [7.021281655855703]
This paper introduces Continuous Episodic Control (CEC), a novel non-parametric episodic memory algorithm for sequential decision making in problems with a continuous action space.
Results on several sparse-reward continuous control environments show that our proposed method learns faster than state-of-the-art model-free RL and memory-augmented RL algorithms, while maintaining good long-run performance as well.
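A non-parametric episodic memory for continuous actions can be illustrated with a nearest-neighbor lookup: store (state, action, return) tuples and act by reusing the best-returning action among the nearest stored states. This is only a sketch of the general episodic-control idea under assumed names (`EpisodicControl`, `store`, `act`); CEC's actual memory updates and action selection differ.

```python
import math

class EpisodicControl:
    """Minimal non-parametric episodic control sketch for continuous
    actions (illustrative only; not the CEC algorithm itself).
    """

    def __init__(self):
        self.memory = []  # list of (state, action, ret) tuples

    def store(self, state, action, ret):
        self.memory.append((state, action, ret))

    def act(self, state, k=3):
        # Among the k nearest stored states, reuse the action
        # that achieved the highest return.
        nearest = sorted(self.memory,
                         key=lambda m: math.dist(state, m[0]))[:k]
        return max(nearest, key=lambda m: m[2])[1]
```

The appeal of this family of methods is that no parametric value function is trained: good behavior is recalled directly from memory, which tends to speed up early learning in sparse-reward settings.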
arXiv Detail & Related papers (2022-11-28T09:48:42Z) - Continuous Control with Action Quantization from Demonstrations [35.44893918778709]
In Reinforcement Learning (RL), discrete actions, as opposed to continuous actions, result in less complex exploration problems.
We propose a novel method: Action Quantization from Demonstrations (AQuaDem) to learn a discretization of continuous action spaces.
We evaluate the proposed method on three different setups: RL with demonstrations, RL with play data (demonstrations of a human playing in an environment but not solving any specific task), and Imitation Learning.
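The discretization idea can be illustrated with plain k-means over demonstrated actions: the cluster centers become the discrete action candidates an RL agent then chooses among. This is a toy, unconditioned stand-in (AQuaDem itself learns state-conditioned candidates with a neural network); the function name and deterministic quantile initialization are assumptions made here for reproducibility.

```python
def quantize_actions(demo_actions, k=3, iters=20):
    """Toy sketch of learning a discretization of a continuous (1-D)
    action space from demonstrated actions via plain k-means.
    Assumes k >= 2 and len(demo_actions) >= k.
    """
    acts = sorted(demo_actions)
    # Deterministic init: evenly spaced quantiles of the demo actions.
    centers = [acts[i * (len(acts) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        # Assign each demonstrated action to its nearest center.
        groups = [[] for _ in range(k)]
        for a in acts:
            nearest = min(range(k), key=lambda j: (a - centers[j]) ** 2)
            groups[nearest].append(a)
        # Recompute each center as the mean of its assigned actions.
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return sorted(centers)
```

Once the candidates are fixed, the continuous-control problem reduces to a discrete-action one, which is the less complex exploration setting the blurb refers to.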
arXiv Detail & Related papers (2021-10-19T17:59:04Z) - Visual Adversarial Imitation Learning using Variational Models [60.69745540036375]
Reward function specification remains a major impediment for learning behaviors through deep reinforcement learning.
Visual demonstrations of desired behaviors often present an easier and more natural way to teach agents.
We develop a variational model-based adversarial imitation learning algorithm.
arXiv Detail & Related papers (2021-07-16T00:15:18Z) - On Contrastive Representations of Stochastic Processes [53.21653429290478]
Learning representations of processes is an emerging problem in machine learning.
We show that our methods are effective for learning representations of periodic functions, 3D objects and dynamical processes.
arXiv Detail & Related papers (2021-06-18T11:00:24Z) - Seeing Differently, Acting Similarly: Imitation Learning with Heterogeneous Observations [126.78199124026398]
In many real-world imitation learning tasks, the demonstrator and the learner have to act in different but full observation spaces.
In this work, we model the above learning problem as Heterogeneous Observations Learning (HOIL)
We propose the Importance Weighting with REjection (IWRE) algorithm based on the techniques of importance-weighting, learning with rejection, and active querying to solve the key challenge of occupancy measure matching.
arXiv Detail & Related papers (2021-06-17T05:44:04Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - Episodic Self-Imitation Learning with Hindsight [7.743320290728377]
Episodic self-imitation learning is a novel self-imitation algorithm with a trajectory selection module and an adaptive loss function.
A selection module is introduced to filter uninformative samples from each episode during the update.
Episodic self-imitation learning has the potential to be applied to real-world problems that have continuous action spaces.
arXiv Detail & Related papers (2020-11-26T20:36:42Z) - Reinforcement Learning with Supervision from Noisy Demonstrations [38.00968774243178]
We propose a novel framework to adaptively learn the policy by jointly interacting with the environment and exploiting the expert demonstrations.
Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations.
arXiv Detail & Related papers (2020-06-14T06:03:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.