Self-Imitation Learning for Robot Tasks with Sparse and Delayed Rewards
- URL: http://arxiv.org/abs/2010.06962v3
- Date: Tue, 25 May 2021 13:45:43 GMT
- Title: Self-Imitation Learning for Robot Tasks with Sparse and Delayed Rewards
- Authors: Zhixin Chen, Mengxiang Lin
- Abstract summary: We propose a practical self-imitation learning method named Self-Imitation Learning with Constant Reward (SILCR).
Our method assigns the immediate reward at each timestep a constant value determined by the episode's final reward.
We demonstrate the effectiveness of our method on challenging continuous robotic control tasks in MuJoCo simulation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The application of reinforcement learning (RL) in robotic control remains
limited in environments with sparse and delayed rewards. In this paper, we
propose a practical self-imitation learning method named Self-Imitation
Learning with Constant Reward (SILCR). Instead of requiring hand-defined
immediate rewards from the environment, our method assigns the immediate
reward at each timestep a constant value determined by the episode's final
reward. In this way, even when dense rewards from the environment are
unavailable, every action taken by the agent is still properly guided. We
demonstrate the effectiveness of our method on several challenging continuous
robotic control tasks in MuJoCo simulation, and the results show that our
method significantly outperforms alternative methods in tasks with sparse
and delayed rewards. Even compared with alternatives that have dense rewards
available, our method achieves competitive performance. Ablation
experiments also show the stability and reproducibility of our method.
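To make the idea concrete, here is a minimal Python sketch of the constant-reward relabelling the abstract describes. The function name, the success threshold, and the two constant values are illustrative assumptions; the abstract specifies only that every timestep in an episode receives a constant reward derived from the final episodic reward.
```python
# Hypothetical sketch of SILCR-style constant reward assignment.
# The threshold and the two constants are illustrative assumptions,
# not values from the paper.
from typing import List, Tuple

def assign_constant_rewards(
    episode: List[Tuple],          # (state, action, next_state) transitions
    episodic_reward: float,        # sparse/delayed reward seen at episode end
    success_threshold: float = 0.0,
    r_success: float = 1.0,        # constant reward for successful episodes
    r_failure: float = -1.0,       # constant reward for failed episodes
) -> List[Tuple]:
    """Relabel every transition in a finished episode with a constant reward."""
    r = r_success if episodic_reward > success_threshold else r_failure
    return [(s, a, r, s_next) for (s, a, s_next) in episode]

# Usage: relabel a finished episode before pushing it to a replay buffer,
# so an off-policy learner gets a dense signal despite a sparse environment.
episode = [((0.0,), 0, (0.1,)), ((0.1,), 1, (0.2,))]
labelled = assign_constant_rewards(episode, episodic_reward=1.0)
```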
Related papers
- Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning [55.2080971216584]
We present AIRS: Automatic Intrinsic Reward Shaping, which intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL).
We develop an intrinsic reward toolkit to provide efficient and reliable implementations of diverse intrinsic reward approaches.
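As a rough illustration of intrinsic reward shaping in general (not AIRS's adaptive selection mechanism, which this summary does not detail), the sketch below mixes a simple count-based bonus into the extrinsic reward; the coefficient `beta` and the bonus form are assumptions.
```python
# Generic intrinsic-reward shaping pattern; purely illustrative,
# not the AIRS algorithm itself.
from collections import defaultdict

visit_counts = defaultdict(int)

def shaped_reward(state_key, extrinsic_reward, beta=0.1):
    """Add a simple count-based exploration bonus to the extrinsic reward."""
    visit_counts[state_key] += 1
    intrinsic = visit_counts[state_key] ** -0.5  # bonus decays with visits
    return extrinsic_reward + beta * intrinsic
```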
arXiv Detail & Related papers (2023-01-26T01:06:46Z)
- Actively Learning Costly Reward Functions for Reinforcement Learning [56.34005280792013]
We show that it is possible to train agents in complex real-world environments orders of magnitude faster.
By enabling the application of reinforcement learning methods to new domains, we show that we can find interesting and non-trivial solutions.
arXiv Detail & Related papers (2022-11-23T19:17:20Z)
- Learning Dense Reward with Temporal Variant Self-Supervision [5.131840233837565]
Complex real-world robotic applications lack explicit and informative descriptions that can directly be used as rewards.
Previous efforts have shown that it is possible to algorithmically extract dense rewards directly from multimodal observations.
This paper proposes a more efficient and robust way of sampling and learning.
arXiv Detail & Related papers (2022-05-20T20:30:57Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- Semi-supervised reward learning for offline reinforcement learning [71.6909757718301]
Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards.
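A hedged sketch of the semi-supervised pattern this summary suggests: fit a reward model on a small annotated set, then pseudo-label the unlabelled transitions before running offline RL. The least-squares regressor and all names here are stand-ins, not the paper's architecture.
```python
# Illustrative semi-supervised reward-labelling sketch; the linear
# model is a stand-in for whatever reward network the paper uses.
import numpy as np

def fit_reward_model(X_labelled, r_labelled):
    """Least-squares reward regressor fit on the annotated transitions."""
    w, *_ = np.linalg.lstsq(X_labelled, r_labelled, rcond=None)
    return lambda X: X @ w

# Label a large unlabelled set with the learned model.
rng = np.random.default_rng(0)
X_lab, r_lab = rng.normal(size=(50, 4)), rng.normal(size=50)
X_unlab = rng.normal(size=(10_000, 4))
reward_model = fit_reward_model(X_lab, r_lab)
pseudo_rewards = reward_model(X_unlab)  # rewards for offline RL training
```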
arXiv Detail & Related papers (2020-12-12T20:06:15Z)
- Balance Between Efficient and Effective Learning: Dense2Sparse Reward Shaping for Robot Manipulation with Environment Uncertainty [14.178202899299267]
We propose a simple but powerful reward shaping method, namely Dense2Sparse.
It combines the fast convergence of dense rewards with the noise isolation of sparse rewards to balance learning efficiency and effectiveness.
The experimental results show that Dense2Sparse obtains a higher expected reward than standalone dense or sparse rewards, and it also tolerates system uncertainty better.
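A minimal sketch of a dense-to-sparse switch, assuming both reward functions are available; the fixed `switch_step` is an illustrative hyperparameter, and the paper's actual switching rule may differ.
```python
# Illustrative Dense2Sparse-style schedule, not the paper's exact rule.
def dense2sparse_reward(step, state, dense_fn, sparse_fn, switch_step=100_000):
    """Dense reward early for fast convergence, sparse reward later to
    isolate learning from noise in the hand-designed dense signal."""
    return dense_fn(state) if step < switch_step else sparse_fn(state)

# Toy reward functions for a 1-D reaching task with goal at 1.0:
dense = lambda s: -abs(s - 1.0)                       # negative distance
sparse = lambda s: 1.0 if abs(s - 1.0) < 0.05 else 0.0
print(dense2sparse_reward(10, 0.4, dense, sparse))        # dense phase
print(dense2sparse_reward(200_000, 0.98, dense, sparse))  # sparse phase
```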
arXiv Detail & Related papers (2020-03-05T16:10:15Z)
- RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments [15.736899098702972]
We propose a novel type of intrinsic reward which encourages the agent to take actions that lead to significant changes in its learned state representation.
We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid.
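In the spirit of RIDE, here is a sketch of an impact-driven bonus, assuming an embedding network `phi` trained elsewhere and an episodic state-visitation count; the exact training of the embedding and the normalisation details are simplified.
```python
# Impact-driven intrinsic reward sketch: bonus is the change the action
# caused in a learned state representation, discounted by episodic visits.
import numpy as np

def impact_bonus(phi, s, s_next, episodic_count):
    """||phi(s') - phi(s)|| scaled down for frequently revisited states."""
    delta = np.linalg.norm(phi(s_next) - phi(s))
    return delta / np.sqrt(episodic_count)

phi = lambda s: np.asarray(s, dtype=float)  # stand-in embedding network
print(impact_bonus(phi, [0.0, 0.0], [0.3, 0.4], episodic_count=4))  # 0.25
```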
arXiv Detail & Related papers (2020-02-27T18:03:16Z)
- Scalable Multi-Task Imitation Learning with Autonomous Improvement [159.9406205002599]
We build an imitation learning system that can continuously improve through autonomous data collection.
We leverage the robot's own trials as demonstrations for tasks other than the one that the robot actually attempted.
In contrast to prior imitation learning approaches, our method can autonomously collect data with sparse supervision for continuous improvement.
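A sketch of the cross-task relabelling idea described above, assuming a hypothetical `task_achieved` predicate that decides which tasks a trial actually completed; none of these names come from the paper.
```python
# Cross-task relabelling sketch: a trial attempted for one task becomes
# a demonstration for whichever tasks it happened to solve.
def relabel_trial(trial, tasks, task_achieved):
    """Return (task, trial) pairs for every task this trial completed."""
    return [(task, trial) for task in tasks if task_achieved(trial, task)]
```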
arXiv Detail & Related papers (2020-02-25T18:56:42Z)
- oIRL: Robust Adversarial Inverse Reinforcement Learning with Temporally Extended Actions [37.66289166905027]
Explicit engineering of reward functions for given environments has been a major hindrance to reinforcement learning methods.
We propose an algorithm that learns hierarchical disentangled rewards with a policy over options.
arXiv Detail & Related papers (2020-02-20T22:21:41Z)
- Soft Hindsight Experience Replay [77.99182201815763]
Soft Hindsight Experience Replay (SHER) is a novel approach based on HER and Maximum Entropy Reinforcement Learning (MERL).
We evaluate SHER on OpenAI robotic manipulation tasks with sparse rewards.
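SHER builds on HER's hindsight relabelling; a minimal sketch of that relabelling step follows (the entropy-regularised MERL objective lives in the underlying learner and is omitted). `achieved_goal_fn` and `reward_fn` are assumed helpers, not SHER's API.
```python
# Hindsight relabelling sketch: failed goal-conditioned episodes are
# relabelled with a goal the agent actually reached.
def her_relabel(episode, achieved_goal_fn, reward_fn):
    """Relabel an episode's transitions with its final achieved goal."""
    new_goal = achieved_goal_fn(episode[-1])  # goal actually reached
    return [
        (s, a, reward_fn(s_next, new_goal), s_next, new_goal)
        for (s, a, s_next) in episode
    ]

# Toy usage: 1-D goal reaching, new goal = final next_state.
episode = [((0.0,), 0, (0.3,)), ((0.3,), 1, (0.7,))]
relabelled = her_relabel(
    episode,
    achieved_goal_fn=lambda tr: tr[2],
    reward_fn=lambda s_next, g: 1.0 if s_next == g else 0.0,
)
# -> the last transition now carries reward 1.0 under the new goal.
```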
arXiv Detail & Related papers (2020-02-06T03:57:04Z)
- Reinforcement Learning with Goal-Distance Gradient [1.370633147306388]
Reinforcement learning usually uses environmental feedback rewards to train agents.
Most current methods struggle to achieve good performance in sparse-reward or reward-free environments.
We propose a model-free method that does not rely on environmental rewards and addresses the sparse-reward problem in general environments.
arXiv Detail & Related papers (2020-01-01T02:37:34Z)