Self-Imitation Learning for Robot Tasks with Sparse and Delayed Rewards
- URL: http://arxiv.org/abs/2010.06962v3
- Date: Tue, 25 May 2021 13:45:43 GMT
- Title: Self-Imitation Learning for Robot Tasks with Sparse and Delayed Rewards
- Authors: Zhixin Chen, Mengxiang Lin
- Abstract summary: We propose a practical self-imitation learning method named Self-Imitation Learning with Constant Reward (SILCR).
Our method assigns the immediate reward at each timestep a constant value determined by the episode's final reward.
We demonstrate the effectiveness of our method on challenging continuous robotic control tasks in MuJoCo simulation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The application of reinforcement learning (RL) in robotic control remains
limited in environments with sparse and delayed rewards. In this paper, we
propose a practical self-imitation learning method named Self-Imitation
Learning with Constant Reward (SILCR). Instead of requiring hand-defined
immediate rewards from the environment, our method assigns the immediate
reward at each timestep a constant value determined by the episode's final
reward. In this way, even when dense rewards from the environment are
unavailable, every action taken by the agent is still properly guided. We
demonstrate the effectiveness of our method on several challenging continuous
robotic control tasks in MuJoCo simulation, and the results show that our
method significantly outperforms alternative methods in tasks with sparse
and delayed rewards. Even compared with alternatives that have dense rewards
available, our method achieves competitive performance. Ablation
experiments also show the stability and reproducibility of our method.
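To make the idea concrete, here is a minimal Python sketch of the constant-reward relabelling the abstract describes. The function name, the success threshold, and the two constant values are illustrative assumptions; the abstract specifies only that every timestep in an episode receives a constant reward derived from the final episodic reward.
```python
# Hypothetical sketch of SILCR-style constant reward assignment.
# The threshold and the two constants are illustrative assumptions,
# not values from the paper.
from typing import List, Tuple

def assign_constant_rewards(
    episode: List[Tuple],          # (state, action, next_state) transitions
    episodic_reward: float,        # sparse/delayed reward seen at episode end
    success_threshold: float = 0.0,
    r_success: float = 1.0,        # constant reward for successful episodes
    r_failure: float = -1.0,       # constant reward for failed episodes
) -> List[Tuple]:
    """Relabel every transition in a finished episode with a constant reward."""
    r = r_success if episodic_reward > success_threshold else r_failure
    return [(s, a, r, s_next) for (s, a, s_next) in episode]

# Usage: relabel a finished episode before pushing it to a replay buffer,
# so an off-policy learner gets a dense signal despite a sparse environment.
episode = [((0.0,), 0, (0.1,)), ((0.1,), 1, (0.2,))]
labelled = assign_constant_rewards(episode, episodic_reward=1.0)
```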
Related papers
- Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning [55.2080971216584]
We present AIRS: Automatic Intrinsic Reward Shaping, which intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL).
We develop an intrinsic reward toolkit to provide efficient and reliable implementations of diverse intrinsic reward approaches.
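As a rough illustration of intrinsic reward shaping in general (not AIRS's adaptive selection mechanism, which this summary does not detail), the sketch below mixes a simple count-based bonus into the extrinsic reward; the coefficient `beta` and the bonus form are assumptions.
```python
# Generic intrinsic-reward shaping pattern; purely illustrative,
# not the AIRS algorithm itself.
from collections import defaultdict

visit_counts = defaultdict(int)

def shaped_reward(state_key, extrinsic_reward, beta=0.1):
    """Add a simple count-based exploration bonus to the extrinsic reward."""
    visit_counts[state_key] += 1
    intrinsic = visit_counts[state_key] ** -0.5  # bonus decays with visits
    return extrinsic_reward + beta * intrinsic
```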
arXiv Detail & Related papers (2023-01-26T01:06:46Z)
- Actively Learning Costly Reward Functions for Reinforcement Learning [56.34005280792013]
We show that it is possible to train agents in complex real-world environments orders of magnitude faster.
By enabling the application of reinforcement learning methods to new domains, we show that we can find interesting and non-trivial solutions.
arXiv Detail & Related papers (2022-11-23T19:17:20Z)
- Learning Dense Reward with Temporal Variant Self-Supervision [5.131840233837565]
Complex real-world robotic applications lack explicit and informative descriptions that can directly be used as rewards.
Previous efforts have shown that it is possible to algorithmically extract dense rewards directly from multimodal observations.
This paper proposes a more efficient and robust way of sampling and learning.
arXiv Detail & Related papers (2022-05-20T20:30:57Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- Semi-supervised reward learning for offline reinforcement learning [71.6909757718301]
Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards.
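A hedged sketch of the semi-supervised pattern this summary suggests: fit a reward model on a small annotated set, then pseudo-label the unlabelled transitions before running offline RL. The least-squares regressor and all names here are stand-ins, not the paper's architecture.
```python
# Illustrative semi-supervised reward-labelling sketch; the linear
# model is a stand-in for whatever reward network the paper uses.
import numpy as np

def fit_reward_model(X_labelled, r_labelled):
    """Least-squares reward regressor fit on the annotated transitions."""
    w, *_ = np.linalg.lstsq(X_labelled, r_labelled, rcond=None)
    return lambda X: X @ w

# Label a large unlabelled set with the learned model.
rng = np.random.default_rng(0)
X_lab, r_lab = rng.normal(size=(50, 4)), rng.normal(size=50)
X_unlab = rng.normal(size=(10_000, 4))
reward_model = fit_reward_model(X_lab, r_lab)
pseudo_rewards = reward_model(X_unlab)  # rewards for offline RL training
```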
arXiv Detail & Related papers (2020-12-12T20:06:15Z)
- Balance Between Efficient and Effective Learning: Dense2Sparse Reward Shaping for Robot Manipulation with Environment Uncertainty [14.178202899299267]
We propose a simple but powerful reward shaping method, namely Dense2Sparse.
It combines the fast convergence of dense rewards with the noise isolation of sparse rewards to balance learning efficiency and effectiveness.
The experimental results show that Dense2Sparse obtains a higher expected reward than standalone dense or sparse rewards, and it also tolerates system uncertainty better.
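A minimal sketch of a dense-to-sparse switch, assuming both reward functions are available; the fixed `switch_step` is an illustrative hyperparameter, and the paper's actual switching rule may differ.
```python
# Illustrative Dense2Sparse-style schedule, not the paper's exact rule.
def dense2sparse_reward(step, state, dense_fn, sparse_fn, switch_step=100_000):
    """Dense reward early for fast convergence, sparse reward later to
    isolate learning from noise in the hand-designed dense signal."""
    return dense_fn(state) if step < switch_step else sparse_fn(state)

# Toy reward functions for a 1-D reaching task with goal at 1.0:
dense = lambda s: -abs(s - 1.0)                       # negative distance
sparse = lambda s: 1.0 if abs(s - 1.0) < 0.05 else 0.0
print(dense2sparse_reward(10, 0.4, dense, sparse))        # dense phase
print(dense2sparse_reward(200_000, 0.98, dense, sparse))  # sparse phase
```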
arXiv Detail & Related papers (2020-03-05T16:10:15Z)
- RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments [15.736899098702972]
We propose a novel type of intrinsic reward which encourages the agent to take actions that lead to significant changes in its learned state representation.
We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid.
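In the spirit of RIDE, here is a sketch of an impact-driven bonus, assuming an embedding network `phi` trained elsewhere and an episodic state-visitation count; the exact training of the embedding and the normalisation details are simplified.
```python
# Impact-driven intrinsic reward sketch: bonus is the change the action
# caused in a learned state representation, discounted by episodic visits.
import numpy as np

def impact_bonus(phi, s, s_next, episodic_count):
    """||phi(s') - phi(s)|| scaled down for frequently revisited states."""
    delta = np.linalg.norm(phi(s_next) - phi(s))
    return delta / np.sqrt(episodic_count)

phi = lambda s: np.asarray(s, dtype=float)  # stand-in embedding network
print(impact_bonus(phi, [0.0, 0.0], [0.3, 0.4], episodic_count=4))  # 0.25
```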
arXiv Detail & Related papers (2020-02-27T18:03:16Z)
- Scalable Multi-Task Imitation Learning with Autonomous Improvement [159.9406205002599]
We build an imitation learning system that can continuously improve through autonomous data collection.
We leverage the robot's own trials as demonstrations for tasks other than the one that the robot actually attempted.
In contrast to prior imitation learning approaches, our method can autonomously collect data with sparse supervision for continuous improvement.
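A sketch of the cross-task relabelling idea described above, assuming a hypothetical `task_achieved` predicate that decides which tasks a trial actually completed; none of these names come from the paper.
```python
# Cross-task relabelling sketch: a trial attempted for one task becomes
# a demonstration for whichever tasks it happened to solve.
def relabel_trial(trial, tasks, task_achieved):
    """Return (task, trial) pairs for every task this trial completed."""
    return [(task, trial) for task in tasks if task_achieved(trial, task)]
```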
arXiv Detail & Related papers (2020-02-25T18:56:42Z)
- oIRL: Robust Adversarial Inverse Reinforcement Learning with Temporally Extended Actions [37.66289166905027]
Explicit engineering of reward functions for given environments has been a major hindrance to reinforcement learning methods.
We propose an algorithm that learns hierarchical disentangled rewards with a policy over options.
arXiv Detail & Related papers (2020-02-20T22:21:41Z)
- Soft Hindsight Experience Replay [77.99182201815763]
Soft Hindsight Experience Replay (SHER) is a novel approach based on HER and Maximum Entropy Reinforcement Learning (MERL).
We evaluate SHER on OpenAI robotic manipulation tasks with sparse rewards.
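SHER builds on HER's hindsight relabelling; a minimal sketch of that relabelling step follows (the entropy-regularised MERL objective lives in the underlying learner and is omitted). `achieved_goal_fn` and `reward_fn` are assumed helpers, not SHER's API.
```python
# Hindsight relabelling sketch: failed goal-conditioned episodes are
# relabelled with a goal the agent actually reached.
def her_relabel(episode, achieved_goal_fn, reward_fn):
    """Relabel an episode's transitions with its final achieved goal."""
    new_goal = achieved_goal_fn(episode[-1])  # goal actually reached
    return [
        (s, a, reward_fn(s_next, new_goal), s_next, new_goal)
        for (s, a, s_next) in episode
    ]

# Toy usage: 1-D goal reaching, new goal = final next_state.
episode = [((0.0,), 0, (0.3,)), ((0.3,), 1, (0.7,))]
relabelled = her_relabel(
    episode,
    achieved_goal_fn=lambda tr: tr[2],
    reward_fn=lambda s_next, g: 1.0 if s_next == g else 0.0,
)
# -> the last transition now carries reward 1.0 under the new goal.
```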
arXiv Detail & Related papers (2020-02-06T03:57:04Z)
- Reinforcement Learning with Goal-Distance Gradient [1.370633147306388]
Reinforcement learning usually uses environmental feedback rewards to train agents.
Most current methods struggle to achieve good performance in sparse-reward or reward-free environments.
We propose a model-free method that does not rely on environmental rewards and addresses the sparse-reward problem in general environments.
arXiv Detail & Related papers (2020-01-01T02:37:34Z)