Reinforcement Learning with Goal-Distance Gradient
- URL: http://arxiv.org/abs/2001.00127v2
- Date: Fri, 10 Jan 2020 12:26:33 GMT
- Title: Reinforcement Learning with Goal-Distance Gradient
- Authors: Kai Jiang, XiaoLong Qin
- Abstract summary: Reinforcement learning usually uses feedback rewards from the environment to train agents.
Most current methods struggle to achieve good performance in sparse-reward or reward-free environments.
We propose a model-free method that does not rely on environmental rewards and addresses the sparse-reward problem in general environments.
- Score: 1.370633147306388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning usually uses feedback rewards from the environment to
train agents. But rewards in real environments are sparse, and some environments provide
no rewards at all. Most current methods struggle to achieve good performance in
sparse-reward or reward-free environments. Although shaped rewards are effective for
solving sparse-reward tasks, they are limited to specific problems, and learning with
them is susceptible to local optima. We propose a model-free method that does not rely
on environmental rewards and solves the sparse-reward problem in general environments.
Our method uses the minimum number of transitions between states as a distance that
replaces the environmental reward, and introduces a goal-distance gradient to achieve
policy improvement. We also introduce a bridge-point planning method, based on the
characteristics of our approach, to improve exploration efficiency and thereby solve
more complex tasks. Experiments show that our method performs better than previous work
on sparse-reward and local-optimum problems in complex environments.
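The abstract suggests a concrete recipe: estimate the minimum number of transitions from a state to the goal, and use that distance in place of the missing environment reward when improving the policy. Below is a minimal sketch of that reading. The class and function names (GoalDistanceCritic, GoalPolicy, critic_loss, actor_loss), the DDPG-style actor update, and the goal tolerance are assumptions made for illustration, not the authors' implementation.

```python
# Sketch of learning from goal distances instead of environment rewards.
# All names and the actor-update style here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GoalDistanceCritic(nn.Module):
    """d(s, a, g): estimated minimum number of transitions needed to reach
    goal g from state s when the first action taken is a."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # distances are non-negative
        )

    def forward(self, s, a, g):
        return self.net(torch.cat([s, a, g], dim=-1)).squeeze(-1)


class GoalPolicy(nn.Module):
    """Deterministic goal-conditioned actor pi(s, g) -> a in [-1, 1]^action_dim."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, s, g):
        return self.net(torch.cat([s, g], dim=-1))


def critic_loss(dist, policy, batch, goal_tol=1e-2):
    """Shortest-path style backup: each observed transition costs one step,
    so d(s, a, g) ~ 1 + d(s', pi(s', g), g), and the distance is exactly 1
    when s' already reaches the goal."""
    s, a, s_next, g = batch["s"], batch["a"], batch["s_next"], batch["g"]
    with torch.no_grad():
        reached = (torch.norm(s_next - g, dim=-1) < goal_tol).float()
        target = 1.0 + (1.0 - reached) * dist(s_next, policy(s_next, g), g)
    return F.mse_loss(dist(s, a, g), target)


def actor_loss(dist, policy, batch):
    """'Goal-distance gradient' as we read the abstract: differentiate the
    estimated goal distance with respect to the policy's action and descend
    it, so the negative distance plays the role of the missing reward."""
    s, g = batch["s"], batch["g"]
    return dist(s, policy(s, g), g).mean()
```

Under this reading, the bridge-point planning mentioned in the abstract would amount to picking an intermediate goal whose estimated distance from both the start state and the final goal is small, and pursuing the final goal through that waypoint; that step is not sketched here.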
Related papers
- No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery [53.08822154199948]
Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks.
This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics.
We develop a method that directly trains on scenarios with high learnability.
arXiv Detail & Related papers (2024-08-27T14:31:54Z) - Automatic Reward Design via Learning Motivation-Consistent Intrinsic
Rewards [46.068337522093096]
We introduce the concept of motivation which captures the underlying goal of maximizing certain rewards.
Our method performs better than the state-of-the-art methods in handling problems of delayed reward, exploration, and credit assignment.
arXiv Detail & Related papers (2022-07-29T14:52:02Z) - Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL [91.26538493552817]
We present a formulation of hindsight relabeling for meta-RL, which relabels experience during meta-training to enable learning to learn entirely from sparse rewards.
We demonstrate the effectiveness of our approach on a suite of challenging sparse-reward goal-reaching environments (a minimal relabelling sketch appears after this list).
arXiv Detail & Related papers (2021-12-02T00:51:17Z) - Unbiased Methods for Multi-Goal Reinforcement Learning [13.807859854345834]
In multi-goal reinforcement learning, the reward for each goal is sparse, and located in a small neighborhood of the goal.
We show that Hindsight Experience Replay (HER) can converge to low-return policies by overestimating chancy outcomes.
We introduce unbiased deep Q-learning and actor-critic algorithms that can handle such infinitely sparse rewards, and test them in toy environments.
arXiv Detail & Related papers (2021-06-16T15:31:51Z) - Semi-supervised reward learning for offline reinforcement learning [71.6909757718301]
Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards.
arXiv Detail & Related papers (2020-12-12T20:06:15Z) - Demonstration-efficient Inverse Reinforcement Learning in Procedurally
Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z) - Self-Imitation Learning for Robot Tasks with Sparse and Delayed Rewards [1.2691047660244335]
We propose a practical self-imitation learning method named Self-Imitation Learning with Constant Reward (SILCR).
Our method assigns constant immediate rewards at each timestep according to the final episodic rewards.
We demonstrate the effectiveness of our method in some challenging continuous robotics control tasks in MuJoCo simulation.
arXiv Detail & Related papers (2020-10-14T11:12:07Z) - Ecological Reinforcement Learning [76.9893572776141]
We study the kinds of environment properties that can make learning under such conditions easier.
Understanding how properties of the environment impact the performance of reinforcement learning agents can help us structure our tasks in ways that make learning tractable.
arXiv Detail & Related papers (2020-06-22T17:55:03Z) - RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated
Environments [15.736899098702972]
We propose a novel type of intrinsic reward which encourages the agent to take actions that lead to significant changes in its learned state representation.
We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid.
arXiv Detail & Related papers (2020-02-27T18:03:16Z) - oIRL: Robust Adversarial Inverse Reinforcement Learning with Temporally
Extended Actions [37.66289166905027]
Explicit engineering of reward functions for given environments has been a major hindrance to reinforcement learning methods.
We propose an algorithm that learns hierarchical disentangled rewards with a policy over options.
arXiv Detail & Related papers (2020-02-20T22:21:41Z) - Long-Term Visitation Value for Deep Exploration in Sparse Reward
Reinforcement Learning [34.38011902445557]
Reinforcement learning with sparse rewards is still an open challenge.
We present a novel approach that plans exploration actions far into the future by using a long-term visitation count.
Contrary to existing methods which use models of reward and dynamics, our approach is off-policy and model-free.
arXiv Detail & Related papers (2020-01-01T01:01:15Z)