Learning Intrinsic Symbolic Rewards in Reinforcement Learning
- URL: http://arxiv.org/abs/2010.03694v2
- Date: Fri, 9 Oct 2020 06:42:03 GMT
- Title: Learning Intrinsic Symbolic Rewards in Reinforcement Learning
- Authors: Hassam Sheikh, Shauharda Khadka, Santiago Miret, Somdeb Majumdar
- Abstract summary: We present a method that discovers dense rewards in the form of low-dimensional symbolic trees.
We show that the discovered dense rewards are an effective signal for an RL policy to solve the benchmark tasks.
- Score: 7.101885582663675
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning effective policies for sparse objectives is a key challenge in Deep
Reinforcement Learning (RL). A common approach is to design task-related dense
rewards to improve task learnability. While such rewards are easily
interpreted, they rely on heuristics and domain expertise. Alternate approaches
that train neural networks to discover dense surrogate rewards avoid
heuristics, but are high-dimensional, black-box solutions offering little
interpretability. In this paper, we present a method that discovers dense
rewards in the form of low-dimensional symbolic trees - thus making them more
tractable for analysis. The trees use simple functional operators to map an
agent's observations to a scalar reward, which then supervises the policy
gradient learning of a neural network policy. We test our method on continuous
action spaces in Mujoco and discrete action spaces in Atari and Pygame
environments. We show that the discovered dense rewards are an effective signal
for an RL policy to solve the benchmark tasks. Notably, we significantly
outperform a widely used, contemporary neural-network based reward-discovery
algorithm in all environments considered.
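The abstract describes mapping an agent's observations to a scalar reward through a low-dimensional tree of simple functional operators. The paper's actual operator set and tree encoding are not given here, so the following is a minimal sketch under assumed names (SymbolicNode, ObsLeaf, ConstLeaf, the tanh/add/multiply operators, and the observation indices are illustrative, not taken from the paper):

```python
# Minimal sketch of a symbolic reward tree: an expression tree of simple
# functional operators that maps an observation vector to a scalar reward.
# All class names, operators, and observation indices are illustrative
# assumptions, not the paper's actual representation.
import math
from dataclasses import dataclass
from typing import Callable, List, Union

Node = Union["SymbolicNode", "ObsLeaf", "ConstLeaf"]

@dataclass
class SymbolicNode:
    """Internal node: applies a functional operator to its children."""
    op: Callable[..., float]          # e.g. add, multiply, tanh
    children: List[Node]

    def evaluate(self, obs: List[float]) -> float:
        return self.op(*(child.evaluate(obs) for child in self.children))

@dataclass
class ObsLeaf:
    """Leaf that reads one dimension of the observation vector."""
    index: int

    def evaluate(self, obs: List[float]) -> float:
        return obs[self.index]

@dataclass
class ConstLeaf:
    """Leaf holding a constant."""
    value: float

    def evaluate(self, obs: List[float]) -> float:
        return self.value

# Example tree encoding tanh(0.5 * obs[0] + obs[2]): a dense, human-readable
# signal that could supervise policy-gradient updates in place of a sparse one.
reward_tree = SymbolicNode(
    op=math.tanh,
    children=[SymbolicNode(
        op=lambda a, b: a + b,
        children=[
            SymbolicNode(op=lambda a, b: a * b,
                         children=[ConstLeaf(0.5), ObsLeaf(0)]),
            ObsLeaf(2),
        ],
    )],
)

def intrinsic_reward(observation: List[float]) -> float:
    return reward_tree.evaluate(observation)

print(intrinsic_reward([1.0, -0.3, 0.2]))  # ~0.604
```

Because such a reward is a small explicit expression over named observation dimensions, it can be inspected directly, which is the interpretability advantage the abstract contrasts with neural-network surrogate rewards.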
Related papers
- Black box meta-learning intrinsic rewards for sparse-reward environments [0.0]
This work investigates how meta-learning can improve the training signal received by RL agents.
We analyze and compare this approach to the use of extrinsic rewards and a meta-learned advantage function.
The developed algorithms are evaluated on distributions of continuous control tasks with both parametric and non-parametric variations.
arXiv Detail & Related papers (2024-07-31T12:09:33Z)
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE)
RLE combines the strengths of bonus-based and noise-based exploration strategies, two popular approaches for effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE achieves higher overall scores across all tasks than other approaches.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
- Accelerating Exploration with Unlabeled Prior Data [66.43995032226466]
We study how prior data without reward labels may be used to guide and accelerate exploration for an agent solving a new sparse reward task.
We propose a simple approach that learns a reward model from online experience, labels the unlabeled prior data with optimistic rewards, and then uses the relabeled data concurrently alongside the online data for downstream policy and critic optimization (a minimal sketch of this pipeline appears after this list).
arXiv Detail & Related papers (2023-11-09T00:05:17Z)
- Reward Learning with Trees: Methods and Evaluation [10.473362152378979]
We propose a method for learning reward trees from preference labels.
We show it to be broadly competitive with neural networks on challenging high-dimensional tasks.
Having found that reward tree learning can be done effectively in complex settings, we then consider why it should be used.
arXiv Detail & Related papers (2022-10-03T15:17:25Z)
- Dealing with Sparse Rewards Using Graph Neural Networks [0.15540058359482856]
We propose two modifications to a recent reward-shaping method based on graph convolutional networks.
We empirically validate the effectiveness of our solutions for the task of navigation in a 3D environment with sparse rewards.
For the solution featuring an attention mechanism, we also show that the learned attention concentrates on edges corresponding to important transitions in the 3D environment.
arXiv Detail & Related papers (2022-03-25T02:42:07Z)
- Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL [91.26538493552817]
We present a formulation of hindsight relabeling for meta-RL, which relabels experience during meta-training to enable learning to learn entirely from sparse rewards.
We demonstrate the effectiveness of our approach on a suite of challenging sparse reward goal-reaching environments.
arXiv Detail & Related papers (2021-12-02T00:51:17Z)
- Anti-Concentrated Confidence Bonuses for Scalable Exploration [57.91943847134011]
Intrinsic rewards play a central role in handling the exploration-exploitation trade-off.
We introduce anti-concentrated confidence bounds for efficiently approximating the elliptical bonus (the standard form of this bonus is recalled after this list).
We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic rewards on Atari benchmarks.
arXiv Detail & Related papers (2021-10-21T15:25:15Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Disturbing Reinforcement Learning Agents with Corrupted Rewards [62.997667081978825]
We analyze the effects of different attack strategies based on reward perturbations on reinforcement learning algorithms.
We show that smoothly crafted adversarial rewards are able to mislead the learner, and that with low exploration probability values the learned policy is more robust to corrupted rewards.
arXiv Detail & Related papers (2021-02-12T15:53:48Z)
- Learning Guidance Rewards with Trajectory-space Smoothing [22.456737935789103]
Long-term temporal credit assignment is an important challenge in deep reinforcement learning.
Existing policy-gradient and Q-learning algorithms rely on dense environmental rewards that provide rich short-term supervision.
Recent works have proposed algorithms to learn dense "guidance" rewards that could be used in place of the sparse or delayed environmental rewards.
arXiv Detail & Related papers (2020-10-23T23:55:06Z)
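The "Accelerating Exploration with Unlabeled Prior Data" summary above describes a three-step pipeline: fit a reward model on online experience, label the reward-free prior data with optimistic rewards, and mix both datasets for policy and critic updates. A minimal sketch under assumed names (the toy tabular reward model, the fixed optimism bonus, and the data-mixing step are all illustrative, not the paper's implementation):

```python
# Hedged sketch of optimistic relabeling of reward-free prior data.
# The reward model, bonus value, and data-mixing scheme are illustrative assumptions.

def fit_reward_model(online_transitions):
    """Toy 'reward model': average observed reward per discretized (state, action)."""
    table, counts = {}, {}
    for s, a, r in online_transitions:
        key = (round(s, 1), a)
        table[key] = table.get(key, 0.0) + r
        counts[key] = counts.get(key, 0) + 1
    return {k: table[k] / counts[k] for k in table}

def label_optimistically(prior_transitions, reward_model, bonus=1.0):
    """Unseen (state, action) pairs receive an optimistic reward 'bonus'."""
    labeled = []
    for s, a in prior_transitions:           # prior data carries no reward labels
        key = (round(s, 1), a)
        labeled.append((s, a, reward_model.get(key, bonus)))
    return labeled

# Mix relabeled prior data with online data for downstream policy/critic updates.
online = [(0.11, 0, 0.0), (0.52, 1, 1.0)]    # (state, action, reward)
prior = [(0.50, 1), (0.93, 0)]               # (state, action) only
model = fit_reward_model(online)
training_batch = online + label_optimistically(prior, model)
print(training_batch)
```

The paper's reward model and optimism term are learned rather than tabular; the sketch only illustrates the ordering of the steps.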
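The "Anti-Concentrated Confidence Bonuses" summary refers to the elliptical bonus used in LinUCB-style exploration. As a point of reference (this is the standard definition, not the paper's anti-concentrated approximation), the bonus for a state-action feature vector φ(s, a) is:

```latex
% Standard elliptical (LinUCB-style) exploration bonus; the anti-concentrated
% bounds in the paper above are aimed at approximating this quantity cheaply.
\[
  b_t(s, a) \;=\; \beta \,\sqrt{\phi(s, a)^\top A_t^{-1}\, \phi(s, a)},
  \qquad
  A_t \;=\; \lambda I \;+\; \sum_{i < t} \phi(s_i, a_i)\, \phi(s_i, a_i)^\top
\]
```

Maintaining and inverting A_t exactly is expensive for high-dimensional features, which is the cost the summary says the anti-concentrated bounds are meant to avoid.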
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.