Learning Intrinsic Symbolic Rewards in Reinforcement Learning
- URL: http://arxiv.org/abs/2010.03694v2
- Date: Fri, 9 Oct 2020 06:42:03 GMT
- Title: Learning Intrinsic Symbolic Rewards in Reinforcement Learning
- Authors: Hassam Sheikh, Shauharda Khadka, Santiago Miret, Somdeb Majumdar
- Abstract summary: We present a method that discovers dense rewards in the form of low-dimensional symbolic trees.
We show that the discovered dense rewards are an effective signal for an RL policy to solve the benchmark tasks.
- Score: 7.101885582663675
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning effective policies for sparse objectives is a key challenge in Deep
Reinforcement Learning (RL). A common approach is to design task-related dense
rewards to improve task learnability. While such rewards are easily
interpreted, they rely on heuristics and domain expertise. Alternate approaches
that train neural networks to discover dense surrogate rewards avoid
heuristics, but are high-dimensional, black-box solutions offering little
interpretability. In this paper, we present a method that discovers dense
rewards in the form of low-dimensional symbolic trees - thus making them more
tractable for analysis. The trees use simple functional operators to map an
agent's observations to a scalar reward, which then supervises the policy
gradient learning of a neural network policy. We test our method on continuous
action spaces in Mujoco and discrete action spaces in Atari and Pygame
environments. We show that the discovered dense rewards are an effective signal
for an RL policy to solve the benchmark tasks. Notably, we significantly
outperform a widely used, contemporary neural-network based reward-discovery
algorithm in all environments considered.
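The abstract describes mapping an agent's observations to a scalar reward through a low-dimensional tree of simple functional operators. The paper's actual operator set and tree encoding are not given here, so the following is a minimal sketch under assumed names (SymbolicNode, ObsLeaf, ConstLeaf, the tanh/add/multiply operators, and the observation indices are illustrative, not taken from the paper):

```python
# Minimal sketch of a symbolic reward tree: an expression tree of simple
# functional operators that maps an observation vector to a scalar reward.
# All class names, operators, and observation indices are illustrative
# assumptions, not the paper's actual representation.
import math
from dataclasses import dataclass
from typing import Callable, List, Union

Node = Union["SymbolicNode", "ObsLeaf", "ConstLeaf"]

@dataclass
class SymbolicNode:
    """Internal node: applies a functional operator to its children."""
    op: Callable[..., float]          # e.g. add, multiply, tanh
    children: List[Node]

    def evaluate(self, obs: List[float]) -> float:
        return self.op(*(child.evaluate(obs) for child in self.children))

@dataclass
class ObsLeaf:
    """Leaf that reads one dimension of the observation vector."""
    index: int

    def evaluate(self, obs: List[float]) -> float:
        return obs[self.index]

@dataclass
class ConstLeaf:
    """Leaf holding a constant."""
    value: float

    def evaluate(self, obs: List[float]) -> float:
        return self.value

# Example tree encoding tanh(0.5 * obs[0] + obs[2]): a dense, human-readable
# signal that could supervise policy-gradient updates in place of a sparse one.
reward_tree = SymbolicNode(
    op=math.tanh,
    children=[SymbolicNode(
        op=lambda a, b: a + b,
        children=[
            SymbolicNode(op=lambda a, b: a * b,
                         children=[ConstLeaf(0.5), ObsLeaf(0)]),
            ObsLeaf(2),
        ],
    )],
)

def intrinsic_reward(observation: List[float]) -> float:
    return reward_tree.evaluate(observation)

print(intrinsic_reward([1.0, -0.3, 0.2]))  # ~0.604
```

Because such a reward is a small explicit expression over named observation dimensions, it can be inspected directly, which is the interpretability advantage the abstract contrasts with neural-network surrogate rewards.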
Related papers
- Black box meta-learning intrinsic rewards for sparse-reward environments [0.0]
This work investigates how meta-learning can improve the training signal received by RL agents.
We analyze and compare this approach to the use of extrinsic rewards and a meta-learned advantage function.
The developed algorithms are evaluated on distributions of continuous control tasks with both parametric and non-parametric variations.
arXiv Detail & Related papers (2024-07-31T12:09:33Z)
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE)
RLE combines the strengths of bonus-based and noise-based exploration strategies, two popular approaches for effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE achieves higher overall scores across all tasks than other approaches.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
- Accelerating Exploration with Unlabeled Prior Data [66.43995032226466]
We study how prior data without reward labels may be used to guide and accelerate exploration for an agent solving a new sparse reward task.
We propose a simple approach that learns a reward model from online experience, labels the unlabeled prior data with optimistic rewards, and then uses the relabeled data concurrently alongside the online data for downstream policy and critic optimization (a minimal sketch of this pipeline appears after this list).
arXiv Detail & Related papers (2023-11-09T00:05:17Z)
- Reward Learning with Trees: Methods and Evaluation [10.473362152378979]
We propose a method for learning reward trees from preference labels.
We show it to be broadly competitive with neural networks on challenging high-dimensional tasks.
Having found that reward tree learning can be done effectively in complex settings, we then consider why it should be used.
arXiv Detail & Related papers (2022-10-03T15:17:25Z)
- Dealing with Sparse Rewards Using Graph Neural Networks [0.15540058359482856]
We propose two modifications to a recent reward-shaping method based on graph convolutional networks.
We empirically validate the effectiveness of our solutions for the task of navigation in a 3D environment with sparse rewards.
For the solution featuring an attention mechanism, we also show that the learned attention concentrates on edges corresponding to important transitions in the 3D environment.
arXiv Detail & Related papers (2022-03-25T02:42:07Z)
- Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL [91.26538493552817]
We present a formulation of hindsight relabeling for meta-RL, which relabels experience during meta-training to enable learning to learn entirely from sparse rewards.
We demonstrate the effectiveness of our approach on a suite of challenging sparse reward goal-reaching environments.
arXiv Detail & Related papers (2021-12-02T00:51:17Z)
- Anti-Concentrated Confidence Bonuses for Scalable Exploration [57.91943847134011]
Intrinsic rewards play a central role in handling the exploration-exploitation trade-off.
We introduce anti-concentrated confidence bounds for efficiently approximating the elliptical bonus (the standard form of this bonus is recalled after this list).
We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic rewards on Atari benchmarks.
arXiv Detail & Related papers (2021-10-21T15:25:15Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Disturbing Reinforcement Learning Agents with Corrupted Rewards [62.997667081978825]
We analyze the effects of different attack strategies based on reward perturbations on reinforcement learning algorithms.
We show that smoothly crafted adversarial rewards are able to mislead the learner, and that with low exploration probability values the learned policy is more robust to corrupted rewards.
arXiv Detail & Related papers (2021-02-12T15:53:48Z)
- Learning Guidance Rewards with Trajectory-space Smoothing [22.456737935789103]
Long-term temporal credit assignment is an important challenge in deep reinforcement learning.
Existing policy-gradient and Q-learning algorithms rely on dense environmental rewards that provide rich short-term supervision.
Recent works have proposed algorithms to learn dense "guidance" rewards that could be used in place of the sparse or delayed environmental rewards.
arXiv Detail & Related papers (2020-10-23T23:55:06Z)
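The "Accelerating Exploration with Unlabeled Prior Data" summary above describes a three-step pipeline: fit a reward model on online experience, label the reward-free prior data with optimistic rewards, and mix both datasets for policy and critic updates. A minimal sketch under assumed names (the toy tabular reward model, the fixed optimism bonus, and the data-mixing step are all illustrative, not the paper's implementation):

```python
# Hedged sketch of optimistic relabeling of reward-free prior data.
# The reward model, bonus value, and data-mixing scheme are illustrative assumptions.

def fit_reward_model(online_transitions):
    """Toy 'reward model': average observed reward per discretized (state, action)."""
    table, counts = {}, {}
    for s, a, r in online_transitions:
        key = (round(s, 1), a)
        table[key] = table.get(key, 0.0) + r
        counts[key] = counts.get(key, 0) + 1
    return {k: table[k] / counts[k] for k in table}

def label_optimistically(prior_transitions, reward_model, bonus=1.0):
    """Unseen (state, action) pairs receive an optimistic reward 'bonus'."""
    labeled = []
    for s, a in prior_transitions:           # prior data carries no reward labels
        key = (round(s, 1), a)
        labeled.append((s, a, reward_model.get(key, bonus)))
    return labeled

# Mix relabeled prior data with online data for downstream policy/critic updates.
online = [(0.11, 0, 0.0), (0.52, 1, 1.0)]    # (state, action, reward)
prior = [(0.50, 1), (0.93, 0)]               # (state, action) only
model = fit_reward_model(online)
training_batch = online + label_optimistically(prior, model)
print(training_batch)
```

The paper's reward model and optimism term are learned rather than tabular; the sketch only illustrates the ordering of the steps.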
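The "Anti-Concentrated Confidence Bonuses" summary refers to the elliptical bonus used in LinUCB-style exploration. As a point of reference (this is the standard definition, not the paper's anti-concentrated approximation), the bonus for a state-action feature vector φ(s, a) is:

```latex
% Standard elliptical (LinUCB-style) exploration bonus; the anti-concentrated
% bounds in the paper above are aimed at approximating this quantity cheaply.
\[
  b_t(s, a) \;=\; \beta \,\sqrt{\phi(s, a)^\top A_t^{-1}\, \phi(s, a)},
  \qquad
  A_t \;=\; \lambda I \;+\; \sum_{i < t} \phi(s_i, a_i)\, \phi(s_i, a_i)^\top
\]
```

Maintaining and inverting A_t exactly is expensive for high-dimensional features, which is the cost the summary says the anti-concentrated bounds are meant to avoid.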
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.