A Self-Supervised Auxiliary Loss for Deep RL in Partially Observable
Settings
- URL: http://arxiv.org/abs/2104.08492v1
- Date: Sat, 17 Apr 2021 09:28:17 GMT
- Title: A Self-Supervised Auxiliary Loss for Deep RL in Partially Observable
Settings
- Authors: Eltayeb Ahmed, Luisa Zintgraf, Christian A. Schroeder de Witt and
Nicolas Usunier
- Abstract summary: We explore an auxiliary loss for reinforcement learning in environments where strong-performing agents must be able to navigate a spatial environment. We tested this auxiliary loss on a navigation task in a gridworld and achieved a 9.6% increase in cumulative episode reward compared to a strong baseline approach.
- Score: 15.99292016541287
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work we explore an auxiliary loss useful for reinforcement learning in environments where strong-performing agents must be able to navigate a spatial environment. The proposed auxiliary loss minimizes the classification error of a neural network classifier that predicts whether or not a pair of states sampled from the agent's current episode trajectory are in order. The classifier takes as input the pair of states as well as the agent's memory. The motivation for this auxiliary loss is that there is a strong correlation between which of a pair of states is more recent in the agent's episode trajectory and which of the two states is spatially closer to the agent. Our hypothesis is that learning features to answer this question encourages the agent to learn and internalize in its memory representations of states that facilitate spatial reasoning. We tested this auxiliary loss on a navigation task in a gridworld and achieved a 9.6% increase in cumulative episode reward compared to a strong baseline approach.
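As a rough illustration of the idea, here is a minimal PyTorch sketch of such a temporal-order auxiliary loss. The module names, pair-sampling scheme, and network sizes are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OrderClassifier(nn.Module):
    """Predicts whether state s_i precedes state s_j in the episode,
    conditioned on the agent's memory (hypothetical architecture)."""
    def __init__(self, state_dim, memory_dim, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim + memory_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # logit for P(s_i came first)
        )

    def forward(self, s_i, s_j, memory):
        return self.net(torch.cat([s_i, s_j, memory], dim=-1)).squeeze(-1)

def temporal_order_aux_loss(classifier, states, memory, n_pairs=64):
    """Sample state pairs from the current episode trajectory and
    minimise the classification error on their temporal order.

    states: (T, state_dim) encoded states of the episode so far
    memory: (memory_dim,) the agent's current recurrent memory
    """
    T = states.size(0)
    i, j = torch.randint(0, T, (2, n_pairs))
    keep = i != j                          # drop degenerate pairs
    i, j = i[keep], j[keep]
    labels = (i < j).float()               # 1 if s_i is the earlier state
    mem = memory.expand(i.size(0), -1)
    logits = classifier(states[i], states[j], mem)
    return F.binary_cross_entropy_with_logits(logits, labels)
```

In training, a term like this would presumably be added to the RL objective with a weighting coefficient, so that gradients from the order classifier shape the memory representations without overwhelming the policy update.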
Related papers
- Interpretable Brain-Inspired Representations Improve RL Performance on
Visual Navigation Tasks [0.0]
We show how the method of slow feature analysis (SFA) overcomes limitations of standard feature extractors by generating interpretable representations of visual data.
We employ SFA in a modern reinforcement learning context, analyse and compare representations and illustrate where hierarchical SFA can outperform other feature extractors on navigation tasks.
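For intuition, below is a minimal numpy sketch of plain linear SFA; the cited paper uses a hierarchical variant on visual input, so treat this only as the basic building block:

```python
import numpy as np

def linear_sfa(X, n_features):
    """Linear slow feature analysis on a signal X of shape (T, d):
    find unit-variance, decorrelated projections whose outputs change
    as slowly as possible over time. Illustrative sketch only."""
    X = X - X.mean(axis=0)                          # centre the signal
    # Whiten so the data has identity covariance.
    cov = np.cov(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    W_white = eigvec / np.sqrt(np.maximum(eigval, 1e-12))
    Z = X @ W_white
    # Slowness: directions minimising the variance of the temporal
    # derivative, i.e. the smallest eigenvectors of cov(dZ).
    dZ = np.diff(Z, axis=0)
    dval, dvec = np.linalg.eigh(np.cov(dZ, rowvar=False))
    return Z @ dvec[:, :n_features]                 # (T, n_features)
```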
arXiv Detail & Related papers (2024-02-19T11:35:01Z)
- Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning? [58.942118128503104]
Causal confusion is a phenomenon where an agent learns a policy that reflects imperfect spurious correlations in the data.
This phenomenon is particularly pronounced in domains such as robotics.
In this paper, we study causal confusion in offline reinforcement learning.
arXiv Detail & Related papers (2023-12-28T17:54:56Z)
- Scalable Multi-agent Covering Option Discovery based on Kronecker Graphs [49.71319907864573]
In this paper, we propose a multi-agent skill discovery method that eases the decomposition of the joint state space.
Our key idea is to approximate the joint state space as a Kronecker graph, based on which we can directly estimate its Fiedler vector.
Considering that directly computing the Laplacian spectrum is intractable for tasks with infinite-scale state spaces, we further propose a deep learning extension of our method.
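A small numpy sketch of the spectral shortcut this rests on, assuming connected factor graphs (the paper's actual estimator and its deep-learning extension are more involved):

```python
import numpy as np

def normalized_adjacency(A):
    """D^{-1/2} A D^{-1/2} for an adjacency matrix A."""
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return d_inv_sqrt @ A @ d_inv_sqrt

def kronecker_fiedler(A1, A2):
    """Fiedler vector of the normalized Laplacian of the Kronecker
    graph A1 (x) A2, computed from the factors alone: eigenpairs of a
    Kronecker product are products of factor eigenpairs, so the big
    graph's spectrum never has to be formed explicitly."""
    v1, V1 = np.linalg.eigh(normalized_adjacency(A1))
    v2, V2 = np.linalg.eigh(normalized_adjacency(A2))
    # Normalized Laplacian eigenvalues are 1 - v1[i] * v2[j]; the
    # Fiedler vector belongs to the second-largest product.
    prod = np.outer(v1, v2)
    order = np.argsort(prod, axis=None)[::-1]
    i, j = np.unravel_index(order[1], prod.shape)
    return np.kron(V1[:, i], V2[:, j])
```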
arXiv Detail & Related papers (2023-07-21T14:53:12Z)
- Agent-State Construction with Auxiliary Inputs [16.79847469127811]
We present a series of examples illustrating the different ways of using auxiliary inputs for reinforcement learning.
We show that these auxiliary inputs can be used to discriminate between observations that would otherwise be aliased.
This approach is complementary to state-of-the-art methods such as recurrent neural networks and truncated back-propagation.
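One classic auxiliary input of this kind is an exponentially decaying trace of past observations. The sketch below (the decay rate and wiring are illustrative assumptions) shows how such a trace separates two moments whose raw observations are identical but whose histories differ:

```python
import numpy as np

class DecayingTraceInput:
    """Appends an exponential trace of past observations to the
    current observation, so aliased observations with different
    histories map to different agent-state inputs."""
    def __init__(self, obs_dim, decay=0.9):
        self.decay = decay
        self.trace = np.zeros(obs_dim)

    def reset(self):
        self.trace[:] = 0.0

    def __call__(self, obs):
        self.trace = self.decay * self.trace + (1 - self.decay) * obs
        return np.concatenate([obs, self.trace])   # augmented input
```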
arXiv Detail & Related papers (2022-11-15T00:18:14Z)
- Reinforcement Learning with Automated Auxiliary Loss Search [34.83123677004838]
We propose a principled and universal method for learning better representations with auxiliary loss functions.
Specifically, we define a general auxiliary loss space of size $7.5 \times 10^{20}$ and explore the space with an efficient evolutionary search strategy.
Empirical results show that the discovered auxiliary loss significantly improves the performance on both high-dimensional (image) and low-dimensional (vector) unseen tasks.
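A toy sketch of the evolutionary loop over a discrete loss-specification space; the component lists, mutation operator, and fitness interface here are stand-ins, and the paper's actual space and evaluation are far richer:

```python
import random

# Toy ingredients an auxiliary loss could be assembled from.
TARGETS = ["next_state", "reward", "action", "state_diff"]
HORIZONS = [1, 3, 5, 10]
OPERATORS = ["mse", "crossentropy", "contrastive"]

def random_loss():
    return (random.choice(TARGETS), random.choice(HORIZONS),
            random.choice(OPERATORS))

def mutate(loss):
    target, horizon, op = loss
    field = random.randrange(3)
    if field == 0:
        target = random.choice(TARGETS)
    elif field == 1:
        horizon = random.choice(HORIZONS)
    else:
        op = random.choice(OPERATORS)
    return (target, horizon, op)

def evolutionary_search(fitness, pop_size=16, generations=10):
    """fitness maps a loss spec to RL performance, e.g. mean return
    of an agent trained with that auxiliary loss. Each generation
    keeps the top half and refills with mutated survivors."""
    population = [random_loss() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        population = survivors + [
            mutate(random.choice(survivors))
            for _ in range(pop_size - len(survivors))
        ]
    return population[0]
```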
arXiv Detail & Related papers (2022-10-12T09:24:53Z)
- ReAct: Temporal Action Detection with Relational Queries [84.76646044604055]
This work aims at advancing temporal action detection (TAD) using an encoder-decoder framework with action queries.
We first propose a relational attention mechanism in the decoder, which guides the attention among queries based on their relations.
Lastly, we propose to predict the localization quality of each action query at inference in order to distinguish high-quality queries.
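As a hedged sketch of what relation-guided attention among queries can look like, the snippet below biases standard self-attention by the temporal IoU of the queries' predicted segments; this is an illustrative stand-in, not the paper's exact mechanism:

```python
import torch
import torch.nn.functional as F

def relational_query_attention(queries, segments):
    """queries:  (N, d) action-query embeddings
    segments: (N, 2) predicted (start, end) times per query
    Attention between two queries is up-weighted when their
    predicted segments overlap (temporal IoU)."""
    start = torch.max(segments[:, None, 0], segments[None, :, 0])
    end = torch.min(segments[:, None, 1], segments[None, :, 1])
    inter = (end - start).clamp(min=0)
    length = segments[:, 1] - segments[:, 0]
    union = length[:, None] + length[None, :] - inter
    iou = inter / union.clamp(min=1e-6)              # (N, N) relations
    scores = queries @ queries.T / queries.size(-1) ** 0.5
    attn = F.softmax(scores + torch.log(iou + 1e-6), dim=-1)
    return attn @ queries                            # relation-guided mix
```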
arXiv Detail & Related papers (2022-07-14T17:46:37Z)
- ReCCoVER: Detecting Causal Confusion for Explainable Reinforcement Learning [2.984934409689467]
Causal confusion refers to a phenomenon where an agent learns spurious correlations between features which might not hold across the entire state space.
We propose ReCCoVER, an algorithm which detects causal confusion in an agent's reasoning before deployment.
arXiv Detail & Related papers (2022-03-21T13:17:30Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
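The naive conditional-NML scheme that the paper makes tractable can be sketched as follows, with a logistic model standing in for the success classifier (the two-label setup and refitting scheme are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cnml_distribution(X_train, y_train, x_query, labels=(0, 1)):
    """Conditional NML at a query point: for each candidate label,
    refit on the training data plus the query carrying that label,
    score the query under the refit model, then normalize across
    labels. Naive version; meta-learning amortizes the refits."""
    scores = []
    for y in labels:
        X = np.vstack([X_train, x_query[None]])
        Y = np.append(y_train, y)
        model = LogisticRegression().fit(X, Y)
        col = list(model.classes_).index(y)
        scores.append(model.predict_proba(x_query[None])[0, col])
    scores = np.asarray(scores)
    return scores / scores.sum()   # NML distribution over labels
```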
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Curious Exploration and Return-based Memory Restoration for Deep Reinforcement Learning [2.3226893628361682]
In this paper, we focus on training a single agent to score goals with a binary success/failure reward function.
The proposed method can be utilized to train agents in environments with fairly complex state and action spaces.
arXiv Detail & Related papers (2021-05-02T16:01:34Z)
- Regressive Domain Adaptation for Unsupervised Keypoint Detection [67.2950306888855]
Domain adaptation (DA) aims at transferring knowledge from a labeled source domain to an unlabeled target domain.
We present a method of regressive domain adaptation (RegDA) for unsupervised keypoint detection.
Our method brings a large improvement of 8% to 11% in terms of PCK on different datasets.
arXiv Detail & Related papers (2021-03-10T16:45:22Z)
- Maximizing Information Gain in Partially Observable Environments via Prediction Reward [64.24528565312463]
This paper tackles the challenge of using belief-based rewards for a deep RL agent.
We derive the exact error between negative entropy and the expected prediction reward.
This insight provides theoretical motivation for several fields using prediction rewards.
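One standard identity that makes such a relation plausible (the paper's precise statement may differ) splits the expected log-probability a predictor q assigns to states drawn from the belief b into negative entropy plus a KL error term:

```latex
\mathbb{E}_{s \sim b}\bigl[\log q(s)\bigr]
  = \sum_{s} b(s)\,\log q(s)
  = \underbrace{-H(b)}_{\text{negative entropy}}
    - \underbrace{D_{\mathrm{KL}}\bigl(b \,\Vert\, q\bigr)}_{\text{error term}}
```

The gap is exactly the KL divergence between belief and predictor, which vanishes as the predictor comes to match the belief.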
arXiv Detail & Related papers (2020-05-11T08:13:49Z)