Perception-Prediction-Reaction Agents for Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2006.15223v1
- Date: Fri, 26 Jun 2020 21:53:47 GMT
- Title: Perception-Prediction-Reaction Agents for Deep Reinforcement Learning
- Authors: Adam Stooke, Valentin Dalibard, Siddhant M. Jayakumar, Wojciech M.
Czarnecki, and Max Jaderberg
- Abstract summary: We introduce a new recurrent agent architecture which improves reinforcement learning in tasks requiring long-term memory.
A new auxiliary loss regularizes policies drawn from all three cores against each other, enacting the prior that the policy should be expressible from either recent or long-term memory.
- Score: 12.566380944901816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new recurrent agent architecture and associated auxiliary
losses which improve reinforcement learning in partially observable tasks
requiring long-term memory. We employ a temporal hierarchy, using a
slow-ticking recurrent core to allow information to flow more easily over long
time spans, and three fast-ticking recurrent cores with connections designed to
create an information asymmetry. The \emph{reaction} core incorporates new
observations with input from the slow core to produce the agent's policy; the
\emph{perception} core accesses only short-term observations and informs the
slow core; lastly, the \emph{prediction} core accesses only long-term memory.
An auxiliary loss regularizes policies drawn from all three cores against each
other, enacting the prior that the policy should be expressible from either
recent or long-term memory. We present the resulting
\emph{Perception-Prediction-Reaction} (PPR) agent and demonstrate its improved
performance over a strong LSTM-agent baseline in DMLab-30, particularly in
tasks requiring long-term memory. We further show significant improvements in
Capture the Flag, an environment requiring agents to acquire a complicated
mixture of skills over long time scales. In a series of ablation experiments,
we probe the importance of each component of the PPR agent, establishing that
the entire, novel combination is necessary for this intriguing result.
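The auxiliary loss described in the abstract can be sketched in a few lines. This is an illustrative reconstruction from the abstract only, not the authors' implementation: all function names are hypothetical, the recurrent cores themselves are omitted, and details such as KL direction and stop-gradients are simplified. The key idea shown is penalizing disagreement among the policies produced by the three fast-ticking cores.

```python
import math

def softmax(logits):
    """Convert a list of logits into a categorical policy."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL divergence KL(p || q) between two categorical policies."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def ppr_auxiliary_loss(reaction_logits, perception_logits, prediction_logits):
    """Penalize disagreement between the policy heads of the three
    fast-ticking cores. The reaction core's policy is the one the agent
    acts from; the perception core sees only recent observations, and the
    prediction core sees only long-term (slow-core) memory. Matching all
    three enacts the prior that the policy should be expressible from
    either recent or long-term memory."""
    pi_reaction = softmax(reaction_logits)
    pi_perception = softmax(perception_logits)
    pi_prediction = softmax(prediction_logits)
    return kl(pi_reaction, pi_perception) + kl(pi_reaction, pi_prediction)
```

The loss is zero when all three heads agree and grows as the perception or prediction policies diverge from the reaction policy, so minimizing it pushes the information-restricted cores toward expressing the behavioral policy.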
Related papers
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- Improving Out-of-Distribution Generalization of Neural Rerankers with Contextualized Late Interaction [52.63663547523033]
Late interaction, the simplest form of multi-vector, is also helpful to neural rerankers that only use the [CLS] vector to compute the similarity score.
We show that the finding is consistent across different model sizes and first-stage retrievers of diverse natures.
arXiv Detail & Related papers (2023-02-13T18:42:17Z)
- Generalization, Mayhems and Limits in Recurrent Proximal Policy Optimization [1.8570591025615453]
We highlight vital details that one must get right when adding recurrence to achieve a correct and efficient implementation.
We explore the limitations of recurrent PPO by benchmarking the contributed novel environments Mortar Mayhem and Searing Spotlights.
Remarkably, we can demonstrate a transition to strong generalization in Mortar Mayhem when scaling the number of training seeds.
arXiv Detail & Related papers (2022-05-23T07:54:15Z)
- Recurrence-in-Recurrence Networks for Video Deblurring [58.49075799159015]
State-of-the-art video deblurring methods often adopt recurrent neural networks to model the temporal dependency between the frames.
In this paper, we propose recurrence-in-recurrence network architecture to cope with the limitations of short-ranged memory.
arXiv Detail & Related papers (2022-03-12T11:58:13Z)
- Towards mental time travel: a hierarchical memory for reinforcement learning agents [9.808027857786781]
Reinforcement learning agents often forget details of the past, especially after delays or distractor tasks.
We propose a Hierarchical Transformer Memory (HTM) which helps agents to remember the past in detail.
Agents with HTM can extrapolate to task sequences an order of magnitude longer than they were trained on, and can even generalize zero-shot from a meta-learning setting to maintaining knowledge across episodes.
arXiv Detail & Related papers (2021-05-28T18:12:28Z)
- Temporal Memory Relation Network for Workflow Recognition from Surgical Video [53.20825496640025]
We propose a novel end-to-end temporal memory relation network (TMNet) for relating long-range and multi-scale temporal patterns.
We have extensively validated our approach on two benchmark surgical video datasets.
arXiv Detail & Related papers (2021-03-30T13:20:26Z)
- Dynamic Embeddings for Interaction Prediction [2.5758502140236024]
In recommender systems (RSs), predicting the next item that a user interacts with is critical for user retention.
Recent studies have shown the effectiveness of modeling the mutual interactions between users and items using separate user and item embeddings.
We propose a novel method called DeePRed that addresses some of their limitations.
arXiv Detail & Related papers (2020-11-10T16:04:46Z)
- Untangling tradeoffs between recurrence and self-attention in neural networks [81.30894993852813]
We present a formal analysis of how self-attention affects gradient propagation in recurrent networks.
We prove that it mitigates the problem of vanishing gradients when trying to capture long-term dependencies.
We propose a relevancy screening mechanism that allows for a scalable use of sparse self-attention with recurrence.
arXiv Detail & Related papers (2020-06-16T19:24:25Z)
- Sequential Recommender via Time-aware Attentive Memory Network [67.26862011527986]
We propose a temporal gating methodology to improve attention mechanism and recurrent units.
We also propose a Multi-hop Time-aware Attentive Memory network to integrate long-term and short-term preferences.
Our approach is scalable for candidate retrieval tasks and can be viewed as a non-linear generalization of latent factorization for dot-product based Top-K recommendation.
arXiv Detail & Related papers (2020-05-18T11:29:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.