Perception-Prediction-Reaction Agents for Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2006.15223v1
- Date: Fri, 26 Jun 2020 21:53:47 GMT
- Title: Perception-Prediction-Reaction Agents for Deep Reinforcement Learning
- Authors: Adam Stooke, Valentin Dalibard, Siddhant M. Jayakumar, Wojciech M.
Czarnecki, and Max Jaderberg
- Abstract summary: We introduce a new recurrent agent architecture which improves reinforcement learning in tasks requiring long-term memory.
A new auxiliary loss regularizes policies drawn from all three cores against each other, enacting the prior that the policy should be expressible from either recent or long-term memory.
- Score: 12.566380944901816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new recurrent agent architecture and associated auxiliary
losses which improve reinforcement learning in partially observable tasks
requiring long-term memory. We employ a temporal hierarchy, using a
slow-ticking recurrent core to allow information to flow more easily over long
time spans, and three fast-ticking recurrent cores with connections designed to
create an information asymmetry. The \emph{reaction} core incorporates new
observations with input from the slow core to produce the agent's policy; the
\emph{perception} core accesses only short-term observations and informs the
slow core; lastly, the \emph{prediction} core accesses only long-term memory.
An auxiliary loss regularizes policies drawn from all three cores against each
other, enacting the prior that the policy should be expressible from either
recent or long-term memory. We present the resulting
\emph{Perception-Prediction-Reaction} (PPR) agent and demonstrate its improved
performance over a strong LSTM-agent baseline in DMLab-30, particularly in
tasks requiring long-term memory. We further show significant improvements in
Capture the Flag, an environment requiring agents to acquire a complicated
mixture of skills over long time scales. In a series of ablation experiments,
we probe the importance of each component of the PPR agent, establishing that
the entire, novel combination is necessary for this intriguing result.
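The auxiliary loss described in the abstract can be sketched in a few lines. This is an illustrative reconstruction from the abstract only, not the authors' implementation: all function names are hypothetical, the recurrent cores themselves are omitted, and details such as KL direction and stop-gradients are simplified. The key idea shown is penalizing disagreement among the policies produced by the three fast-ticking cores.

```python
import math

def softmax(logits):
    """Convert a list of logits into a categorical policy."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL divergence KL(p || q) between two categorical policies."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def ppr_auxiliary_loss(reaction_logits, perception_logits, prediction_logits):
    """Penalize disagreement between the policy heads of the three
    fast-ticking cores. The reaction core's policy is the one the agent
    acts from; the perception core sees only recent observations, and the
    prediction core sees only long-term (slow-core) memory. Matching all
    three enacts the prior that the policy should be expressible from
    either recent or long-term memory."""
    pi_reaction = softmax(reaction_logits)
    pi_perception = softmax(perception_logits)
    pi_prediction = softmax(prediction_logits)
    return kl(pi_reaction, pi_perception) + kl(pi_reaction, pi_prediction)
```

The loss is zero when all three heads agree and grows as the perception or prediction policies diverge from the reaction policy, so minimizing it pushes the information-restricted cores toward expressing the behavioral policy.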
Related papers
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- Improving Out-of-Distribution Generalization of Neural Rerankers with Contextualized Late Interaction [52.63663547523033]
Late interaction, the simplest form of multi-vector, is also helpful to neural rerankers that only use the [CLS] vector to compute the similarity score.
We show that the finding is consistent across different model sizes and first-stage retrievers of diverse natures.
arXiv Detail & Related papers (2023-02-13T18:42:17Z)
- Generalization, Mayhems and Limits in Recurrent Proximal Policy Optimization [1.8570591025615453]
We highlight vital details that one must get right when adding recurrence to achieve a correct and efficient implementation.
We explore the limitations of recurrent PPO by benchmarking the contributed novel environments Mortar Mayhem and Searing Spotlights.
Remarkably, we can demonstrate a transition to strong generalization in Mortar Mayhem when scaling the number of training seeds.
arXiv Detail & Related papers (2022-05-23T07:54:15Z)
- Recurrence-in-Recurrence Networks for Video Deblurring [58.49075799159015]
State-of-the-art video deblurring methods often adopt recurrent neural networks to model the temporal dependency between the frames.
In this paper, we propose recurrence-in-recurrence network architecture to cope with the limitations of short-ranged memory.
arXiv Detail & Related papers (2022-03-12T11:58:13Z)
- Towards mental time travel: a hierarchical memory for reinforcement learning agents [9.808027857786781]
Reinforcement learning agents often forget details of the past, especially after delays or distractor tasks.
We propose a Hierarchical Transformer Memory (HTM) which helps agents to remember the past in detail.
Agents with HTM can extrapolate to task sequences an order of magnitude longer than they were trained on, and can even generalize zero-shot from a meta-learning setting to maintaining knowledge across episodes.
arXiv Detail & Related papers (2021-05-28T18:12:28Z)
- Temporal Memory Relation Network for Workflow Recognition from Surgical Video [53.20825496640025]
We propose a novel end-to-end temporal memory relation network (TMNet) for relating long-range and multi-scale temporal patterns.
We have extensively validated our approach on two benchmark surgical video datasets.
arXiv Detail & Related papers (2021-03-30T13:20:26Z)
- Dynamic Embeddings for Interaction Prediction [2.5758502140236024]
In recommender systems (RSs), predicting the next item that a user interacts with is critical for user retention.
Recent studies have shown the effectiveness of modeling the mutual interactions between users and items using separate user and item embeddings.
We propose a novel method called DeePRed that addresses some of their limitations.
arXiv Detail & Related papers (2020-11-10T16:04:46Z)
- Untangling tradeoffs between recurrence and self-attention in neural networks [81.30894993852813]
We present a formal analysis of how self-attention affects gradient propagation in recurrent networks.
We prove that it mitigates the problem of vanishing gradients when trying to capture long-term dependencies.
We propose a relevancy screening mechanism that allows for a scalable use of sparse self-attention with recurrence.
arXiv Detail & Related papers (2020-06-16T19:24:25Z)
- Sequential Recommender via Time-aware Attentive Memory Network [67.26862011527986]
We propose a temporal gating methodology to improve attention mechanism and recurrent units.
We also propose a Multi-hop Time-aware Attentive Memory network to integrate long-term and short-term preferences.
Our approach is scalable for candidate retrieval tasks and can be viewed as a non-linear generalization of latent factorization for dot-product based Top-K recommendation.
arXiv Detail & Related papers (2020-05-18T11:29:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.