Efficient Deep Reinforcement Learning with Predictive Processing
Proximal Policy Optimization
- URL: http://arxiv.org/abs/2211.06236v2
- Date: Mon, 29 Jan 2024 14:17:03 GMT
- Title: Efficient Deep Reinforcement Learning with Predictive Processing
Proximal Policy Optimization
- Authors: Burcu Küçükoğlu, Walraaf Borkent, Bodo Rueckauer, Nasir Ahmad, Umut Güçlü and Marcel van Gerven
- Abstract summary: We show that recurrent neural networks which predict their own sensory states can be leveraged to minimise surprise.
We present the Predictive Processing Proximal Policy Optimization (P4O) agent.
It applies predictive processing to a recurrent variant of the PPO algorithm by integrating a world model in its hidden state.
- Score: 3.0217238755526057
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advances in reinforcement learning (RL) often rely on massive compute
resources and remain notoriously sample inefficient. In contrast, the human
brain is able to efficiently learn effective control strategies using limited
resources. This raises the question whether insights from neuroscience can be
used to improve current RL methods. Predictive processing is a popular
theoretical framework which maintains that the human brain is actively seeking
to minimize surprise. We show that recurrent neural networks which predict
their own sensory states can be leveraged to minimise surprise, yielding
substantial gains in cumulative reward. Specifically, we present the Predictive
Processing Proximal Policy Optimization (P4O) agent; an actor-critic
reinforcement learning agent that applies predictive processing to a recurrent
variant of the PPO algorithm by integrating a world model in its hidden state.
Even without hyperparameter tuning, P4O significantly outperforms a baseline
recurrent variant of the PPO algorithm on multiple Atari games using a single
GPU. It also outperforms other state-of-the-art agents given the same
wall-clock time and exceeds human gamer performance on multiple games including
Seaquest, which is a particularly challenging environment in the Atari domain.
Altogether, our work underscores how insights from the field of neuroscience
may support the development of more capable and efficient artificial agents.
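As a rough illustration of the mechanism described in the abstract, the sketch below shows a recurrent actor-critic whose hidden state also drives a prediction head for the agent's next observation, with the resulting prediction error ("surprise") added to the clipped PPO loss as an auxiliary term. This is a minimal sketch of the stated idea, not the authors' implementation; the architecture, layer sizes, loss weighting `beta`, and all identifiers are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the core P4O idea: a recurrent
# actor-critic whose hidden state also carries a world model predicting the next
# sensory state, so prediction error can be minimised alongside the PPO objective.
import torch
import torch.nn as nn

class P4OSketch(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden_dim: int = 256):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)          # sensory encoding
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)          # recurrent core
        self.policy_head = nn.Linear(hidden_dim, n_actions)    # actor
        self.value_head = nn.Linear(hidden_dim, 1)             # critic
        self.prediction_head = nn.Linear(hidden_dim, obs_dim)  # world model: predicts next observation

    def forward(self, obs, h):
        h = self.rnn(torch.relu(self.encoder(obs)), h)
        logits = self.policy_head(h)
        value = self.value_head(h).squeeze(-1)
        pred_next_obs = self.prediction_head(h)
        return logits, value, pred_next_obs, h


def p4o_loss(logits, old_log_probs, actions, advantages, values, returns,
             pred_next_obs, next_obs, clip_eps=0.2, beta=1.0, vf_coef=0.5):
    """Clipped PPO surrogate plus an auxiliary 'surprise' (prediction-error) term."""
    dist = torch.distributions.Categorical(logits=logits)
    log_probs = dist.log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)
    policy_loss = -torch.min(ratio * advantages,
                             torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages).mean()
    value_loss = (returns - values).pow(2).mean()
    surprise = (pred_next_obs - next_obs).pow(2).mean()        # sensory prediction error
    return policy_loss + vf_coef * value_loss + beta * surprise
```

In training, the surprise term would be computed over rollouts and minimised jointly with the PPO objective, which is the sense in which the world model lives in the agent's hidden state.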
Related papers
- Perception-Aware Policy Optimization for Multimodal Reasoning [79.56070395437898]
A major source of error in current multimodal reasoning lies in the perception of visual inputs. We propose PAPO, a novel policy gradient algorithm that encourages the model to learn to perceive while learning to reason. We observe a substantial reduction of 30.5% in perception errors, indicating improved perceptual capabilities with PAPO.
arXiv Detail & Related papers (2025-07-08T23:22:34Z) - SHIRE: Enhancing Sample Efficiency using Human Intuition in REinforcement Learning [11.304750795377657]
We propose SHIRE, a framework for encoding human intuition using Probabilistic Graphical Models (PGMs).
SHIRE achieves 25-78% sample efficiency gains across the environments we evaluate at negligible overhead cost.
arXiv Detail & Related papers (2024-09-16T04:46:22Z) - Efficient Reinforcement Learning via Decoupling Exploration and Utilization [6.305976803910899]
Reinforcement Learning (RL) has achieved remarkable success across multiple fields and applications, including gaming, robotics, and autonomous vehicles.
In this work, we aim to train the agent efficiently by decoupling exploration and utilization, so that the agent can escape the conundrum of suboptimal solutions.
The above idea is implemented in the proposed OPARL (Optimistic and Pessimistic Actor Reinforcement Learning) algorithm.
arXiv Detail & Related papers (2023-12-26T09:03:23Z) - REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world.
Recent methods aim to mitigate misalignment by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z) - Flexible Attention-Based Multi-Policy Fusion for Efficient Deep
Reinforcement Learning [78.31888150539258]
Reinforcement learning (RL) agents have long sought to approach the efficiency of human learning.
Prior studies in RL have incorporated external knowledge policies to help agents improve sample efficiency.
We present Knowledge-Grounded RL (KGRL), an RL paradigm fusing multiple knowledge policies and aiming for human-like efficiency and flexibility.
arXiv Detail & Related papers (2022-10-07T17:56:57Z) - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment.
We propose a new method to address this, using unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
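The retrieval idea summarized above can be illustrated with a small sketch: the agent's current state embedding is used to look up the most similar entries in a buffer of past experience embeddings, and the retrieved entries are aggregated into a context vector the agent can condition on. This is an illustrative assumption about the mechanism, not the paper's implementation; buffer contents, `k`, embedding sizes and all names are made up for the example.

```python
# Illustrative sketch of retrieval over past experiences (not the paper's code).
import torch

class ExperienceRetriever:
    def __init__(self, embed_dim: int, capacity: int = 10000):
        self.keys = torch.zeros(capacity, embed_dim)    # embeddings of stored experiences
        self.values = torch.zeros(capacity, embed_dim)  # information associated with each entry
        self.size = 0
        self.capacity = capacity

    def add(self, key: torch.Tensor, value: torch.Tensor) -> None:
        idx = self.size % self.capacity                  # overwrite the oldest entry when full
        self.keys[idx], self.values[idx] = key, value
        self.size += 1

    def retrieve(self, query: torch.Tensor, k: int = 5) -> torch.Tensor:
        n = min(self.size, self.capacity)
        if n == 0:
            return torch.zeros_like(query)
        sims = torch.nn.functional.cosine_similarity(self.keys[:n], query.unsqueeze(0), dim=-1)
        top = sims.topk(min(k, n)).indices               # most similar past experiences
        return self.values[top].mean(dim=0)              # simple aggregation into a context vector
```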
arXiv Detail & Related papers (2022-02-17T02:44:05Z) - RAPID-RL: A Reconfigurable Architecture with Preemptive-Exits for
Efficient Deep-Reinforcement Learning [7.990007201671364]
We propose a reconfigurable architecture with preemptive exits for efficient deep RL (RAPID-RL).
RAPID-RL enables conditional activation of preemptive layers based on the difficulty level of inputs.
We show that RAPID-RL incurs 0.34x (0.25x) number of operations (OPS) while maintaining performance above 0.88x (0.91x) on Atari (Drone navigation) tasks.
arXiv Detail & Related papers (2021-09-16T21:30:40Z) - Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z) - An Efficient Application of Neuroevolution for Competitive Multiagent
Learning [0.0]
NEAT is a popular evolutionary strategy used to obtain the best performing neural network architecture.
This paper utilizes the NEAT algorithm to achieve competitive multiagent learning on a modified pong game environment.
arXiv Detail & Related papers (2021-05-23T10:34:48Z) - Improving Computational Efficiency in Visual Reinforcement Learning via
Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER)
SEER is a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
arXiv Detail & Related papers (2021-03-04T08:14:10Z) - Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
arXiv Detail & Related papers (2020-10-16T18:48:49Z) - Using Generative Adversarial Nets on Atari Games for Feature Extraction
in Deep Reinforcement Learning [0.76146285961466]
Deep Reinforcement Learning (DRL) has been successfully applied in several research domains such as robot navigation and automated video game playing.
Its main drawback is the need for large amounts of training data, because sparse and delayed rewards do not provide effective supervision for representation learning in deep neural networks.
In this study, the Proximal Policy Optimization (PPO) algorithm is augmented with Generative Adversarial Networks (GANs) to increase sample efficiency by enforcing the network to learn efficient representations without depending on sparse and delayed rewards as supervision.
arXiv Detail & Related papers (2020-04-06T15:46:45Z)
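As a heavily hedged sketch of one way a GAN can provide an auxiliary representation-learning signal for a PPO agent, in the spirit of the summary above: a discriminator is trained to tell real observations from generated ones, and its intermediate features are reused as the state representation fed to the policy and value heads. This is an illustrative reading rather than the paper's architecture; all sizes and names are assumptions.

```python
# Sketch of GAN-assisted feature learning for an RL agent (illustrative, not the paper's code).
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, obs_dim: int, feat_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU())  # shared encoder
        self.real_or_fake = nn.Linear(feat_dim, 1)                              # adversarial head

    def forward(self, obs):
        f = self.features(obs)
        return self.real_or_fake(f), f   # logit for the GAN loss, features for the RL agent


def gan_losses(disc, gen, real_obs, noise):
    """Standard non-saturating GAN losses; the shared features improve as a side effect."""
    bce = nn.functional.binary_cross_entropy_with_logits
    fake_obs = gen(noise)
    d_real, _ = disc(real_obs)
    d_fake, _ = disc(fake_obs.detach())
    d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    g_logit, _ = disc(fake_obs)
    g_loss = bce(g_logit, torch.ones_like(g_logit))
    return d_loss, g_loss
```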