Reactive Exploration to Cope with Non-Stationarity in Lifelong
Reinforcement Learning
- URL: http://arxiv.org/abs/2207.05742v1
- Date: Tue, 12 Jul 2022 17:59:00 GMT
- Title: Reactive Exploration to Cope with Non-Stationarity in Lifelong
Reinforcement Learning
- Authors: Christian Steinparz, Thomas Schmied, Fabian Paischer,
Marius-Constantin Dinu, Vihang Patil, Angela Bitto-Nemling, Hamid
Eghbal-zadeh, Sepp Hochreiter
- Abstract summary: We propose Reactive Exploration to track and react to continual domain shifts in lifelong reinforcement learning.
We empirically show that representatives of the policy-gradient family are better suited for lifelong learning, as they adapt more quickly to distribution shifts than Q-learning.
- Score: 4.489095027077955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In lifelong learning, an agent learns throughout its entire life without
resets, in a constantly changing environment, as we humans do. Consequently,
lifelong learning comes with a plethora of research problems such as continual
domain shifts, which result in non-stationary rewards and environment dynamics.
These non-stationarities are difficult to detect and cope with due to their
continuous nature. Therefore, exploration strategies and learning methods are
required that are capable of tracking the steady domain shifts, and adapting to
them. We propose Reactive Exploration to track and react to continual domain
shifts in lifelong reinforcement learning, and to update the policy
correspondingly. To this end, we conduct experiments in order to investigate
different exploration strategies. We empirically show that representatives of
the policy-gradient family are better suited for lifelong learning, as they
adapt more quickly to distribution shifts than Q-learning. Moreover,
policy-gradient methods profit the most from Reactive Exploration and show good
results in lifelong learning with continual domain shifts. Our code is
available at: https://github.com/ml-jku/reactive-exploration.
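As a rough illustration of how an agent could track continual domain shifts, the sketch below grants an exploration bonus for transitions that a learned forward model fails to predict, so that exploration is re-triggered whenever the dynamics drift. This is only a minimal prediction-error (curiosity-style) sketch; the names ForwardModel, intrinsic_reward, and beta are illustrative assumptions, not the Reactive Exploration implementation, which is available in the linked repository.

# Illustrative sketch only; see https://github.com/ml-jku/reactive-exploration
# for the actual method. The bonus below rises when a learned dynamics model
# is surprised, pushing the policy back into exploration after a domain shift.
import torch
import torch.nn as nn


class ForwardModel(nn.Module):
    """Predicts the next observation from (observation, action)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, obs_dim),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))


def intrinsic_reward(model, obs, act, next_obs, beta: float = 0.1):
    """Forward-model prediction error, used as an exploration bonus.

    When the environment dynamics shift, the prediction error rises again,
    which increases the bonus and renews exploration.
    """
    with torch.no_grad():
        pred = model(obs, act)
        error = ((pred - next_obs) ** 2).mean(dim=-1)
    return beta * error


# Usage sketch: add the bonus to the extrinsic reward before the policy update,
# and keep training the forward model online so the bonus decays once the new
# dynamics have been learned.
if __name__ == "__main__":
    obs_dim, act_dim = 8, 2
    model = ForwardModel(obs_dim, act_dim)
    optim = torch.optim.Adam(model.parameters(), lr=1e-3)

    obs = torch.randn(32, obs_dim)
    act = torch.randn(32, act_dim)
    next_obs = torch.randn(32, obs_dim)
    ext_reward = torch.randn(32)

    bonus = intrinsic_reward(model, obs, act, next_obs)
    total_reward = ext_reward + bonus  # reward used by the RL update

    # online update of the forward model
    loss = ((model(obs, act) - next_obs) ** 2).mean()
    optim.zero_grad()
    loss.backward()
    optim.step()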
Related papers
- Lifelong Continual Learning for Anomaly Detection: New Challenges, Perspectives, and Insights [3.654287752011122]
Lifelong anomaly detection provides intrinsically different challenges compared to the more widely explored classification setting.
First, we explain why lifelong anomaly detection is relevant, defining challenges and opportunities to design anomaly detection methods that deal with lifelong learning complexities.
We also perform experiments with popular anomaly detection methods on the proposed lifelong scenarios, emphasizing the performance gap that could be closed by adopting lifelong learning.
arXiv Detail & Related papers (2023-03-14T00:49:09Z) - Loss of Plasticity in Continual Deep Reinforcement Learning [14.475963928766134]
We demonstrate that deep RL agents lose their ability to learn good policies when they cycle through a sequence of Atari 2600 games.
We investigate this phenomenon closely at scale and analyze how the weights, gradients, and activations change over time.
Our analysis shows that the activation footprint of the network becomes sparser, contributing to the diminishing gradients.
arXiv Detail & Related papers (2023-03-13T22:37:15Z) - Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z) - Simplifying Deep Reinforcement Learning via Self-Supervision [51.2400839966489]
Self-Supervised Reinforcement Learning (SSRL) is a simple algorithm that optimizes policies with purely supervised losses.
We show that SSRL is surprisingly competitive with contemporary algorithms, with more stable performance and lower running time.
arXiv Detail & Related papers (2021-06-10T06:29:59Z) - Lifelong Policy Gradient Learning of Factored Policies for Faster
Training Without Forgetting [26.13332231423652]
We provide a novel method for lifelong policy gradient learning that trains lifelong function approximators directly via policy gradients.
We show empirically that our algorithm learns faster and converges to better policies than single-task and lifelong learning baselines.
arXiv Detail & Related papers (2020-07-14T13:05:42Z) - Ecological Reinforcement Learning [76.9893572776141]
We study the kinds of environment properties that can make learning under such conditions easier.
Understanding how properties of the environment impact the performance of reinforcement learning agents can help us to structure our tasks in ways that make learning tractable.
arXiv Detail & Related papers (2020-06-22T17:55:03Z) - Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z) - Learning Adaptive Exploration Strategies in Dynamic Environments Through
Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
arXiv Detail & Related papers (2020-05-06T16:14:48Z) - Curriculum Learning for Reinforcement Learning Domains: A Framework and
Survey [53.73359052511171]
Reinforcement learning (RL) is a popular paradigm for addressing sequential decision tasks in which the agent has only limited environmental feedback.
We present a framework for curriculum learning (CL) in RL, and use it to survey and classify existing CL methods in terms of their assumptions, capabilities, and goals.
arXiv Detail & Related papers (2020-03-10T20:41:24Z)