Deep Reinforcement Learning amidst Lifelong Non-Stationarity
- URL: http://arxiv.org/abs/2006.10701v1
- Date: Thu, 18 Jun 2020 17:34:50 GMT
- Title: Deep Reinforcement Learning amidst Lifelong Non-Stationarity
- Authors: Annie Xie, James Harrison, Chelsea Finn
- Abstract summary: We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
- Score: 67.24635298387624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As humans, our goals and our environment are persistently changing throughout
our lifetime based on our experiences, actions, and internal and external
drives. In contrast, typical reinforcement learning problem set-ups consider
decision processes that are stationary across episodes. Can we develop
reinforcement learning algorithms that can cope with the persistent change in
the former, more realistic problem settings? While on-policy algorithms such as
policy gradients in principle can be extended to non-stationary settings, the
same cannot be said for more efficient off-policy algorithms that replay past
experiences when learning. In this work, we formalize this problem setting, and
draw upon ideas from the online learning and probabilistic inference literature
to derive an off-policy RL algorithm that can reason about and tackle such
lifelong non-stationarity. Our method leverages latent variable models to learn
a representation of the environment from current and past experiences, and
performs off-policy RL with this representation. We further introduce several
simulation environments that exhibit lifelong non-stationarity, and empirically
find that our approach substantially outperforms approaches that do not reason
about environment shift.
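The abstract names the ingredients (a latent-variable model over current and past experience, plus off-policy RL on the inferred representation) without spelling them out, so below is a minimal sketch of how such a combination could be wired up. It is not the authors' code: the module names (LatentEncoder, Decoder, latent_step), the mean-pooled posterior, the Gaussian prior centred on the previous episode's latent, and the loss weighting are all illustrative assumptions.

```python
import torch
import torch.nn as nn
from collections import namedtuple

# One episode of transitions; tensors shaped (T, dim) except rew, which is (T,).
Episode = namedtuple("Episode", "obs act rew next_obs")

class LatentEncoder(nn.Module):
    """Approximate posterior q(z_t | experience collected in episode t)."""
    def __init__(self, obs_dim, act_dim, latent_dim, hidden=128):
        super().__init__()
        self.latent_dim = latent_dim
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),  # per-transition mean / log-std
        )

    def forward(self, ep):
        x = torch.cat([ep.obs, ep.act, ep.rew.unsqueeze(-1)], dim=-1)
        stats = self.net(x).mean(dim=0)         # pool over the episode
        mean, log_std = stats.split(self.latent_dim)
        return torch.distributions.Normal(mean, log_std.exp())

class Decoder(nn.Module):
    """Predicts next observation and reward from (s, a, z)."""
    def __init__(self, obs_dim, act_dim, latent_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, obs_dim + 1),
        )

    def forward(self, obs, act, z):
        z = z.expand(obs.shape[0], -1)          # broadcast z over the episode
        return self.net(torch.cat([obs, act, z], dim=-1))

def latent_step(encoder, decoder, ep, prev_mean, beta=1e-3):
    """ELBO-style loss: reconstruct dynamics and rewards from z while keeping
    q(z_t) close to a prior centred on the previous episode's latent, so the
    representation can track a slowly drifting environment."""
    q_z = encoder(ep)
    z = q_z.rsample()
    target = torch.cat([ep.next_obs, ep.rew.unsqueeze(-1)], dim=-1)
    recon = ((decoder(ep.obs, ep.act, z) - target) ** 2).mean()
    prior = torch.distributions.Normal(prev_mean, torch.ones_like(prev_mean))
    kl = torch.distributions.kl_divergence(q_z, prior).sum()
    return recon + beta * kl, q_z.mean.detach()

# The actor and critics would then take torch.cat([obs, z]) as input and be
# trained with an ordinary off-policy update (e.g. SAC) on replayed transitions,
# with z re-inferred from the episode in which each transition was collected.
```

The intended division of labour in such a design is that the latent model absorbs the non-stationarity, so the downstream off-policy learner can keep replaying old data as if the latent-augmented process were stationary.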
Related papers
- No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery [53.08822154199948]
Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks.
This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics.
We develop a method that directly trains on scenarios with high learnability.
arXiv Detail & Related papers (2024-08-27T14:31:54Z)
- OMPO: A Unified Framework for RL under Policy and Dynamics Shifts [42.57662196581823]
Training reinforcement learning policies using environment interaction data collected from varying policies or dynamics presents a fundamental challenge.
Existing works often overlook the distribution discrepancies induced by policy or dynamics shifts, or rely on specialized algorithms with task priors.
In this paper, we identify a unified strategy for online RL policy learning under diverse settings of policy and dynamics shifts: transition occupancy matching.
arXiv Detail & Related papers (2024-05-29T13:36:36Z)
- Efficient Imitation Learning with Conservative World Models [54.52140201148341]
We tackle the problem of policy learning from expert demonstrations without a reward function.
We re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one.
arXiv Detail & Related papers (2024-05-21T20:53:18Z)
- Learning fast changing slow in spiking neural networks [3.069335774032178]
Reinforcement learning (RL) faces substantial challenges when applied to real-life problems.
Lifelong learning machines must resolve the plasticity-stability paradox: striking a balance between acquiring new knowledge and maintaining stability is crucial for artificial agents.
arXiv Detail & Related papers (2024-01-25T12:03:10Z)
- Adaptive Tracking of a Single-Rigid-Body Character in Various Environments [2.048226951354646]
We propose a deep reinforcement learning method based on the simulation of a single-rigid-body character.
Using the centroidal dynamics model (CDM) to express the full-body character as a single rigid body (SRB) and training a policy to track a reference motion, we can obtain a policy capable of adapting to various unobserved environmental changes.
We demonstrate that our policy, efficiently trained within 30 minutes on an ultraportable laptop, has the ability to cope with environments that have not been experienced during learning.
arXiv Detail & Related papers (2023-08-14T22:58:54Z)
- Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments.
We propose a novel model-free actor-critic algorithm to learn robust policies without modeling the disturbance in advance.
Experiments on several robot control tasks demonstrate that the resulting algorithm, State-Conservative Policy Optimization (SCPO), learns policies that are robust to disturbances in the transition dynamics.
arXiv Detail & Related papers (2021-12-20T13:13:05Z)
- One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL [142.36621929739707]
We show that learning diverse behaviors for accomplishing a task can lead to behavior that generalizes to varying environments.
By identifying multiple solutions for the task in a single environment during training, our approach can generalize to new situations.
arXiv Detail & Related papers (2020-10-27T17:41:57Z)
- Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task (a rough sketch of this kind of regularization follows this entry).
arXiv Detail & Related papers (2020-05-06T16:14:48Z)
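To make the last entry concrete, here is a rough sketch of one way informed-policy regularization can be set up: a recurrent, task-agnostic policy keeps its usual RL objective but also pays a KL penalty that pulls it toward privileged per-task policies. The class and function names, the discrete-action assumption, and the coefficient are illustrative choices, not details taken from that paper.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Task-agnostic policy: it must infer the current task through its hidden state."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs_seq):                       # obs_seq: (B, T, obs_dim)
        h, _ = self.rnn(obs_seq)
        return torch.distributions.Categorical(logits=self.head(h))

class InformedPolicy(nn.Module):
    """Privileged policy that sees the task identity, available during training only."""
    def __init__(self, obs_dim, task_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + task_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs_seq, task_onehot):          # task_onehot: (B, task_dim)
        task = task_onehot.unsqueeze(1).expand(-1, obs_seq.shape[1], -1)
        return torch.distributions.Categorical(
            logits=self.net(torch.cat([obs_seq, task], dim=-1)))

def regularized_loss(rl_loss, informed_dist, recurrent_dist, coef=0.1):
    """Whatever RL objective is used, add a KL term pulling the recurrent policy
    toward the informed policy's behaviour on the same states."""
    kl = torch.distributions.kl_divergence(informed_dist, recurrent_dist).mean()
    return rl_loss + coef * kl
```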