Continual Learning In Environments With Polynomial Mixing Times
- URL: http://arxiv.org/abs/2112.07066v1
- Date: Mon, 13 Dec 2021 23:41:56 GMT
- Title: Continual Learning In Environments With Polynomial Mixing Times
- Authors: Matthew Riemer, Sharath Chandra Raparthy, Ignacio Cases, Gopeshh Subbaraj, Maximilian Puelma Touzel, and Irina Rish
- Abstract summary: We study the effect of mixing times on learning in continual reinforcement learning.
We propose a family of model-based algorithms that speed up learning by directly optimizing for the average reward.
- Score: 13.533984338434106
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The mixing time of the Markov chain induced by a policy limits performance in
real-world continual learning scenarios. Yet, the effect of mixing times on
learning in continual reinforcement learning (RL) remains underexplored. In
this paper, we characterize problems that are of long-term interest to the
development of continual RL, which we call scalable MDPs, through the lens of
mixing times. In particular, we establish that scalable MDPs have mixing times
that scale polynomially with the size of the problem. We go on to demonstrate
that polynomial mixing times present significant difficulties for existing
approaches and propose a family of model-based algorithms that speed up
learning by directly optimizing for the average reward through a novel
bootstrapping procedure. Finally, we perform empirical regret analysis of our
proposed approaches, demonstrating clear improvements over baselines and also
how scalable MDPs can be used for analysis of RL algorithms as mixing times
scale.
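For readers unfamiliar with the central quantity, the mixing time referenced above is the number of steps after which the state distribution of the policy-induced Markov chain stays within a small total-variation distance of its stationary distribution, regardless of the start state. The snippet below is a minimal illustrative sketch, not the paper's setup or code: the lazy random walk and the helper names policy_transition_matrix and mixing_time are assumptions for illustration. It measures the mixing time of a toy chain and shows it growing roughly quadratically, i.e. polynomially, with the number of states, the regime the abstract associates with scalable MDPs.

```python
import numpy as np

def policy_transition_matrix(n_states):
    """Lazy random walk on a cycle of n_states states.

    A toy stand-in for the Markov chain induced by a fixed policy
    (illustrative assumption, not an environment from the paper).
    """
    P = np.zeros((n_states, n_states))
    for s in range(n_states):
        P[s, s] += 0.5                       # laziness keeps the chain aperiodic
        P[s, (s + 1) % n_states] += 0.25
        P[s, (s - 1) % n_states] += 0.25
    return P

def mixing_time(P, eps=0.25, max_steps=100_000):
    """Smallest t with max_s ||P^t(s, .) - pi||_TV <= eps."""
    # Stationary distribution pi: left eigenvector of P for eigenvalue 1.
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = np.abs(pi) / np.abs(pi).sum()

    P_t = np.eye(P.shape[0])
    for t in range(1, max_steps + 1):
        P_t = P_t @ P
        tv = 0.5 * np.abs(P_t - pi).sum(axis=1).max()  # worst-case start state
        if tv <= eps:
            return t
    return max_steps

for n in (8, 16, 32, 64):
    print(n, mixing_time(policy_transition_matrix(n)))
# The reported mixing times grow roughly quadratically in n: polynomial mixing.
```

On chains like this, an agent needs on the order of the mixing time just to observe representative long-run behaviour, which is one way to read the difficulty for existing approaches that the abstract mentions.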
Related papers
- Near-Optimal Learning and Planning in Separated Latent MDPs [70.88315649628251]
We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs).
In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs.
arXiv Detail & Related papers (2024-06-12T06:41:47Z)
- Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles [83.85151306138007]
The Multi-level Actor-Critic (MAC) framework incorporates a Multi-level Monte-Carlo (MLMC) estimator.
We demonstrate that MAC outperforms the existing state-of-the-art policy-gradient-based method for the average-reward setting.
arXiv Detail & Related papers (2024-03-18T16:23:47Z)
- Efficient Exploration in Continuous-time Model-based Reinforcement Learning [37.14026153342745]
Reinforcement learning algorithms typically consider discrete-time dynamics, even though the underlying systems are often continuous in time.
We introduce a model-based reinforcement learning algorithm that represents continuous-time dynamics.
arXiv Detail & Related papers (2023-10-30T15:04:40Z)
- A Multi-Scale Decomposition MLP-Mixer for Time Series Analysis [14.40202378972828]
We propose MSD-Mixer, a Multi-Scale Decomposition MLP-Mixer, which learns to explicitly decompose and represent the input time series in its different layers.
We demonstrate that MSD-Mixer consistently and significantly outperforms other state-of-the-art algorithms with better efficiency.
arXiv Detail & Related papers (2023-10-18T13:39:07Z)
- Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic [61.968469104271676]
We propose an RL methodology attuned to the mixing time by employing a multi-level Monte Carlo estimator for the critic, the actor, and the average reward embedded within an actor-critic (AC) algorithm (a minimal sketch of the MLMC idea appears after this list).
We experimentally show that these alleviated restrictions on the technical conditions required for stability translate to superior performance in practice for RL problems with sparse rewards.
arXiv Detail & Related papers (2023-01-28T04:12:56Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Imitation with Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces [53.47210316424326]
KeRNS is an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes.
We prove a regret bound that scales with the covering dimension of the state-action space and the total variation of the MDP with time.
arXiv Detail & Related papers (2020-07-09T21:37:13Z)
- Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch [60.23815709215807]
We study the inverse reinforcement learning (IRL) problem under a transition dynamics mismatch between the expert and the learner.
We propose a robust MCE IRL algorithm, which is a principled approach to handling this mismatch.
arXiv Detail & Related papers (2020-07-02T14:57:13Z)
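Two of the related entries above (the MAC framework and the multi-level Monte Carlo actor-critic for average-reward RL) rely on a multi-level Monte Carlo (MLMC) estimator of the average reward. The sketch below shows the generic MLMC construction under assumed names (mlmc_average_reward and the rollout helper are hypothetical); it is not code from either paper. The idea is to randomize the rollout's truncation level so that the expected rollout length stays small while the estimate matches, in expectation, the longest-level average.

```python
import numpy as np

def mlmc_average_reward(rollout, max_level=10, rng=None):
    """Multi-level Monte Carlo estimate of a policy's average reward.

    rollout(T) is a hypothetical helper returning T consecutive rewards
    from running the policy.  Level j averages the first 2**j rewards.
    Sketch of the MLMC construction only, not the cited papers' code.
    """
    rng = rng or np.random.default_rng()
    # Random truncation level with P(level = j) = 2**-j (capped at max_level).
    level = min(int(rng.geometric(0.5)), max_level)
    rewards = np.asarray(rollout(2 ** level), dtype=float)

    avg = lambda j: rewards[: 2 ** j].mean()  # average of the first 2**j rewards
    estimate = avg(0)
    if level >= 1:
        # Telescoping correction, importance-weighted by 1 / P(level).
        estimate += (2 ** level) * (avg(level) - avg(level - 1))
    return estimate

# Example with a toy Bernoulli reward stream (an assumption, for illustration):
est = mlmc_average_reward(lambda T: np.random.binomial(1, 0.3, size=T))
```

As those entries describe, estimators of this kind are embedded in an actor-critic loop for the critic, the actor, and the average reward, which is how they relax the dependence on knowledge of the mixing time.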
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.