Lipschitz Lifelong Reinforcement Learning
- URL: http://arxiv.org/abs/2001.05411v3
- Date: Mon, 22 Mar 2021 14:35:30 GMT
- Title: Lipschitz Lifelong Reinforcement Learning
- Authors: Erwan Lecarpentier, David Abel, Kavosh Asadi, Yuu Jinnai, Emmanuel
Rachelson, Michael L. Littman
- Score: 40.36085483977208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of knowledge transfer when an agent is facing a
series of Reinforcement Learning (RL) tasks. We introduce a novel metric
between Markov Decision Processes (MDPs) and establish that close MDPs have
close optimal value functions. Formally, the optimal value functions are
Lipschitz continuous with respect to the task space. These theoretical results
lead us to a value-transfer method for Lifelong RL, which we use to build a
PAC-MDP algorithm with an improved convergence rate. Further, we show that the
method incurs no negative transfer with high probability. We illustrate the
benefits of the method in Lifelong RL experiments.
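
To make the Lipschitz statement concrete: writing $d(\cdot, \cdot)$ for a metric between MDPs sharing a state-action space, the result has the schematic form
$$ |V^*_M(s) - V^*_{\bar{M}}(s)| \le L \, d(M, \bar{M}) \quad \text{for all } s, $$
so a solved source task yields a computable upper bound on the optimal values of a nearby target task. Below is a minimal Python sketch of the induced value-transfer rule; the function names, the constant $L$, and the R-Max-style clipping of optimistic initial values are illustrative assumptions, not the paper's exact algorithm.

    import numpy as np

    GAMMA = 0.9
    R_MAX = 1.0
    V_MAX = R_MAX / (1.0 - GAMMA)  # default optimistic value in R-Max-style methods

    def transferred_upper_bound(q_source, task_distance, lipschitz_const):
        """Upper-bound Q* of a new task from a solved task's Q*.

        If |Q*_new - Q*_source| <= L * d(new, source) (Lipschitz continuity),
        then Q*_source + L * d is a valid optimistic bound for the new task.
        q_source: (n_states, n_actions) array with the source task's Q*.
        """
        return q_source + lipschitz_const * task_distance

    def optimistic_init(q_bounds):
        """Clip the default optimistic value V_MAX with all transferred bounds.

        Tighter initial bounds mean less exploration for a PAC-MDP learner,
        which is the intuition behind the improved convergence rate; clipping
        only from above keeps values optimistic, matching the no-negative-transfer
        claim. q_bounds: list of (n_states, n_actions) arrays, one per past task.
        """
        q_init = np.full_like(q_bounds[0], V_MAX)
        for bound in q_bounds:
            q_init = np.minimum(q_init, bound)
        return q_init
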
Related papers
- Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation [37.36913210031282]
Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering.
We propose SEER, an efficient PbRL method that integrates label smoothing and policy regularization techniques.
arXiv Detail & Related papers (2024-05-29T01:49:20Z)
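
Label smoothing enters PbRL through the standard Bradley-Terry preference loss; the following Python sketch shows one plausible form (the smoothing constant and function name are illustrative assumptions, not SEER's exact formulation).

    import numpy as np

    def smoothed_preference_loss(return_a, return_b, label, eps=0.1):
        """Cross-entropy preference loss with label smoothing.

        return_a, return_b: predicted returns of two behavior segments.
        label: 1.0 if segment a was preferred by the annotator, else 0.0.
        eps: smoothing constant; the hard label is pulled toward 0.5,
        which reduces overfitting to noisy human preference labels.
        """
        target = label * (1.0 - eps) + 0.5 * eps
        # Bradley-Terry probability that segment a is preferred.
        p_a = 1.0 / (1.0 + np.exp(return_b - return_a))
        return -(target * np.log(p_a) + (1.0 - target) * np.log(1.0 - p_a))
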
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
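
A common way to incorporate LLM guidance as a regularization factor in value-based RL is a KL penalty toward the LLM-suggested action distribution; the Python sketch below shows that generic recipe (the coefficient beta and the closed-form softmax solution are assumptions, not necessarily LINVIT's exact construction).

    import numpy as np

    def kl_regularized_policy(q_values, llm_policy, beta=1.0):
        """Solve max_pi E_pi[Q] - beta * KL(pi || llm_policy) at one state.

        The optimum is a softmax policy tilted by the LLM prior:
        pi(a) proportional to llm_policy(a) * exp(Q(a) / beta).
        beta trades off reward maximization against staying close to the
        LLM's guidance; llm_policy must have strictly positive entries.
        """
        logits = np.log(llm_policy) + q_values / beta
        logits -= logits.max()  # numerical stability
        pi = np.exp(logits)
        return pi / pi.sum()
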
- Provably Efficient CVaR RL in Low-rank MDPs [58.58570425202862]
We study risk-sensitive Reinforcement Learning (RL).
We propose a novel Upper Confidence Bound (UCB) bonus-driven algorithm to balance the interplay among exploration, exploitation, and representation learning in CVaR RL.
We prove a sample-complexity bound for learning an $\epsilon$-optimal CVaR policy, where $H$ is the length of each episode, $A$ is the capacity of the action space, and $d$ is the dimension of representations.
arXiv Detail & Related papers (2023-11-20T17:44:40Z)
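
For reference, CVaR at risk level $\tau$ is the expected return within the worst $\tau$-fraction of outcomes; here is a minimal empirical estimator in Python (independent of the paper's UCB machinery, with $\tau$ as the standard risk-level notation):

    import numpy as np

    def empirical_cvar(returns, tau):
        """Empirical CVaR_tau: mean of the worst tau-fraction of returns.

        CVaR_tau(X) = E[X | X <= VaR_tau(X)] for continuous X; on samples
        we average the floor(tau * n) smallest values (at least one).
        """
        x = np.sort(np.asarray(returns, dtype=float))
        k = max(1, int(np.floor(tau * len(x))))
        return float(x[:k].mean())
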
- Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo [104.9535542833054]
We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL).
We instead directly sample the Q function from its posterior distribution using Langevin Monte Carlo.
Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
arXiv Detail & Related papers (2023-05-29T17:11:28Z)
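
The underlying update is stochastic gradient Langevin dynamics (SGLD) on the Q-function's parameters: each noisy gradient step approximately draws from the posterior, giving Thompson-sampling-style exploration. A minimal Python sketch (the step size and log-posterior gradient are taken as given, illustrative assumptions):

    import numpy as np

    def sgld_step(theta, grad_log_posterior, step_size, rng):
        """One Langevin step on the parameters of a Q-function.

        theta' = theta + (step/2) * grad log p(theta | data) + sqrt(step) * xi
        with xi ~ N(0, I); iterating approximately samples theta from the
        posterior, so acting greedily w.r.t. the sampled Q explores like
        Thompson sampling.
        """
        noise = rng.normal(size=theta.shape)
        return theta + 0.5 * step_size * grad_log_posterior(theta) + np.sqrt(step_size) * noise
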
- On Practical Robust Reinforcement Learning: Practical Uncertainty Set and Double-Agent Algorithm [11.748284119769039]
Robust reinforcement learning (RRL) aims to find a robust policy that optimizes worst-case performance over an uncertainty set of Markov decision processes (MDPs).
arXiv Detail & Related papers (2023-05-11T08:52:09Z)
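
The worst-case objective admits a dynamic-programming form; below is a minimal robust value-iteration sketch in Python over a finite set of tabular MDPs (the finite uncertainty set is an illustrative simplification, not the paper's construction).

    import numpy as np

    def robust_value_iteration(models, gamma=0.95, iters=500):
        """Value iteration for max_pi min_{MDP in set} expected return.

        models: list of (P, R) pairs with P of shape (S, A, S) and R of
        shape (S, A). Each backup takes the worst value over the
        uncertainty set before maximizing over actions.
        """
        v = np.zeros(models[0][1].shape[0])
        for _ in range(iters):
            q_worst = np.min([r + gamma * p @ v for p, r in models], axis=0)
            v = q_worst.max(axis=1)  # greedy over actions at each state
        return v
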
- Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3 K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z)
- Hypernetworks for Zero-shot Transfer in Reinforcement Learning [21.994654567458017]
Hypernetworks are trained to generate behaviors across a range of unseen task conditions.
This work relates to meta RL, contextual RL, and transfer learning.
Our method demonstrates significant improvements over baselines from multitask and meta RL approaches.
arXiv Detail & Related papers (2022-11-28T15:48:35Z)
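
The core construction is a network that outputs another network's weights from a task embedding; a minimal Python sketch of a linear hypernetwork generating a one-layer policy (all sizes and the linear form are illustrative assumptions):

    import numpy as np

    class PolicyHypernetwork:
        """Map a task embedding z to the weights of a small policy network.

        Zero-shot transfer: at test time a new task's embedding is pushed
        through the trained hypernetwork to produce policy weights with no
        gradient steps on the new task.
        """

        def __init__(self, z_dim, obs_dim, n_actions, rng):
            self.obs_dim, self.n_actions = obs_dim, n_actions
            n_weights = obs_dim * n_actions + n_actions  # policy W and b
            self.h = rng.normal(scale=0.1, size=(z_dim, n_weights))

        def policy_logits(self, z, obs):
            w = z @ self.h  # flat weight vector generated from the embedding
            split = self.obs_dim * self.n_actions
            W = w[:split].reshape(self.obs_dim, self.n_actions)
            b = w[split:]
            return obs @ W + b
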
- Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
arXiv Detail & Related papers (2021-06-22T17:58:46Z)
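
Variance reweighting amounts to weighted least squares inside each Fitted Q-Iteration step: Bellman targets with high estimated variance contribute less. A minimal Python sketch for the linear-function-approximation setting (the variance estimates are taken as given; names are illustrative):

    import numpy as np

    def variance_weighted_regression(phi, targets, var_est, reg=1e-3):
        """Solve min_w sum_i (phi_i @ w - target_i)^2 / var_i + reg * ||w||^2.

        phi: (n, d) feature matrix; targets: (n,) Bellman regression targets;
        var_est: (n,) estimated variances. Down-weighting noisy targets
        tightens the off-policy evaluation error bound.
        """
        w_inv = 1.0 / np.maximum(var_est, 1e-8)
        A = (phi * w_inv[:, None]).T @ phi + reg * np.eye(phi.shape[1])
        b = (phi * w_inv[:, None]).T @ targets
        return np.linalg.solve(A, b)
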