Robust Reinforcement Learning using Offline Data
- URL: http://arxiv.org/abs/2208.05129v1
- Date: Wed, 10 Aug 2022 03:47:45 GMT
- Title: Robust Reinforcement Learning using Offline Data
- Authors: Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh
- Abstract summary: We propose a robust reinforcement learning algorithm called Robust Fitted Q-Iteration (RFQI).
RFQI uses only an offline dataset to learn the optimal robust policy.
We prove that RFQI learns a near-optimal robust policy under standard assumptions.
- Score: 23.260211453437055
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The goal of robust reinforcement learning (RL) is to learn a policy that is
robust against the uncertainty in model parameters. Parameter uncertainty
commonly occurs in many real-world RL applications due to simulator modeling
errors, changes in the real-world system dynamics over time, and adversarial
disturbances. Robust RL is typically formulated as a max-min problem, where the
objective is to learn the policy that maximizes the value against the worst
possible models that lie in an uncertainty set. In this work, we propose a
robust RL algorithm called Robust Fitted Q-Iteration (RFQI), which uses only an
offline dataset to learn the optimal robust policy. Robust RL with offline data
is significantly more challenging than its non-robust counterpart because of
the minimization over all models present in the robust Bellman operator. This
poses challenges in offline data collection, optimization over the models, and
unbiased estimation. In this work, we propose a systematic approach to overcome
these challenges, resulting in our RFQI algorithm. We prove that RFQI learns a
near-optimal robust policy under standard assumptions and demonstrate its
superior performance on standard benchmark problems.
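To make the max-min structure concrete, a standard robust-MDP formulation is sketched below in commonly used notation; the uncertainty set \(\mathcal{P}\), the per-state-action sets \(\mathcal{P}_{s,a}\), and the robust operator \(T_{\mathrm{rob}}\) are notational assumptions rather than the paper's exact definitions.

```latex
\[
  \max_{\pi}\;\min_{P \in \mathcal{P}} V^{\pi}_{P},
  \qquad
  (T_{\mathrm{rob}} Q)(s,a)
    = r(s,a) + \gamma \min_{P_{s,a} \in \mathcal{P}_{s,a}}
      \mathbb{E}_{s' \sim P_{s,a}}\!\Big[\max_{a'} Q(s',a')\Big].
\]
```

The inner minimization over \(\mathcal{P}_{s,a}\) is exactly what the offline data must support, which is where the data-collection, optimization, and unbiased-estimation challenges mentioned above arise. For orientation only, here is a minimal, hypothetical sketch of plain (non-robust) Fitted Q-Iteration on an offline dataset, the scaffold that a robust variant such as RFQI modifies. It is not the authors' RFQI algorithm, and the regressor choice and hyperparameters are arbitrary assumptions.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(states, actions, rewards, next_states,
                       n_actions, gamma=0.99, n_iters=50):
    """Plain FQI on offline transitions (s, a, r, s') with discrete actions.
    states, next_states: (N, d) arrays; actions: (N,) ints; rewards: (N,) floats."""
    X = np.hstack([states, actions.reshape(-1, 1)])  # regress Q on the pair (s, a)
    q_reg = None
    for _ in range(n_iters):
        if q_reg is None:
            targets = rewards.astype(float)           # first pass: target is the reward
        else:
            # Evaluate max_{a'} Q(s', a') by querying the current regressor per action.
            next_q = np.column_stack([
                q_reg.predict(np.hstack([next_states,
                                         np.full((len(next_states), 1), a)]))
                for a in range(n_actions)
            ])
            # Standard Bellman target; a robust method like RFQI would instead use a
            # worst-case target over an uncertainty set of transition models.
            targets = rewards + gamma * next_q.max(axis=1)
        q_reg = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
    return q_reg
```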
Related papers
- Deep autoregressive density nets vs neural ensembles for model-based
offline reinforcement learning [2.9158689853305693]
We consider a model-based reinforcement learning algorithm that infers the system dynamics from the available data and performs policy optimization on imaginary model rollouts.
This approach is vulnerable to exploiting model errors, which can lead to catastrophic failures on the real system.
We show that better performance can be obtained with a single well-calibrated autoregressive model on the D4RL benchmark.
arXiv Detail & Related papers (2024-02-05T10:18:15Z)
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
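As a reference point for the model-based value expansion mentioned above, a commonly used H-step expansion target is sketched below in assumed notation (learned dynamics \(\hat{T}\), learned reward \(\hat{r}\), rollout horizon \(H\)); the exact form used by MOTO may differ.

```latex
\[
  \hat{Q}^{\mathrm{MVE}}(s_0, a_0)
    = \sum_{t=0}^{H-1} \gamma^{t}\, \hat{r}(s_t, a_t)
      + \gamma^{H} Q_{\theta}(s_H, a_H),
  \qquad s_{t+1} \sim \hat{T}(\cdot \mid s_t, a_t),\;
         a_{t+1} \sim \pi(\cdot \mid s_{t+1}).
\]
```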
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
- MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator [13.140242573639629]
Offline reinforcement learning (RL) faces the significant challenge of distribution shift.
Model-free offline RL methods penalize the Q value for out-of-distribution (OOD) data or constrain the policy to stay close to the behavior policy to tackle this problem.
This paper proposes MICRO, a new model-based offline algorithm with a conservative Bellman operator.
arXiv Detail & Related papers (2023-12-07T02:17:45Z)
- RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning [11.183124892686239]
We present Robust Adversarial Model-Based Offline RL (RAMBO), a novel approach to model-based offline RL.
To achieve conservatism, we formulate the problem as a two-player zero-sum game against an adversarial environment model.
We evaluate our approach on widely studied offline RL benchmarks and demonstrate that it achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-04-26T20:42:14Z)
- Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training time.
To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
arXiv Detail & Related papers (2021-03-18T16:50:17Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
- Overcoming Model Bias for Robust Offline Deep Reinforcement Learning [3.1325640909772403]
MOOSE is an algorithm which ensures low model bias by keeping the policy within the support of the data.
We compare MOOSE with state-of-the-art model-free, offline RL algorithms BRAC, BEAR and BCQ on the Industrial Benchmark and MuJoCo continuous control tasks in terms of robust performance.
arXiv Detail & Related papers (2020-08-12T19:08:55Z)
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm that learns policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
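For context, critic-regularized regression trains the policy by a critic-weighted regression onto the dataset actions. Below is a sketch of such an objective in assumed notation; the specific weighting function \(f\) (e.g. an indicator or exponential of the advantage) is treated here as an assumption rather than the paper's exact choice.

```latex
\[
  \max_{\pi}\; \mathbb{E}_{(s,a) \sim \mathcal{D}}
    \big[\, f\big(Q_{\theta}, \pi, s, a\big)\, \log \pi(a \mid s) \,\big],
  \qquad \text{e.g. } f = \mathbb{1}\big[A_{\theta}(s,a) > 0\big]
  \;\text{ or }\; f = \exp\!\big(A_{\theta}(s,a)/\beta\big).
\]
```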
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by running them with rewards artificially penalized by the uncertainty of the dynamics.
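The uncertainty-penalized reward referred to above is typically written as follows (a sketch in assumed notation: \(\hat{r}\) is the learned reward model, \(u(s,a)\) an estimate of the dynamics model's uncertainty, and \(\lambda > 0\) a penalty coefficient); policy optimization then runs inside the learned model with \(\tilde{r}\) in place of \(\hat{r}\).

```latex
\[
  \tilde{r}(s,a) = \hat{r}(s,a) - \lambda\, u(s,a).
\]
```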
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.