Towards Data-Driven Offline Simulations for Online Reinforcement
Learning
- URL: http://arxiv.org/abs/2211.07614v1
- Date: Mon, 14 Nov 2022 18:36:13 GMT
- Title: Towards Data-Driven Offline Simulations for Online Reinforcement
Learning
- Authors: Shengpu Tang, Felipe Vieira Frujeri, Dipendra Misra, Alex Lamb, John
Langford, Paul Mineiro, Sebastian Kochman
- Abstract summary: We formalize offline learner simulation (OLS) for reinforcement learning (RL).
We propose a novel evaluation protocol that measures both fidelity and efficiency of the simulation.
- Score: 30.654163861164864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern decision-making systems, from robots to web recommendation engines,
are expected to adapt: to user preferences, changing circumstances or even new
tasks. Yet, it is still uncommon to deploy a dynamically learning agent (rather
than a fixed policy) to a production system, as it's perceived as unsafe. Using
historical data to reason about learning algorithms, similar to offline policy
evaluation (OPE) applied to fixed policies, could help practitioners evaluate
and ultimately deploy such adaptive agents to production. In this work, we
formalize offline learner simulation (OLS) for reinforcement learning (RL) and
propose a novel evaluation protocol that measures both fidelity and efficiency
of the simulation. For environments with complex high-dimensional observations,
we propose a semi-parametric approach that leverages recent advances in latent
state discovery in order to achieve accurate and efficient offline simulations.
In preliminary experiments, we show the advantage of our approach compared to
fully non-parametric baselines. The code to reproduce these experiments will be
made available at https://github.com/microsoft/rl-offline-simulation.
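The abstract contrasts fully non-parametric baselines with the proposed semi-parametric, latent-state approach. As a rough illustration of the non-parametric end of that spectrum (a sketch under assumed interfaces, not the released code), the snippet below replays logged transitions to a learning agent and only advances an episode when the learner's chosen action matches one available in the log; the exact-match key `obs_key` is precisely where latent state discovery would substitute a learned representation.

```python
# Minimal sketch of a fully non-parametric offline learner simulation (OLS) loop.
# Field names (obs_key, is_initial, ...) and the learner's act()/update() interface
# are illustrative assumptions, not the API of the released code.
import random
from collections import defaultdict

def simulate_learner_offline(dataset, learner, episodes=100, seed=0):
    rng = random.Random(seed)
    # Index logged transitions by (state key, logged action).
    pool = defaultdict(list)
    for t in dataset:
        pool[(t["obs_key"], t["action"])].append(t)
    starts = [t for t in dataset if t["is_initial"]]

    returns = []
    for _ in range(episodes):
        t0 = rng.choice(starts)
        obs, obs_key = t0["obs"], t0["obs_key"]
        total, done = 0.0, False
        while not done:
            a = learner.act(obs)
            candidates = pool.get((obs_key, a))
            if not candidates:
                break  # no logged transition matches the learner's action
            t = candidates.pop()  # consume one logged transition
            learner.update(obs, a, t["reward"], t["next_obs"], t["done"])
            total += t["reward"]
            done = t["done"]
            obs, obs_key = t["next_obs"], t["next_obs_key"]
        returns.append(total)
    return returns  # the simulated learning curve of the online agent
```

In this framing, fidelity would be measured by how closely the simulated returns track the learner's true online learning curve, and efficiency by how many logged transitions the simulation consumes before episodes can no longer be matched.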
Related papers
- COSBO: Conservative Offline Simulation-Based Policy Optimization [7.696359453385686]
Offline reinforcement learning allows training reinforcement learning models on data from live deployments.
In contrast, simulation environments attempting to replicate the live environment can be used instead of the live data.
We propose a method that combines an imperfect simulation environment with data from the target environment, to train an offline reinforcement learning policy.
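As an illustration of such a combination (a generic conservative-blending sketch under assumed interfaces, not COSBO's published objective), one can fit a Q-function on both real and simulator batches while pushing down Q-values on the simulator samples so the policy stays conservative where only the simulator provides coverage:

```python
# Illustrative sketch only: batches are assumed to be dicts of tensors with keys
# obs, action, reward, done, next_obs, next_action.
import torch

def conservative_q_loss(q_net, q_target, real_batch, sim_batch,
                        gamma=0.99, beta=1.0):
    both = {k: torch.cat([real_batch[k], sim_batch[k]]) for k in real_batch}
    with torch.no_grad():
        td_target = both["reward"] + gamma * (1.0 - both["done"]) * q_target(
            both["next_obs"], both["next_action"])
    bellman = ((q_net(both["obs"], both["action"]) - td_target) ** 2).mean()
    penalty = q_net(sim_batch["obs"], sim_batch["action"]).mean()  # push down Q on sim data
    return bellman + beta * penalty
```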
arXiv Detail & Related papers (2024-09-22T12:20:55Z)
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
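The core idea, differentiating the return through simulator dynamics, can be shown with a toy point-mass environment; everything below is an assumed, minimal illustration rather than the paper's simulator or controller:

```python
# Toy analytic policy gradients (APG): because the dynamics are written in an
# autodiff framework, the rollout loss is differentiated w.r.t. policy parameters.
import torch

def dynamics(state, action, dt=0.1):
    # Differentiable point-mass dynamics: state = (position, velocity).
    pos, vel = state[..., 0], state[..., 1]
    new_vel = vel + dt * action.squeeze(-1)
    new_pos = pos + dt * new_vel
    return torch.stack([new_pos, new_vel], dim=-1)

policy = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(),
                             torch.nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for _ in range(200):
    state = torch.tensor([[1.0, 0.0]])           # start away from the origin
    loss = 0.0
    for _ in range(50):                          # unroll the simulator
        action = policy(state)
        state = dynamics(state, action)
        loss = loss + (state[..., 0] ** 2).mean()  # drive position to zero
    opt.zero_grad()
    loss.backward()                              # gradients flow through the dynamics
    opt.step()
```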
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
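Model-based value expansion, one of the ingredients mentioned here, can be sketched as an H-step rollout of a learned model used to extend the bootstrap target; the interfaces below (`model`, `policy`, `value_fn`) are assumptions for illustration, not the paper's code:

```python
# H-step model-based value expansion target: unroll a learned dynamics/reward
# model from a real state, then bootstrap with the value function at the end.
def mve_target(model, policy, value_fn, obs, horizon=3, gamma=0.99):
    target, discount, state = 0.0, 1.0, obs
    for _ in range(horizon):
        action = policy(state)
        next_state, reward = model(state, action)  # learned dynamics + reward
        target = target + discount * reward
        discount = discount * gamma
        state = next_state
    return target + discount * value_fn(state)     # terminal bootstrap
```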
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
- Offline Robot Reinforcement Learning with Uncertainty-Guided Human Expert Sampling [11.751910133386254]
Recent advances in batch (offline) reinforcement learning have shown promising results in learning from available offline data.
We propose a novel approach that uses uncertainty estimation to trigger the injection of human demonstration data.
Our experiments show that this approach is more sample efficient when compared to a naive way of combining expert data with data collected from a sub-optimal agent.
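A common way to realize such an uncertainty trigger is ensemble disagreement; the sketch below is an assumed illustration of that pattern, not the authors' implementation:

```python
# Hypothetical uncertainty-triggered expert injection: when an ensemble of
# Q-functions disagrees strongly on the sampled batch, draw the next batch
# from human demonstrations instead of the sub-optimal agent's data.
import torch

def pick_batch(q_ensemble, agent_buffer, expert_buffer, threshold=0.5):
    batch = agent_buffer.sample()
    with torch.no_grad():
        qs = torch.stack([q(batch["obs"], batch["action"]) for q in q_ensemble])
    uncertainty = qs.std(dim=0).mean()      # ensemble disagreement
    if uncertainty > threshold:
        return expert_buffer.sample()       # fall back to expert demonstrations
    return batch
```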
arXiv Detail & Related papers (2022-12-16T01:41:59Z)
- Online vs. Offline Adaptive Domain Randomization Benchmark [20.69035879843824]
We present an open benchmark for both offline and online methods (SimOpt, BayRn, DROID, DROPO) to shed light on which are most suitable for each setting and task at hand.
We found that online methods are limited by the quality of the currently learned policy for the next iteration, while offline methods may sometimes fail when replaying trajectories in simulation with open-loop commands.
arXiv Detail & Related papers (2022-06-29T14:03:53Z)
- Curriculum Offline Imitation Learning [72.1015201041391]
Offline reinforcement learning tasks require the agent to learn from a pre-collected dataset with no further interactions with the environment.
We propose Curriculum Offline Imitation Learning (COIL), which uses an experience picking strategy to imitate adaptive neighboring policies with higher returns.
On continuous control benchmarks, we compare COIL against both imitation-based and RL-based methods, showing that it not only avoids learning merely mediocre behaviors on mixed datasets but is also competitive with state-of-the-art offline RL methods.
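An experience picking rule in that spirit might look like the following hypothetical sketch, where "neighboring" is approximated by likelihood under the current policy; none of these names come from the paper:

```python
# Curriculum-style trajectory selection: imitate only trajectories that
# (a) outperform the current policy and (b) the current policy could
# plausibly have produced.
def pick_curriculum(trajectories, policy_log_prob, current_return, k=10):
    candidates = [t for t in trajectories if t["return"] > current_return]
    candidates.sort(key=lambda t: policy_log_prob(t), reverse=True)  # nearest first
    return candidates[:k]
```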
arXiv Detail & Related papers (2021-11-03T08:02:48Z)
- A Workflow for Offline Model-Free Robotic Reinforcement Learning [117.07743713715291]
Offline reinforcement learning (RL) enables learning control policies by utilizing only prior experience, without any online interaction.
We develop a practical workflow for using offline RL, analogous to the relatively well-understood workflows for supervised learning problems.
We demonstrate the efficacy of this workflow in producing effective policies without any online tuning.
arXiv Detail & Related papers (2021-09-22T16:03:29Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
- Do Offline Metrics Predict Online Performance in Recommender Systems? [79.48653445643865]
We investigate the extent to which offline metrics predict online performance by evaluating recommenders across six simulated environments.
We observe that offline metrics are correlated with online performance over a range of environments.
We study the impact of adding exploration strategies, and observe that their effectiveness, when compared to greedy recommendation, is highly dependent on the recommendation algorithm.
arXiv Detail & Related papers (2020-11-07T01:41:13Z)
- Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization [46.017212565714175]
We propose a novel concept of deployment efficiency, measuring the number of distinct data-collection policies that are used during policy learning.
We propose a novel model-based algorithm, Behavior-Regularized Model-ENsemble (BREMEN), that can effectively optimize a policy offline using 10-20 times less data than prior works.
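The two named ingredients, a dynamics-model ensemble and behavior regularization, can be sketched as follows; BREMEN itself uses trust-region-style updates from a behavior-cloned initialization, so the explicit KL term and all interfaces below are simplifying assumptions:

```python
# Loose sketch: imagined rollouts from an ensemble of learned models, with the
# policy update regularized towards the behavior policy that collected the data.
import random
import torch

def behavior_regularized_loss(policy, behavior_policy, value_fn, model_ensemble,
                              obs, kl_weight=1.0, horizon=5, gamma=0.99):
    model = random.choice(model_ensemble)       # pick one ensemble member
    ret, discount, state = 0.0, 1.0, obs
    for _ in range(horizon):
        dist = policy(state)                    # assumed to return a torch distribution
        action = dist.rsample()                 # reparameterized sample
        state, reward = model(state, action)    # imagined transition
        ret = ret + discount * reward
        discount *= gamma
    ret = ret + discount * value_fn(state)
    # Stay close to the behavior (data-collection) policy.
    kl = torch.distributions.kl_divergence(policy(obs), behavior_policy(obs)).mean()
    return -ret.mean() + kl_weight * kl
```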
arXiv Detail & Related papers (2020-06-05T19:33:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.