Online vs. Offline Adaptive Domain Randomization Benchmark
- URL: http://arxiv.org/abs/2206.14661v1
- Date: Wed, 29 Jun 2022 14:03:53 GMT
- Title: Online vs. Offline Adaptive Domain Randomization Benchmark
- Authors: Gabriele Tiboni, Karol Arndt, Giuseppe Averta, Ville Kyrki, Tatiana Tommasi
- Abstract summary: We present an open benchmark for both offline and online methods (SimOpt, BayRn, DROID, DROPO) to shed light on which are most suitable for each setting and task at hand.
We found that online methods are limited by the quality of the currently learned policy for the next iteration, while offline methods may sometimes fail when replaying trajectories in simulation with open-loop commands.
- Score: 20.69035879843824
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Physics simulators have shown great promise for conveniently learning
reinforcement learning policies in safe, unconstrained environments. However,
transferring the acquired knowledge to the real world can be challenging due to
the reality gap. To this end, several methods have been recently proposed to
automatically tune simulator parameters with posterior distributions given real
data, for use with domain randomization at training time. These approaches have
been shown to work for various robotic tasks under different settings and
assumptions. Nevertheless, existing literature lacks a thorough comparison of
existing adaptive domain randomization methods with respect to transfer
performance and real-data efficiency. In this work, we present an open
benchmark for both offline and online methods (SimOpt, BayRn, DROID, DROPO), to
shed light on which are most suitable for each setting and task at hand. We
found that online methods are limited by the quality of the currently learned
policy for the next iteration, while offline methods may sometimes fail when
replaying trajectories in simulation with open-loop commands. The code used
will be released at https://github.com/gabrieletiboni/adr-benchmark.
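As a rough intuition for the approaches being benchmarked, the sketch below keeps a Gaussian over a single simulator parameter (a point mass), replays a fixed open-loop action sequence in simulation, and refits the Gaussian to the samples whose trajectories best match the "real" one. The 1-D system, the cross-entropy-style elite refit, and all constants are illustrative assumptions; this is not the actual procedure of SimOpt, BayRn, DROID, or DROPO.

```python
# Toy adaptive domain randomization: fit a distribution over one simulator
# parameter (mass) so that simulated rollouts match a "real" trajectory.
import numpy as np

def simulate(mass, actions, dt=0.05):
    """Roll out a 1-D point mass driven by a fixed open-loop action sequence."""
    pos, vel, traj = 0.0, 0.0, []
    for a in actions:
        vel += (a / mass) * dt
        pos += vel * dt
        traj.append(pos)
    return np.array(traj)

rng = np.random.default_rng(0)
actions = rng.uniform(-1.0, 1.0, size=50)          # open-loop command replay
real_traj = simulate(mass=2.3, actions=actions)     # stands in for real-world data

mean, std = 1.0, 1.0                                # broad initial "posterior" over mass
for it in range(10):
    samples = rng.normal(mean, std, size=64).clip(0.1, None)
    errors = [np.mean((simulate(m, actions) - real_traj) ** 2) for m in samples]
    elite = samples[np.argsort(errors)[:8]]          # keep best-matching masses
    mean, std = elite.mean(), elite.std() + 1e-3     # tighten the distribution
    print(f"iter {it}: mass ~ N({mean:.2f}, {std:.2f}^2)")
```

The open-loop replay of a fixed action sequence mirrors how offline methods fit parameters to pre-recorded trajectories, which is exactly the step the abstract notes can fail when simulated rollouts drift away from the real ones.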
Related papers
- Scalable Offline Reinforcement Learning for Mean Field Games [6.8267158622784745]
Off-MMD is a novel mean-field RL algorithm that approximates equilibrium policies in mean-field games using purely offline data.
Our algorithm scales to complex environments and demonstrates strong performance on benchmark tasks like crowd exploration or navigation.
arXiv Detail & Related papers (2024-10-23T14:16:34Z)
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- Understanding the performance gap between online and offline alignment algorithms [63.137832242488926]
We show that offline algorithms train the policy to become good at pairwise classification, while online algorithms are good at generation.
This hints at a unique interplay between discriminative and generative capabilities, which is greatly impacted by the sampling process.
Our study sheds light on the pivotal role of on-policy sampling in AI alignment, and hints at certain fundamental challenges of offline alignment algorithms.
arXiv Detail & Related papers (2024-05-14T09:12:30Z)
- Improving Offline Reinforcement Learning with Inaccurate Simulators [34.54402525918925]
We propose a novel approach to better combine the offline dataset and the inaccurate simulation data.
Specifically, we pre-train a generative adversarial network (GAN) model to fit the state distribution of the offline dataset.
Our experimental results on the D4RL benchmark and a real-world manipulation task confirm that our method benefits more from both the inaccurate simulator and the limited offline dataset, achieving better performance than state-of-the-art methods.
arXiv Detail & Related papers (2024-05-07T13:29:41Z) - Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - Robust Visual Sim-to-Real Transfer for Robotic Manipulation [79.66851068682779]
Learning visuomotor policies in simulation is much safer and cheaper than in the real world.
However, due to discrepancies between the simulated and real data, simulator-trained policies often fail when transferred to real robots.
One common approach to bridging the visual sim-to-real domain gap is domain randomization (DR).
arXiv Detail & Related papers (2023-07-28T05:47:24Z) - Towards Data-Driven Offline Simulations for Online Reinforcement
Learning [30.654163861164864]
We formalize offline learner simulation (OLS) for reinforcement learning (RL).
We propose a novel evaluation protocol that measures both fidelity and efficiency of the simulation.
arXiv Detail & Related papers (2022-11-14T18:36:13Z) - Do Offline Metrics Predict Online Performance in Recommender Systems? [79.48653445643865]
We investigate the extent to which offline metrics predict online performance by evaluating recommenders across six simulated environments.
We observe that offline metrics are correlated with online performance over a range of environments.
We study the impact of adding exploration strategies, and observe that their effectiveness, when compared to greedy recommendation, is highly dependent on the recommendation algorithm.
arXiv Detail & Related papers (2020-11-07T01:41:13Z) - Tracking Performance of Online Stochastic Learners [57.14673504239551]
Online algorithms are popular in large-scale learning settings due to their ability to compute updates on the fly, without the need to store and process data in large batches.
When a constant step-size is used, these algorithms also have the ability to adapt to drifts in problem parameters, such as data or model properties, and track the optimal solution with reasonable accuracy.
We establish a link between steady-state performance derived under stationarity assumptions and the tracking performance of online learners under random walk models.
arXiv Detail & Related papers (2020-04-04T14:16:27Z)
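As a small illustration of the last entry above, the snippet below lets the optimum drift as a random walk while a constant step-size stochastic learner follows it; the squared tracking error settles at a bounded steady-state level instead of growing over time. The quadratic loss and all constants are illustrative choices, not taken from the paper.

```python
# Constant step-size online learner tracking a drifting optimum (random-walk model).
import numpy as np

rng = np.random.default_rng(0)
theta_true = 0.0          # drifting optimum
theta_hat = 0.0           # online learner's estimate
mu = 0.1                  # constant step size

errors = []
for t in range(5000):
    theta_true += rng.normal(0.0, 0.01)       # random-walk drift of the optimum
    x = theta_true + rng.normal(0.0, 0.5)     # noisy observation of the optimum
    grad = theta_hat - x                      # gradient of 0.5 * (theta_hat - x)^2
    theta_hat -= mu * grad                    # constant step-size stochastic update
    errors.append((theta_hat - theta_true) ** 2)

# The tracking error stays bounded rather than accumulating with t.
print("mean squared tracking error over the last 1000 steps:", np.mean(errors[-1000:]))
```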