Reliable validation of Reinforcement Learning Benchmarks
- URL: http://arxiv.org/abs/2203.01075v1
- Date: Wed, 2 Mar 2022 12:55:27 GMT
- Title: Reliable validation of Reinforcement Learning Benchmarks
- Authors: Matthias Müller-Brockhausen, Aske Plaat, Mike Preuss
- Abstract summary: Reinforcement Learning (RL) is one of the most dynamic research areas in Game AI and AI as a whole.
There are numerous benchmark environments whose scores are used to compare different algorithms, such as Atari.
We propose improving this situation by providing access to the original experimental data to validate study results.
- Score: 1.2031796234206134
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Reinforcement Learning (RL) is one of the most dynamic research areas in Game
AI and AI as a whole, and a wide variety of games are used as its prominent
test problems. However, it is subject to the replicability crisis that
currently affects most algorithmic AI research. Benchmarking in Reinforcement
Learning could be improved through verifiable results. There are numerous
benchmark environments whose scores are used to compare different algorithms,
such as Atari. Nevertheless, reviewers must trust that figures represent
truthful values, as it is difficult to reproduce an exact training curve. We
propose improving this situation by providing access to the original
experimental data to validate study results. To that end, we rely on the
concept of minimal traces. These allow re-simulation of action sequences in
deterministic RL environments and, in turn, enable reviewers to verify, re-use,
and manually inspect experimental results without needing large compute
clusters. It also permits validation of presented reward graphs, inspection
of individual episodes, and re-use of result data (baselines) for proper
comparison in follow-up papers. We offer plug-and-play code that works with Gym
so that our measures fit well in the existing RL and reproducibility
ecosystem. Our approach is freely available, easy to use, and adds minimal
overhead, as minimal traces allow a data compression ratio of up to $\approx
10^4:1$ (94GB to 8MB for Atari Pong) compared to a regular MDP trace used in
offline RL datasets. The paper presents proof-of-concept results for a variety
of games.
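The abstract's central mechanism, re-simulating an episode from nothing but the environment seed and the action sequence, is easy to illustrate. The sketch below is not the authors' released plug-and-play code; it assumes the classic (pre-0.26) Gym API, where `env.seed()` exists and `step()` returns four values, and uses CartPole-v1 with a random policy as a stand-in for a real benchmark and trained agent.

```python
# Minimal-trace sketch: for a deterministic Gym environment, storing only the
# seed and the chosen actions is enough to reproduce a full episode on demand.
import json
import random

import gym  # assumes the classic (pre-0.26) Gym API


def record_episode(env_id, policy, seed=0):
    """Run one episode and keep only its minimal trace: env id, seed, actions, return."""
    env = gym.make(env_id)
    env.seed(seed)
    obs, done, actions, episode_return = env.reset(), False, [], 0.0
    while not done:
        action = policy(obs)
        obs, reward, done, _ = env.step(action)
        actions.append(int(action))
        episode_return += reward
    env.close()
    return {"env_id": env_id, "seed": seed, "actions": actions, "return": episode_return}


def replay_trace(trace):
    """Re-simulate the stored action sequence and recompute the episode return."""
    env = gym.make(trace["env_id"])
    env.seed(trace["seed"])
    env.reset()
    episode_return, done = 0.0, False
    for action in trace["actions"]:
        _, reward, done, _ = env.step(action)
        episode_return += reward
        if done:
            break
    env.close()
    return episode_return


if __name__ == "__main__":
    trace = record_episode("CartPole-v1", policy=lambda obs: random.randrange(2), seed=42)
    with open("trace.json", "w") as f:
        json.dump(trace, f)  # a few kilobytes instead of a full observation/reward stream
    assert abs(replay_trace(trace) - trace["return"]) < 1e-6  # reviewer-side validation
```

Because only the seed and the integer actions are stored, while observations and rewards are regenerated by the simulator, the trace stays tiny; that is the source of the quoted compression, since 94 GB / 8 MB ≈ 1.2 × 10^4, i.e. roughly $10^4:1$.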
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z) - Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning [41.971465819626005]
We present Open RL Benchmark, a set of fully tracked RL experiments.
Open RL Benchmark is community-driven: anyone can download, use, and contribute to the data.
Special care is taken to ensure that each experiment is precisely reproducible.
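The Open RL Benchmark entry above emphasizes that anyone can download and reuse the tracked runs. As a hedged illustration only, assuming the runs are tracked on Weights & Biases, they could in principle be pulled through wandb's public API; the entity, project, and metric names below are placeholders, not the benchmark's confirmed layout.

```python
# Hypothetical sketch of downloading tracked runs via the Weights & Biases public API.
# "openrlbenchmark/cleanrl" and "charts/episodic_return" are assumed names; check the
# Open RL Benchmark documentation for the actual entities, projects, and metric keys.
import wandb

api = wandb.Api()
for i, run in enumerate(api.runs("openrlbenchmark/cleanrl")):
    if i >= 5:  # only peek at the first few runs
        break
    history = run.history(keys=["charts/episodic_return"])  # DataFrame of logged values
    print(run.name, run.config.get("env_id"), len(history))
```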
arXiv Detail & Related papers (2024-02-05T14:32:00Z) - SMaRt: Improving GANs with Score Matching Regularity [94.81046452865583]
Generative adversarial networks (GANs) usually struggle in learning from highly diverse data, whose underlying manifold is complex.
We show that score matching serves as a promising solution to this issue thanks to its capability of persistently pushing the generated data points towards the real data manifold.
We propose to improve the optimization of GANs with score matching regularity (SMaRt).
arXiv Detail & Related papers (2023-11-30T03:05:14Z) - Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning? A Theoretical Perspective [55.36819597141271]
Inverse Reinforcement Learning (IRL) -- the problem of learning reward functions from demonstrations of an expert policy -- plays a critical role in developing intelligent systems.
This paper provides the first line of results on efficient IRL in vanilla offline and online settings, with guarantees on samples and runtime.
As an application, we show that the learned rewards can transfer to another target MDP with suitable guarantees.
arXiv Detail & Related papers (2023-11-29T00:09:01Z) - Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z) - Offline Equilibrium Finding [40.08360411502593]
We aim to generalize Offline RL to a multi-agent or multiplayer-game setting.
Very little research has been done in this area, as the progress is hindered by the lack of standardized datasets and meaningful benchmarks.
Our two model-based algorithms -- OEF-PSRO and OEF-CFR -- are adaptations of the widely-used equilibrium finding algorithms Deep CFR and PSRO in the context of offline learning.
arXiv Detail & Related papers (2022-07-12T03:41:06Z) - Weakly Supervised Scene Text Detection using Deep Reinforcement Learning [6.918282834668529]
We propose a weak supervision method for scene text detection, which makes use of reinforcement learning (RL).
The reward received by the RL agent is estimated by a neural network, instead of being inferred from ground-truth labels.
We then use our proposed system in a weakly- and semi-supervised training on real-world data.
arXiv Detail & Related papers (2022-01-13T10:15:42Z) - Interpretable performance analysis towards offline reinforcement learning: A dataset perspective [6.526790418943535]
We propose a two-fold taxonomy for existing offline RL algorithms.
We explore the correlation between the performance of different types of algorithms and the distribution of actions under states.
We create a benchmark platform on the Atari domain, entitled RL easy go (RLEG), at an estimated cost of more than 0.3 million dollars.
arXiv Detail & Related papers (2021-05-12T07:17:06Z) - RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning [108.9599280270704]
We propose a benchmark called RL Unplugged to evaluate and compare offline RL methods.
RL Unplugged includes data from a diverse range of domains including games and simulated motor control problems.
We will release data for all our tasks and open-source all algorithms presented in this paper.
arXiv Detail & Related papers (2020-06-24T17:14:51Z) - Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.