Reliable validation of Reinforcement Learning Benchmarks
- URL: http://arxiv.org/abs/2203.01075v1
- Date: Wed, 2 Mar 2022 12:55:27 GMT
- Title: Reliable validation of Reinforcement Learning Benchmarks
- Authors: Matthias Müller-Brockhausen, Aske Plaat, Mike Preuss
- Abstract summary: Reinforcement Learning (RL) is one of the most dynamic research areas in Game AI and AI as a whole.
There are numerous benchmark environments whose scores are used to compare different algorithms, such as Atari.
We propose improving this situation by providing access to the original experimental data to validate study results.
- Score: 1.2031796234206134
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Reinforcement Learning (RL) is one of the most dynamic research areas in Game
AI and AI as a whole, and a wide variety of games are used as its prominent
test problems. However, it is subject to the replicability crisis that
currently affects most algorithmic AI research. Benchmarking in Reinforcement
Learning could be improved through verifiable results. There are numerous
benchmark environments whose scores are used to compare different algorithms,
such as Atari. Nevertheless, reviewers must trust that figures represent
truthful values, as it is difficult to reproduce an exact training curve. We
propose improving this situation by providing access to the original
experimental data to validate study results. To that end, we rely on the
concept of minimal traces. These allow re-simulation of action sequences in
deterministic RL environments and, in turn, enable reviewers to verify, re-use,
and manually inspect experimental results without needing large compute
clusters. It also permits validation of presented reward graphs, inspection
of individual episodes, and re-use of result data (baselines) for proper
comparison in follow-up papers. We offer plug-and-play code that works with Gym
so that our measures fit well in the existing RL and reproducibility
ecosystem. Our approach is freely available, easy to use, and adds minimal
overhead, as minimal traces allow a data compression ratio of up to $\approx
10^4:1$ (94GB to 8MB for Atari Pong) compared to a regular MDP trace used in
offline RL datasets. The paper presents proof-of-concept results for a variety
of games.
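The abstract's central mechanism, re-simulating an episode from nothing but the environment seed and the action sequence, is easy to illustrate. The sketch below is not the authors' released plug-and-play code; it assumes the classic (pre-0.26) Gym API, where `env.seed()` exists and `step()` returns four values, and uses CartPole-v1 with a random policy as a stand-in for a real benchmark and trained agent.

```python
# Minimal-trace sketch: for a deterministic Gym environment, storing only the
# seed and the chosen actions is enough to reproduce a full episode on demand.
import json
import random

import gym  # assumes the classic (pre-0.26) Gym API


def record_episode(env_id, policy, seed=0):
    """Run one episode and keep only its minimal trace: env id, seed, actions, return."""
    env = gym.make(env_id)
    env.seed(seed)
    obs, done, actions, episode_return = env.reset(), False, [], 0.0
    while not done:
        action = policy(obs)
        obs, reward, done, _ = env.step(action)
        actions.append(int(action))
        episode_return += reward
    env.close()
    return {"env_id": env_id, "seed": seed, "actions": actions, "return": episode_return}


def replay_trace(trace):
    """Re-simulate the stored action sequence and recompute the episode return."""
    env = gym.make(trace["env_id"])
    env.seed(trace["seed"])
    env.reset()
    episode_return, done = 0.0, False
    for action in trace["actions"]:
        _, reward, done, _ = env.step(action)
        episode_return += reward
        if done:
            break
    env.close()
    return episode_return


if __name__ == "__main__":
    trace = record_episode("CartPole-v1", policy=lambda obs: random.randrange(2), seed=42)
    with open("trace.json", "w") as f:
        json.dump(trace, f)  # a few kilobytes instead of a full observation/reward stream
    assert abs(replay_trace(trace) - trace["return"]) < 1e-6  # reviewer-side validation
```

Because only the seed and the integer actions are stored, while observations and rewards are regenerated by the simulator, the trace stays tiny; that is the source of the quoted compression, since 94 GB / 8 MB ≈ 1.2 × 10^4, i.e. roughly $10^4:1$.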
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z) - Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning [41.971465819626005]
We present Open RL Benchmark, a set of fully tracked RL experiments.
Open RL Benchmark is community-driven: anyone can download, use, and contribute to the data.
Special care is taken to ensure that each experiment is precisely reproducible.
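The Open RL Benchmark entry above emphasizes that anyone can download and reuse the tracked runs. As a hedged illustration only, assuming the runs are tracked on Weights & Biases, they could in principle be pulled through wandb's public API; the entity, project, and metric names below are placeholders, not the benchmark's confirmed layout.

```python
# Hypothetical sketch of downloading tracked runs via the Weights & Biases public API.
# "openrlbenchmark/cleanrl" and "charts/episodic_return" are assumed names; check the
# Open RL Benchmark documentation for the actual entities, projects, and metric keys.
import wandb

api = wandb.Api()
for i, run in enumerate(api.runs("openrlbenchmark/cleanrl")):
    if i >= 5:  # only peek at the first few runs
        break
    history = run.history(keys=["charts/episodic_return"])  # DataFrame of logged values
    print(run.name, run.config.get("env_id"), len(history))
```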
arXiv Detail & Related papers (2024-02-05T14:32:00Z) - SMaRt: Improving GANs with Score Matching Regularity [94.81046452865583]
Generative adversarial networks (GANs) usually struggle in learning from highly diverse data, whose underlying manifold is complex.
We show that score matching serves as a promising solution to this issue thanks to its capability of persistently pushing the generated data points towards the real data manifold.
We propose to improve the optimization of GANs with score matching regularity (SMaRt).
arXiv Detail & Related papers (2023-11-30T03:05:14Z) - Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning? A Theoretical Perspective [55.36819597141271]
Inverse Reinforcement Learning (IRL) -- the problem of learning reward functions from demonstrations of an expert policy -- plays a critical role in developing intelligent systems.
This paper provides the first line of results on efficient IRL in vanilla offline and online settings, with guarantees on samples and runtime.
As an application, we show that the learned rewards can transfer to another target MDP with suitable guarantees.
arXiv Detail & Related papers (2023-11-29T00:09:01Z) - Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z) - Offline Equilibrium Finding [40.08360411502593]
We aim to generalize Offline RL to a multi-agent or multiplayer-game setting.
Very little research has been done in this area, as the progress is hindered by the lack of standardized datasets and meaningful benchmarks.
Our two model-based algorithms -- OEF-PSRO and OEF-CFR -- are adaptations of the widely-used equilibrium finding algorithms Deep CFR and PSRO in the context of offline learning.
arXiv Detail & Related papers (2022-07-12T03:41:06Z) - Weakly Supervised Scene Text Detection using Deep Reinforcement Learning [6.918282834668529]
We propose a weak supervision method for scene text detection, which makes use of reinforcement learning (RL).
The reward received by the RL agent is estimated by a neural network, instead of being inferred from ground-truth labels.
We then use our proposed system in a weakly- and semi-supervised training on real-world data.
arXiv Detail & Related papers (2022-01-13T10:15:42Z) - Interpretable performance analysis towards offline reinforcement learning: A dataset perspective [6.526790418943535]
We propose a two-fold taxonomy for existing offline RL algorithms.
We explore the correlation between the performance of different types of algorithms and the distribution of actions under states.
We create a benchmark platform on the Atari domain, entitled RL easy go (RLEG), at an estimated cost of more than 0.3 million dollars.
arXiv Detail & Related papers (2021-05-12T07:17:06Z) - RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning [108.9599280270704]
We propose a benchmark called RL Unplugged to evaluate and compare offline RL methods.
RL Unplugged includes data from a diverse range of domains including games and simulated motor control problems.
We will release data for all our tasks and open-source all algorithms presented in this paper.
arXiv Detail & Related papers (2020-06-24T17:14:51Z) - Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.