Related papers: ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectives

ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectives

URL: http://arxiv.org/abs/2112.04123v1
Date: Wed, 8 Dec 2021 05:34:46 GMT
Title: ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectives
Authors: Toshinori Kitamura, Ryo Yonetani
Abstract summary: We present ShinRL, an open-source library for evaluation of reinforcement learning (RL) algorithms. ShinRL provides an RL environment interface that can compute metrics for delving into the behaviors of RL algorithms. We show how combining these two features of ShinRL makes it easier to analyze the behavior of deep Q learning.
Score: 11.675763847424786
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present ShinRL, an open-source library specialized for the evaluation of reinforcement learning (RL) algorithms from both theoretical and practical perspectives. Existing RL libraries typically allow users to evaluate practical performances of deep RL algorithms through returns. Nevertheless, these libraries are not necessarily useful for analyzing if the algorithms perform as theoretically expected, such as if Q learning really achieves the optimal Q function. In contrast, ShinRL provides an RL environment interface that can compute metrics for delving into the behaviors of RL algorithms, such as the gap between learned and the optimal Q values and state visitation frequencies. In addition, we introduce a flexible solver interface for evaluating both theoretically justified algorithms (e.g., dynamic programming and tabular RL) and practically effective ones (i.e., deep RL, typically with some additional extensions and regularizations) in a consistent fashion. As a case study, we show that how combining these two features of ShinRL makes it easier to analyze the behavior of deep Q learning. Furthermore, we demonstrate that ShinRL can be used to empirically validate recent theoretical findings such as the effect of KL regularization for value iteration and for deep Q learning, and the robustness of entropy-regularized policies to adversarial rewards. The source code for ShinRL is available on GitHub: https://github.com/omron-sinicx/ShinRL.

Related papers

CleanQRL: Lightweight Single-file Implementations of Quantum Reinforcement Learning Algorithms [2.536162003546062]
CleanQRL is a library that offers single-script implementations of many Quantum Reinforcement Learning algorithms.<n>Our library provides clear and easy to understand scripts that researchers can quickly adapt to their own needs.
arXiv Detail & Related papers (2025-07-10T09:53:39Z)
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving [27.551399472250168]
Large Language Models (LLMs) often struggle with mathematical reasoning tasks requiring precise, verifiable computation.<n>While Reinforcement Learning (RL) from outcome-based rewards enhances text-based reasoning, understanding how agents autonomously learn to leverage external tools like code execution remains crucial.
arXiv Detail & Related papers (2025-05-12T17:23:34Z)
Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers [57.95157497749428]
We propose RL$V$ that augments any value-free'' RL method by jointly training the LLM as both a reasoner and a generative verifier.<n> RL$V$ boosts MATH accuracy by over 20% with parallel sampling and enables $8-32times$ efficient test-time compute scaling.
arXiv Detail & Related papers (2025-05-07T22:41:26Z)
Is Value Learning Really the Main Bottleneck in Offline RL? [70.54708989409409]
We show that the choice of a policy extraction algorithm significantly affects the performance and scalability of offline RL. We propose two simple test-time policy improvement methods and show that these methods lead to better performance.
arXiv Detail & Related papers (2024-06-13T17:07:49Z)
How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback. Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities. We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning [41.971465819626005]
We present Open RL Benchmark, a set of fully tracked RL experiments. Open RL Benchmark is community-driven: anyone can download, use, and contribute to the data. Special care is taken to ensure that each experiment is precisely reproducible.
arXiv Detail & Related papers (2024-02-05T14:32:00Z)
The Effective Horizon Explains Deep RL Performance in Stochastic Environments [21.148001945560075]
Reinforcement learning (RL) theory has largely focused on proving mini complexity sample bounds. We introduce a new RL algorithm, SQIRL, that iteratively learns a nearoptimal policy by exploring randomly to collect rollouts. We leverage SQIRL to derive instance-dependent sample complexity bounds for RL that are exponential only in an "effective horizon" look-ahead and on the complexity of the class used for approximation.
arXiv Detail & Related papers (2023-12-13T18:58:56Z)
SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores [13.948640763797776]
We present a novel abstraction on the dataflows of RL training, which unifies diverse RL training applications into a general framework. We develop a scalable, efficient, and distributed RL system called ReaLly scalableRL, which allows efficient and massively parallelized training. SRL is the first in the academic community to perform RL experiments at a large scale with over 15k CPU cores.
arXiv Detail & Related papers (2023-06-29T05:16:25Z)
RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$ [12.111848705677142]
We propose RL$3$, a hybrid approach that incorporates action-values, learned per task through traditional RL, in the inputs to meta-RL. We show that RL$3$ earns greater cumulative reward in the long term, compared to RL$2$, while maintaining data-efficiency in the short term, and generalizes better to out-of-distribution tasks.
arXiv Detail & Related papers (2023-06-28T04:16:16Z)
LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement Learning [78.2286146954051]
LCRL implements model-free Reinforcement Learning (RL) algorithms over unknown Decision Processes (MDPs) We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL.
arXiv Detail & Related papers (2022-09-21T13:21:00Z)
Contrastive Learning as Goal-Conditioned Reinforcement Learning [147.28638631734486]
In reinforcement learning (RL), it is easier to solve a task if given a good representation. While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable. We show (contrastive) representation learning methods can be cast as RL algorithms in their own right.
arXiv Detail & Related papers (2022-06-15T14:34:15Z)
Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy. In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks. We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
RL-DARTS: Differentiable Architecture Search for Reinforcement Learning [62.95469460505922]
We introduce RL-DARTS, one of the first applications of Differentiable Architecture Search (DARTS) in reinforcement learning (RL) By replacing the image encoder with a DARTS supernet, our search method is sample-efficient, requires minimal extra compute resources, and is also compatible with off-policy and on-policy RL algorithms, needing only minor changes in preexisting code. We show that the supernet gradually learns better cells, leading to alternative architectures which can be highly competitive against manually designed policies, but also verify previous design choices for RL policies.
arXiv Detail & Related papers (2021-06-04T03:08:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.