LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement
Learning
- URL: http://arxiv.org/abs/2209.10341v1
- Date: Wed, 21 Sep 2022 13:21:00 GMT
- Title: LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement
Learning
- Authors: Hosein Hasanbeig and Daniel Kroening and Alessandro Abate
- Abstract summary: LCRL implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs).
We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL.
- Score: 78.2286146954051
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LCRL is a software tool that implements model-free Reinforcement Learning
(RL) algorithms over unknown Markov Decision Processes (MDPs), synthesising
policies that satisfy a given linear temporal specification with maximal
probability. LCRL leverages partially deterministic finite-state machines known
as Limit Deterministic Büchi Automata (LDBA) to express a given linear temporal
specification. A reward function for the RL algorithm is shaped on-the-fly,
based on the structure of the LDBA. Theoretical guarantees under proper
assumptions ensure the convergence of the RL algorithm to an optimal policy
that maximises the satisfaction probability. We present case studies to
demonstrate the applicability, ease of use, scalability, and performance of
LCRL. Owing to the LDBA-guided exploration and LCRL's model-free architecture, we
observe robust performance, which also scales well when compared to standard RL
approaches (whenever these are applicable to LTL specifications). Full instructions on
how to execute all the case studies in this paper are provided on a GitHub page
that accompanies the LCRL distribution www.github.com/grockious/lcrl.
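The abstract's core idea, shaping a reward on the fly from the structure of an LDBA so that maximising return maximises the probability of satisfying the LTL specification, can be sketched generically. The code below is a hypothetical illustration, not LCRL's actual interface: it runs tabular Q-learning on the product of a small grid MDP and a one-transition LDBA for the property "F goal" (eventually reach the goal cell), emitting reward only when the automaton takes an accepting transition. All names (`step`, `ldba_step`, grid size, hyperparameters) are invented for this sketch.

```python
import random

random.seed(0)
GRID, GOAL = 4, 15                       # 4x4 grid, states 0..15; cell 15 is labelled "goal"
ACTIONS = [-1, 1, -GRID, GRID]           # left, right, up, down

def step(s, a):
    """Deterministic grid dynamics; moves off the grid leave the state unchanged."""
    if a == -1 and s % GRID == 0:
        return s
    if a == 1 and s % GRID == GRID - 1:
        return s
    n = s + a
    return n if 0 <= n < GRID * GRID else s

def ldba_step(q, label):
    """One-transition LDBA for 'F goal': q0 --goal--> q1 (accepting sink)."""
    if q == 0 and label == "goal":
        return 1, True                   # accepting transition fires
    return q, False

Q = {}
def qval(s, q, a):
    return Q.get((s, q, a), 0.0)

alpha, gamma, eps = 0.5, 0.95, 0.2
for _ in range(2000):                    # Q-learning over the product state (s, q)
    s, q = 0, 0
    for _ in range(50):
        a = random.choice(ACTIONS) if random.random() < eps else \
            max(ACTIONS, key=lambda x: qval(s, q, x))
        s2 = step(s, a)
        q2, accepting = ldba_step(q, "goal" if s2 == GOAL else "")
        r = 1.0 if accepting else 0.0    # reward shaped from the LDBA structure
        best = max(qval(s2, q2, x) for x in ACTIONS)
        Q[(s, q, a)] = (1 - alpha) * qval(s, q, a) + alpha * (r + gamma * best)
        s, q = s2, q2
        if q == 1:                       # automaton accepted: episode done
            break

# Greedy rollout: the learned policy should drive the LDBA into its accepting state.
s, q, trace = 0, 0, [0]
for _ in range(2 * GRID):
    a = max(ACTIONS, key=lambda x: qval(s, q, x))
    s = step(s, a)
    trace.append(s)
    q, _ = ldba_step(q, "goal" if s == GOAL else "")
    if q == 1:
        break
print(trace)
```

Because the reward is tied to accepting transitions of the automaton rather than hand-crafted per state, the same RL loop works for any property expressible as an LDBA, which is the design point the abstract highlights.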
Related papers
- REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation with stronger or similar performance as PPO and DPO.
arXiv Detail & Related papers (2024-04-25T17:20:45Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores [13.948640763797776]
We present a novel abstraction on the dataflows of RL training, which unifies diverse RL training applications into a general framework.
We develop a scalable, efficient, and distributed RL system called ReaLly Scalable RL (SRL), which allows efficient and massively parallelized training.
SRL is the first in the academic community to perform RL experiments at a large scale with over 15k CPU cores.
arXiv Detail & Related papers (2023-06-29T05:16:25Z)
- Policy Gradient for Reinforcement Learning with General Utilities [50.65940899590487]
In Reinforcement Learning (RL), the goal of agents is to discover an optimal policy that maximizes the expected cumulative rewards.
Many supervised and unsupervised RL problems are not covered in the Linear RL framework.
We derive the policy gradient theorem for RL with general utilities.
arXiv Detail & Related papers (2022-10-03T14:57:46Z)
- ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectives [11.675763847424786]
We present ShinRL, an open-source library for evaluation of reinforcement learning (RL) algorithms.
ShinRL provides an RL environment interface that can compute metrics for delving into the behaviors of RL algorithms.
We show how combining these two features of ShinRL makes it easier to analyze the behavior of deep Q learning.
arXiv Detail & Related papers (2021-12-08T05:34:46Z)
- Towards Standardizing Reinforcement Learning Approaches for Stochastic Production Scheduling [77.34726150561087]
Reinforcement learning can be used to solve scheduling problems.
Existing studies rely on (sometimes) complex simulations for which the code is unavailable.
There is a vast array of RL designs to choose from.
Standardization of model descriptions - both production setup and RL design - and of the validation scheme is a prerequisite.
arXiv Detail & Related papers (2021-04-16T16:07:10Z)
- Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction [5.337302350000984]
This paper presents a model-free reinforcement learning algorithm to synthesize a control policy.
The effectiveness of the RL-based control synthesis is demonstrated via simulation and experimental results.
arXiv Detail & Related papers (2020-10-14T03:49:16Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.