Related papers: Reinforcement Learning with Symbolic Reward Machines

Reinforcement Learning with Symbolic Reward Machines

URL: http://arxiv.org/abs/2603.03068v1
Date: Tue, 03 Mar 2026 15:11:41 GMT
Title: Reinforcement Learning with Symbolic Reward Machines
Authors: Thomas Krug, Daniel Neider,
Abstract summary: We propose Symbolic Reward Machines (SRMs) together with the learning algorithms QSRM and LSRM to overcome the limitations of RMs.<n>SRMs consume only the standard output of the environment and process the observation directly through guards that are represented by symbolic formulas.<n>Our methods adhere to the widely used environment definition and provide interpretable representations of the task to the user.
Score: 3.945491622255908
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reward Machines (RMs) are an established mechanism in Reinforcement Learning (RL) to represent and learn sparse, temporally extended tasks with non-Markovian rewards. RMs rely on high-level information in the form of labels that are emitted by the environment alongside the observation. However, this concept requires manual user input for each environment and task. The user has to create a suitable labeling function that computes the labels. These limitations lead to poor applicability in widely adopted RL frameworks. We propose Symbolic Reward Machines (SRMs) together with the learning algorithms QSRM and LSRM to overcome the limitations of RMs. SRMs consume only the standard output of the environment and process the observation directly through guards that are represented by symbolic formulas. In our evaluation, our SRM methods outperform the baseline RL approaches and generate the same results as the existing RM methods. At the same time, our methods adhere to the widely used environment definition and provide interpretable representations of the task to the user.

Related papers

SoliReward: Mitigating Susceptibility to Reward Hacking and Annotation Noise in Video Generation Reward Models [53.19726629537694]
Post-training alignment of video generation models with human preferences is a critical goal.<n>Current data collection paradigms, reliant on in-prompt pairwise annotations, suffer from labeling noise.<n>We propose SoliReward, a systematic framework for video RM training.
arXiv Detail & Related papers (2025-12-17T14:28:23Z)
Your Reward Function for RL is Your Best PRM for Search: Unifying RL and Search-Based TTS [62.22644307952087]
We introduce AIRL-S, the first natural unification of RL-based and search-based TTS.<n>We leverage adversarial inverse reinforcement learning (AIRL) combined with group relative policy optimization (GRPO) to learn a dense, dynamic PRM directly from correct reasoning traces.<n>Our results show that our unified approach improves performance by 9 % on average over the base model, matching GPT-4o.
arXiv Detail & Related papers (2025-08-19T23:41:15Z)
Pushdown Reward Machines for Reinforcement Learning [17.63980224819404]
We present pushdown reward machines (pdRMs), an extension of reward machines based on deterministic pushdown automata.<n>pdRMs can recognize and reward temporally extended behaviours representable in deterministic context-free languages.<n>We show how agents can be trained to perform tasks representable in deterministic context-free languages using pdRMs.
arXiv Detail & Related papers (2025-08-09T08:59:09Z)
RewardAnything: Generalizable Principle-Following Reward Models [82.16312590749052]
Reward models are typically trained on fixed preference datasets.<n>This prevents adaptation to diverse real-world needs-from conciseness in one task to detailed explanations in another.<n>We introduce generalizable, principle-following reward models.<n>We present RewardAnything, a novel RM designed and trained to explicitly follow natural language principles.
arXiv Detail & Related papers (2025-06-04T07:30:16Z)
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments [90.29937153770835]
We introduce CRMArena, a benchmark designed to evaluate AI agents on realistic tasks grounded in professional work environments.<n>We show that state-of-the-art LLM agents succeed in less than 40% of the tasks with ReAct prompting, and less than 55% even with function-calling abilities.<n>Our findings highlight the need for enhanced agent capabilities in function-calling and rule-following to be deployed in real-world work environments.
arXiv Detail & Related papers (2024-11-04T17:30:51Z)
Learning Robust Reward Machines from Noisy Labels [46.18428376996514]
PROB-IRM is an approach that learns robust reward machines (RMs) for reinforcement learning (RL) agents from noisy execution traces.<n>We show that PROB-IRM can learn (potentially imperfect) RMs from noisy traces and exploit them to train an RL agent to solve its tasks successfully.
arXiv Detail & Related papers (2024-08-27T08:41:42Z)
Neural Reward Machines [2.0755366440393743]
Non-markovian Reinforcement Learning (RL) tasks are very hard to solve, because agents must consider the entire history of state-action pairs to act rationally in the environment. We define Neural Reward Machines (NRM), an automata-based neurosymbolic framework that can be used for both reasoning and learning in non-symbolic RL domains. We show that NRMs can exploit high-level symbolic knowledge in non-symbolic environments without any knowledge of the SG function, outperforming Deep RL methods which cannot incorporate prior knowledge.
arXiv Detail & Related papers (2024-08-16T11:44:27Z)
MetaRM: Shifted Distributions Alignment via Meta-Learning [52.94381279744458]
Reinforcement Learning from Human Feedback (RLHF) in language model alignment is critically dependent on the capability of the reward model (RM) We introduce MetaRM, a method leveraging meta-learning to align the RM with the shifted environment distribution. Extensive experiments demonstrate that MetaRM significantly improves the RM's distinguishing ability in iterative RLHF optimization.
arXiv Detail & Related papers (2024-05-01T10:43:55Z)
Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning Agents [9.529492371336286]
Reinforcement Learning (RL) has made significant strides in enabling artificial agents to learn diverse behaviors. We propose a novel approach, called Logical Specifications-guided Dynamic Task Sampling (LSTS) LSTS learns a set of RL policies to guide an agent from an initial state to a goal state based on a high-level task specification.
arXiv Detail & Related papers (2024-02-06T04:00:21Z)
Hierarchies of Reward Machines [75.55324974788475]
Reward machines (RMs) are a recent formalism for representing the reward function of a reinforcement learning task through a finite-state machine. We propose a formalism for further abstracting the subtask structure by endowing an RM with the ability to call other RMs.
arXiv Detail & Related papers (2022-05-31T12:39:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.