Reinforcement Learning with Stochastic Reward Machines
- URL: http://arxiv.org/abs/2510.14837v1
- Date: Thu, 16 Oct 2025 16:12:04 GMT
- Title: Reinforcement Learning with Stochastic Reward Machines
- Authors: Jan Corazza, Ivan Gavran, Daniel Neider
- Abstract summary: We introduce a novel type of reward machines, called stochastic reward machines, and an algorithm for learning them. Our algorithm, based on constraint solving, learns minimal stochastic reward machines from the explorations of a reinforcement learning agent.
- Score: 5.345748208068876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reward machines are an established tool for dealing with reinforcement learning problems in which rewards are sparse and depend on complex sequences of actions. However, existing algorithms for learning reward machines assume an overly idealized setting where rewards have to be free of noise. To overcome this practical limitation, we introduce a novel type of reward machine, called stochastic reward machines, and an algorithm for learning them. Our algorithm, based on constraint solving, learns minimal stochastic reward machines from the explorations of a reinforcement learning agent. This algorithm can easily be paired with existing reinforcement learning algorithms for reward machines and is guaranteed to converge to an optimal policy in the limit. We demonstrate the effectiveness of our algorithm in two case studies and show that it outperforms both existing methods and a naive approach for handling noisy reward functions.
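The central object here, a stochastic reward machine, can be pictured as a classical reward machine whose transitions emit rewards drawn from distributions rather than fixed scalars. Below is a minimal Python sketch of that idea, assuming Gaussian reward noise; the class name, transition encoding, and example task are illustrative assumptions, not the authors' implementation.

```python
import random

class StochasticRewardMachine:
    """Minimal sketch of a stochastic reward machine (SRM): a finite-state
    machine whose transitions emit rewards sampled from a distribution
    (here a Gaussian) instead of fixed scalars."""

    def __init__(self, initial_state, transitions):
        # transitions maps (state, frozenset_of_propositions) to
        # (next_state, reward_mean, reward_std).
        self.initial_state = initial_state
        self.transitions = transitions
        self.state = initial_state

    def reset(self):
        self.state = self.initial_state

    def step(self, propositions):
        # Advance on the high-level propositions observed in the
        # environment and sample a noisy reward for the transition;
        # unlisted label sets self-loop with zero reward.
        key = (self.state, frozenset(propositions))
        next_state, mean, std = self.transitions.get(key, (self.state, 0.0, 0.0))
        self.state = next_state
        return random.gauss(mean, std)

# Hypothetical two-step task: observe "key", then "door". Only the full
# sequence reaches state u2, and every reward is noisy.
srm = StochasticRewardMachine(
    initial_state="u0",
    transitions={
        ("u0", frozenset({"key"})):  ("u1", 0.1, 0.05),
        ("u1", frozenset({"door"})): ("u2", 1.0, 0.05),
    },
)

srm.reset()
print(srm.step({"key"}))   # noisy reward near 0.1, machine now in u1
print(srm.step({"door"}))  # noisy reward near 1.0, machine now in u2
```

An RL algorithm for reward machines would then learn a policy over the product of environment state and machine state, while the learning algorithm described in the abstract infers a minimal machine of this kind, including its reward distributions, from the agent's noisy explorations via constraint solving.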
Related papers
- Provably Efficient Exploration in Reward Machines with Low Regret [20.076030507802553]
We study reinforcement learning for decision processes with non-Markovian reward. Our main algorithmic contribution is a model-based RL algorithm for decision processes involving probabilistic reward machines. We derive high-probability and non-asymptotic bounds on its regret and demonstrate the gain in terms of regret over existing algorithms.
arXiv Detail & Related papers (2024-12-26T12:25:04Z)
- Maximally Permissive Reward Machines [8.425937972214667]
We propose a new approach to synthesising reward machines based on the set of partial order plans for a goal.
We prove that learning using such "maximally permissive" reward machines results in higher rewards than learning using RMs based on a single plan.
arXiv Detail & Related papers (2024-08-15T09:59:26Z)
- Sample Efficient Reinforcement Learning by Automatically Learning to Compose Subtasks [3.1594865504808944]
We propose an RL algorithm that automatically structures the reward function for sample efficiency, given a set of labels that signify subtasks.
We evaluate our algorithm in a variety of sparse-reward environments.
arXiv Detail & Related papers (2024-01-25T15:06:40Z)
- REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world. Recent methods aim to mitigate misalignment by learning reward functions from human preferences. We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- STARC: A General Framework For Quantifying Differences Between Reward Functions [52.69620361363209]
We provide a class of pseudometrics on the space of all reward functions that we call STARC metrics. We show that STARC metrics induce both an upper and a lower bound on worst-case regret. We also identify a number of issues with reward metrics proposed by earlier works.
arXiv Detail & Related papers (2023-09-26T20:31:19Z)
- Automata Learning from Preference and Equivalence Queries [17.33092604696224]
We propose a novel variant of the active automata learning problem: actively learn finite automata using preference queries. REMAP is guaranteed to correctly infer a minimal automaton with polynomial query complexity under exact equivalence queries. Our empirical evaluations indicate that REMAP scales to large automata and is effective at learning correct automata from consistent teachers.
arXiv Detail & Related papers (2023-08-18T04:49:45Z)
- Anti-Concentrated Confidence Bonuses for Scalable Exploration [57.91943847134011]
Intrinsic rewards play a central role in handling the exploration-exploitation trade-off.
We introduce anti-concentrated confidence bounds for efficiently approximating the elliptical bonus (a sketch of the classical elliptical bonus appears after this list).
We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic rewards on Atari benchmarks.
arXiv Detail & Related papers (2021-10-21T15:25:15Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress (having no limit cycles) and to converge after an appropriate number of gradient ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z)
- Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning [81.12201426668894]
We develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks.
We show that our proposed algorithm provides substantial improvement in learning efficiency, making reward-free real-world training feasible.
We also demonstrate that the learned skills can be composed using model predictive control for goal-oriented navigation, without any additional training.
arXiv Detail & Related papers (2020-04-27T17:38:53Z)
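For context on the elliptical bonus referenced in the Anti-Concentrated Confidence Bonuses entry above, here is a minimal numpy sketch of the classical elliptical exploration bonus that paper proposes to approximate; variable names are illustrative assumptions, and the anti-concentrated approximation itself is not reproduced here.

```python
import numpy as np

def elliptical_bonus(phi, Sigma_inv, beta=1.0):
    """Classical elliptical exploration bonus for a feature vector phi:
    beta * sqrt(phi^T Sigma^{-1} phi), where Sigma is a regularized
    covariance of previously visited features. Directions of feature
    space visited rarely receive a larger bonus."""
    return beta * np.sqrt(phi @ Sigma_inv @ phi)

# Toy usage: accumulate feature outer products, then score a new feature.
rng = np.random.default_rng(0)
d = 8
Sigma = np.eye(d)                # lambda * I regularizer
for _ in range(100):
    phi = rng.normal(size=d)
    Sigma += np.outer(phi, phi)  # rank-1 update per visited feature
Sigma_inv = np.linalg.inv(Sigma) # the costly inverse whose expense
                                 # motivates scalable approximations
print(elliptical_bonus(rng.normal(size=d), Sigma_inv))
```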