Learning Robust Reward Machines from Noisy Labels
- URL: http://arxiv.org/abs/2408.14871v1
- Date: Tue, 27 Aug 2024 08:41:42 GMT
- Title: Learning Robust Reward Machines from Noisy Labels
- Authors: Roko Parac, Lorenzo Nodari, Leo Ardon, Daniel Furelos-Blanco, Federico Cerutti, Alessandra Russo
- Abstract summary: PROB-IRM is an approach that learns robust reward machines (RMs) for reinforcement learning (RL) agents from noisy execution traces.
We show that PROB-IRM can learn (potentially imperfect) RMs from noisy traces and exploit them to train an RL agent to solve its tasks successfully.
- Score: 46.18428376996514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents PROB-IRM, an approach that learns robust reward machines (RMs) for reinforcement learning (RL) agents from noisy execution traces. The key aspect of RM-driven RL is the exploitation of a finite-state machine that decomposes the agent's task into different subtasks. PROB-IRM uses a state-of-the-art inductive logic programming framework robust to noisy examples to learn RMs from noisy traces using Bayesian posterior degrees of belief, thus ensuring robustness against inconsistencies. Pivotal for the results is the interleaving between RM learning and policy learning: a new RM is learned whenever the RL agent generates a trace that is believed not to be accepted by the current RM. To speed up the training of the RL agent, PROB-IRM employs a probabilistic formulation of reward shaping that uses the posterior Bayesian beliefs derived from the traces. Our experimental analysis shows that PROB-IRM can learn (potentially imperfect) RMs from noisy traces and exploit them to train an RL agent to solve its tasks successfully. Despite the complexity of learning the RM from noisy traces, agents trained with PROB-IRM perform comparably to agents provided with handcrafted RMs.
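For illustration, the sketch below shows one way the interleaving described in the abstract can be organised: Q-learning over the product of environment observations and RM states, with the RM relearned whenever a completed trace is not accepted by the current machine. This is a hypothetical toy, not the authors' implementation: RewardMachine, ToyEnv, and relearn_rm are made-up placeholders, the noise-tolerant ILP learner is stubbed out, and the paper's Bayesian beliefs over noisy labels and probabilistic reward shaping are omitted.

```python
import random
from collections import defaultdict

class RewardMachine:
    """Minimal finite-state reward machine over high-level event labels (illustrative only)."""
    def __init__(self, transitions, accepting, initial=0):
        self.transitions = transitions            # (rm_state, label) -> next rm_state
        self.accepting = set(accepting)           # accepting RM states
        self.initial = initial

    def step(self, state, label):
        nxt = self.transitions.get((state, label), state)
        reward = 1.0 if nxt in self.accepting and state not in self.accepting else 0.0
        return nxt, reward

    def accepts(self, labels):
        state = self.initial
        for lab in labels:
            state, _ = self.step(state, lab)
        return state in self.accepting

class ToyEnv:
    """Toy task: trigger event 'a' and then 'b'; the labelling function observed by the agent is noisy."""
    def reset(self):
        self.t, self.got_a, self.got_b = 0, False, False
        return self.t

    def actions(self, obs):
        return [0, 1, 2]                          # 0 -> event 'a', 1 -> event 'b', 2 -> no event

    def step(self, action):
        self.t += 1
        label = {0: "a", 1: "b", 2: ""}[action]   # true event
        self.got_a |= label == "a"
        self.got_b |= label == "b" and self.got_a
        if random.random() < 0.05:                # noisy trace: the reported label may be corrupted
            label = random.choice(["a", "b", ""])
        done = self.got_b or self.t >= 10
        return self.t, label, done

    def task_succeeded(self):
        return self.got_b

def run_episode(env, rm, q, eps=0.1, gamma=0.99, alpha=0.1):
    """Q-learning on the product of environment observations and RM states (QRM-style)."""
    obs, u, trace, done = env.reset(), rm.initial, [], False
    while not done:
        acts = env.actions(obs)
        a = random.choice(acts) if random.random() < eps else max(acts, key=lambda x: q[(obs, u, x)])
        obs2, label, done = env.step(a)
        trace.append(label)
        u2, r = rm.step(u, label)                 # RM state transition supplies the reward
        target = r + (0.0 if done else gamma * max(q[(obs2, u2, x)] for x in env.actions(obs2)))
        q[(obs, u, a)] += alpha * (target - q[(obs, u, a)])
        obs, u = obs2, u2
    return trace

def relearn_rm(traces):
    """Stand-in for a noise-tolerant ILP learner: here we simply return the target RM for the toy task."""
    return RewardMachine({(0, "a"): 1, (1, "b"): 2}, accepting={2})

def train(env, episodes=500):
    rm, q = RewardMachine({}, accepting={0}), defaultdict(float)   # start from a trivial RM
    traces = []
    for _ in range(episodes):
        trace = run_episode(env, rm, q)
        traces.append(trace)
        if env.task_succeeded() != rm.accepts(trace):              # trace inconsistent with current RM
            rm, q = relearn_rm(traces), defaultdict(float)         # relearn the RM, reset policies
    return rm, q

rm, q = train(ToyEnv())
```

In PROB-IRM the relearning condition is probabilistic: a new RM is requested only when the trace is believed, under the Bayesian posterior over the noisy labels, not to be accepted by the current RM, and the same posteriors drive reward shaping. The hard equality check in the sketch would instead discard the learned policies on every spurious label flip.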
Related papers
- Robot See, Robot Do: Imitation Reward for Noisy Financial Environments [0.0]
This paper introduces a novel and more robust reward function by leveraging imitation learning.
We integrate imitation (expert's) feedback with reinforcement (agent's) feedback in a model-free reinforcement learning algorithm.
Empirical results demonstrate that this novel approach improves financial performance metrics compared to traditional benchmarks.
arXiv Detail & Related papers (2024-11-13T14:24:47Z) - RRM: Robust Reward Model Training Mitigates Reward Hacking [51.12341734942797]
Reward models (RMs) play a pivotal role in aligning large language models with human preferences.
We introduce a causal framework that learns preferences independent of these artifacts.
Experiments show that our approach successfully filters out undesirable artifacts, yielding a more robust reward model.
arXiv Detail & Related papers (2024-09-20T01:46:07Z) - Knowledge-Informed Auto-Penetration Testing Based on Reinforcement Learning with Reward Machine [2.087814874079289]
We propose a knowledge-informed AutoPT framework called DRLRM-PT.
We use reward machines (RMs) to encode domain knowledge as guidelines for training a PT policy.
We show that RMs encoding more detailed domain knowledge yield better PT performance than RMs encoding simpler knowledge.
arXiv Detail & Related papers (2024-05-24T20:05:12Z) - Multi-Agent Reinforcement Learning with a Hierarchy of Reward Machines [5.600971575680638]
We study cooperative Multi-Agent Reinforcement Learning (MARL) problems using Reward Machines (RMs).
We present Multi-Agent Reinforcement Learning with a hierarchy of RMs (MAHRM) that is capable of dealing with more complex scenarios.
Experimental results in three cooperative MARL domains show that MAHRM outperforms other MARL methods using the same prior knowledge of high-level events.
arXiv Detail & Related papers (2024-03-08T06:38:22Z) - The Trickle-down Impact of Reward (In-)consistency on RLHF [71.37987812944971]
We show that reward inconsistency exhibits a trickle-down effect on the downstream Reinforcement Learning from Human Feedback process.
We propose Contrast Instructions, a benchmarking strategy for measuring the consistency of RMs.
We show that RLHF models trained with a more consistent RM yield more useful responses.
arXiv Detail & Related papers (2023-09-28T04:05:13Z) - From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Model to Pre-trained Machine Reader [130.45769668885487]
Pre-trained Machine Reader (PMR) is a novel method for retrofitting masked language models (MLMs) to pre-trained machine reading comprehension (MRC) models without acquiring labeled data.
To build the proposed PMR, we constructed a large volume of general-purpose and high-quality MRC-style training data.
PMR has the potential to serve as a unified model for tackling various extraction and classification tasks in the MRC formulation.
arXiv Detail & Related papers (2022-12-09T10:21:56Z) - Hierarchies of Reward Machines [75.55324974788475]
Reward machines (RMs) are a recent formalism for representing the reward function of a reinforcement learning task through a finite-state machine.
We propose a formalism for further abstracting the subtask structure by endowing an RM with the ability to call other RMs.
arXiv Detail & Related papers (2022-05-31T12:39:24Z) - Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning [10.724516317292924]
We show how reward modelling (RM) can be approached as a multiple instance learning (MIL) problem.
We develop new MIL models that capture the time dependencies in labelled trajectories.
We demonstrate on a range of RL tasks that our novel MIL models can reconstruct reward functions to a high level of accuracy (a minimal sketch of the MIL formulation appears after this list).
arXiv Detail & Related papers (2022-05-30T18:20:22Z) - What is Going on Inside Recurrent Meta Reinforcement Learning Agents? [63.58053355357644]
Recurrent meta reinforcement learning (meta-RL) agents employ a recurrent neural network (RNN) to "learn a learning algorithm".
We shed light on the internal working mechanisms of these agents by reformulating the meta-RL problem using the Partially Observable Markov Decision Process (POMDP) framework.
arXiv Detail & Related papers (2021-04-29T20:34:39Z)
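As a companion to the multiple instance learning entry above, the following is a minimal sketch of MIL-style reward modelling from trajectory-level labels only; it is not the paper's architecture, and the model, dimensions, and synthetic data are illustrative assumptions. An LSTM assigns a reward to each timestep of a trajectory (the instances), and the per-step rewards are summed to predict the observed trajectory label (the bag label), which is all the training signal available.

```python
import torch
import torch.nn as nn

class MILRewardModel(nn.Module):
    """Hypothetical sketch: an LSTM reads a trajectory (the bag) and emits a reward per timestep
    (the instances); summing them gives the predicted trajectory-level return."""
    def __init__(self, obs_dim, hidden_dim=64):
        super().__init__()
        self.encoder = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.reward_head = nn.Linear(hidden_dim, 1)

    def forward(self, trajectories):                              # trajectories: (batch, time, obs_dim)
        hidden, _ = self.encoder(trajectories)                    # hidden state carries the history
        per_step_reward = self.reward_head(hidden).squeeze(-1)    # (batch, time)
        return per_step_reward, per_step_reward.sum(dim=1)        # per-step rewards and bag prediction

# Training against trajectory-level labels only (per-step rewards are never observed directly).
model = MILRewardModel(obs_dim=8)
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
trajectories = torch.randn(32, 20, 8)                             # 32 synthetic trajectories of 20 steps
returns = torch.randn(32)                                         # observed trajectory labels
for _ in range(100):
    per_step, predicted_return = model(trajectories)
    loss = nn.functional.mse_loss(predicted_return, returns)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```

Because the recurrent hidden state summarises the history up to each step, the per-step rewards it produces can depend on earlier events, which is what makes the recovered reward non-Markovian in the original state space.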