Non-Markovian Reward Modelling from Trajectory Labels via Interpretable
Multiple Instance Learning
- URL: http://arxiv.org/abs/2205.15367v1
- Date: Mon, 30 May 2022 18:20:22 GMT
- Title: Non-Markovian Reward Modelling from Trajectory Labels via Interpretable
Multiple Instance Learning
- Authors: Joseph Early, Tom Bewley, Christine Evers, Sarvapali Ramchurn
- Abstract summary: We show how RM can be approached as a multiple instance learning (MIL) problem.
We develop new MIL models that are able to capture the time dependencies in labelled trajectories.
We demonstrate on a range of RL tasks that our novel MIL models can reconstruct reward functions to a high level of accuracy.
- Score: 10.724516317292924
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We generalise the problem of reward modelling (RM) for reinforcement learning
(RL) to handle non-Markovian rewards. Existing work assumes that human
evaluators observe each step in a trajectory independently when providing
feedback on agent behaviour. In this work, we remove this assumption, extending
RM to include hidden state information that captures temporal dependencies in
human assessment of trajectories. We then show how RM can be approached as a
multiple instance learning (MIL) problem, and develop new MIL models that are
able to capture the time dependencies in labelled trajectories. We demonstrate
on a range of RL tasks that our novel MIL models can reconstruct reward
functions to a high level of accuracy, and that they provide interpretable
learnt hidden information that can be used to train high-performing agent
policies.
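The abstract describes treating each labelled trajectory as a multiple instance learning "bag" of per-timestep instances, with a recurrent hidden state capturing the non-Markovian (temporal) dependencies in the trajectory-level label. The snippet below is a minimal sketch of that idea under stated assumptions, not the authors' implementation: it assumes PyTorch, an LSTM as the hidden-state model, and a simple return-regression loss; names such as TrajectoryRewardModel, the observation dimension, and the dummy training data are illustrative.
```python
import torch
import torch.nn as nn

class TrajectoryRewardModel(nn.Module):
    """Sketch of an LSTM-based MIL reward model (illustrative, not the paper's code)."""
    def __init__(self, obs_dim: int, hidden_dim: int = 64):
        super().__init__()
        # The LSTM hidden state carries temporal context that a Markovian model would ignore.
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.reward_head = nn.Linear(hidden_dim, 1)  # per-timestep reward prediction

    def forward(self, trajectories: torch.Tensor):
        # trajectories: (batch, time, obs_dim) -- each trajectory is one MIL "bag"
        hidden, _ = self.lstm(trajectories)
        step_rewards = self.reward_head(hidden).squeeze(-1)   # (batch, time)
        return step_rewards, step_rewards.sum(dim=1)          # per-step rewards, bag-level return

# Training uses only trajectory-level (bag-level) labels, as in the MIL framing.
model = TrajectoryRewardModel(obs_dim=8)
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
trajs = torch.randn(16, 50, 8)     # 16 dummy trajectories of 50 steps each
returns = torch.randn(16)          # dummy trajectory-level labels
_, predicted_returns = model(trajs)
loss = nn.functional.mse_loss(predicted_returns, returns)
loss.backward()
optimiser.step()
```
Under these assumptions, the per-step outputs could then serve as a learned reward signal for training a policy, in the spirit of the abstract's claim that the learnt hidden information supports high-performing agents.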
Related papers
- Reinforcement Learning from LLM Feedback to Counteract Goal
Misgeneralization [0.0]
We introduce a method to address goal misgeneralization in reinforcement learning (RL).
Goal misgeneralization occurs when an agent retains its capabilities out-of-distribution yet pursues a proxy goal rather than the intended one.
This study demonstrates how a large language model can efficiently supervise RL agents.
arXiv Detail & Related papers (2024-01-14T01:09:48Z)
- ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent [50.508669199496474]
We develop a ReAct-style LLM agent with the ability to reason and act upon external knowledge.
We refine the agent through a ReST-like method that iteratively trains on previous trajectories.
Starting from a prompted large model and after just two iterations of the algorithm, we can produce a fine-tuned small model.
arXiv Detail & Related papers (2023-12-15T18:20:15Z)
- Diffusion Model as Representation Learner [86.09969334071478]
Diffusion Probabilistic Models (DPMs) have recently demonstrated impressive results on various generative tasks.
We propose a novel knowledge transfer method that leverages the knowledge acquired by DPMs for recognition tasks.
arXiv Detail & Related papers (2023-08-21T00:38:39Z)
- MA2CL: Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning [128.19212716007794]
We propose an effective framework called Multi-Agent Masked Attentive Contrastive Learning (MA2CL).
MA2CL encourages the learned representation to be both temporally and agent-level predictive by reconstructing masked agent observations in latent space.
Our method significantly improves the performance and sample efficiency of different MARL algorithms and outperforms other methods in various vision-based and state-based scenarios.
arXiv Detail & Related papers (2023-06-03T05:32:19Z)
- Self-Supervised Reinforcement Learning that Transfers using Random Features [41.00256493388967]
We propose a self-supervised reinforcement learning method that enables the transfer of behaviors across tasks with different rewards.
Our method is self-supervised in that it can be trained on offline datasets without reward labels, but can then be quickly deployed on new tasks.
arXiv Detail & Related papers (2023-05-26T20:37:06Z)
- Leveraging Approximate Symbolic Models for Reinforcement Learning via Skill Diversity [32.35693772984721]
We introduce Symbolic-Model Guided Reinforcement Learning, wherein we formalize the relationship between the symbolic model and the underlying MDP.
We use these models to extract high-level landmarks that are used to decompose the task.
At the low level, we learn a set of diverse policies for each possible task sub-goal identified by the landmark.
arXiv Detail & Related papers (2022-02-06T23:20:30Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting from the storage promising trajectories that solve prior tasks.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Soft Expert Reward Learning for Vision-and-Language Navigation [94.86954695912125]
Vision-and-Language Navigation (VLN) requires an agent to find a specified spot in an unseen environment by following natural language instructions.
We introduce a Soft Expert Reward Learning (SERL) model to overcome the reward engineering and generalisation problems of the VLN task.
arXiv Detail & Related papers (2020-07-21T14:17:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.