Supervised Reward Inference
- URL: http://arxiv.org/abs/2502.18447v1
- Date: Tue, 25 Feb 2025 18:42:05 GMT
- Title: Supervised Reward Inference
- Authors: Will Schwarzer, Jordan Schneider, Philip S. Thomas, Scott Niekum
- Abstract summary: We propose that supervised learning offers a unified framework to infer reward functions from any class of behavior. Experiments on simulated robotic manipulation tasks show that our method can efficiently infer rewards from a wide variety of arbitrarily suboptimal demonstrations.
- Score: 27.266579554754948
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing approaches to reward inference from behavior typically assume that humans provide demonstrations according to specific models of behavior. However, humans often indicate their goals through a wide range of behaviors, from actions that are suboptimal due to poor planning or execution to behaviors which are intended to communicate goals rather than achieve them. We propose that supervised learning offers a unified framework to infer reward functions from any class of behavior, and show that such an approach is asymptotically Bayes-optimal under mild assumptions. Experiments on simulated robotic manipulation tasks show that our method can efficiently infer rewards from a wide variety of arbitrarily suboptimal demonstrations.
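To make the proposed framing concrete, the sketch below shows supervised reward inference in its simplest form: a network is trained on simulated tasks with known reward functions to map demonstrations and a query state to a reward value, so that rewards for a new task can be predicted directly from its demonstrations. This is a minimal illustration under stated assumptions, not the authors' implementation; all module names and dimensions are invented for the example.
```python
# Minimal sketch of supervised reward inference (illustrative, not the paper's code).
# Assumption: a dataset of simulated tasks, each with demonstrations and a known
# ground-truth reward, is available; we regress from (demonstrations, state) to reward.
import torch
import torch.nn as nn

STATE_DIM, DEMO_LEN, N_DEMOS = 8, 20, 4  # illustrative sizes

class RewardInferenceNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Encode each demonstration (a flattened state sequence) into a vector.
        self.demo_encoder = nn.Sequential(
            nn.Linear(STATE_DIM * DEMO_LEN, 128), nn.ReLU(), nn.Linear(128, 64)
        )
        # Predict reward for a query state, conditioned on the demo summary.
        self.reward_head = nn.Sequential(
            nn.Linear(64 + STATE_DIM, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, demos, query_state):
        # demos: (batch, N_DEMOS, DEMO_LEN * STATE_DIM); mean-pool over demos
        # so the conditioning is invariant to demonstration order.
        demo_summary = self.demo_encoder(demos).mean(dim=1)
        return self.reward_head(torch.cat([demo_summary, query_state], dim=-1))

net = RewardInferenceNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(100):  # training loop on synthetic stand-in data
    demos = torch.randn(32, N_DEMOS, DEMO_LEN * STATE_DIM)
    state = torch.randn(32, STATE_DIM)
    true_reward = torch.randn(32, 1)  # known reward, available in simulation
    loss = nn.functional.mse_loss(net(demos, state), true_reward)
    opt.zero_grad(); loss.backward(); opt.step()
```
Because the regression target is the simulator's ground-truth reward rather than an assumed human behavior model, the same training recipe applies regardless of how suboptimal or communicative the demonstrations are.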
Related papers
- Adaptive Language-Guided Abstraction from Contrastive Explanations [53.48583372522492]
It is necessary to determine which features of the environment are relevant before determining how these features should be used to compute reward.
End-to-end methods for joint feature and reward learning often yield brittle reward functions that are sensitive to spurious state features.
This paper describes a method named ALGAE, which alternates between using language models to iteratively identify human-meaningful features and learning reward functions over the identified features.
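A schematic of that alternation is sketched below; every function here is a hypothetical placeholder written for illustration, not ALGAE's actual interface.
```python
# Schematic of the feature/reward alternation described above.
# All interfaces are hypothetical placeholders, not ALGAE's API.
from typing import Callable, List

def query_language_model(explanations: List[str], current: List[str]) -> str:
    # Placeholder: a real system would prompt a language model for a
    # human-meaningful feature missing from the current feature set.
    return f"feature_{len(current)}"

def fit_reward(demos: List[dict], features: List[str]) -> Callable[[dict], float]:
    # Placeholder: fit any standard reward model over the named features.
    return lambda state: sum(state.get(f, 0.0) for f in features)

def algae_style_loop(demos: List[dict], explanations: List[str], n_rounds: int = 3):
    features: List[str] = []
    reward_fn = fit_reward(demos, features)
    for _ in range(n_rounds):
        # Alternate: (1) the language model proposes a new feature,
        # (2) the reward is refit over the enlarged feature set.
        features.append(query_language_model(explanations, features))
        reward_fn = fit_reward(demos, features)
    return reward_fn, features
```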
arXiv Detail & Related papers (2024-09-12T16:51:58Z)
- Inverse-RLignment: Large Language Model Alignment from Demonstrations through Inverse Reinforcement Learning [62.05713042908654]
We introduce Alignment from Demonstrations (AfD), a novel approach leveraging high-quality demonstration data to overcome these challenges.
We formalize AfD within a sequential decision-making framework, highlighting its unique challenge of missing reward signals.
Practically, we propose a computationally efficient algorithm that extrapolates over a tailored reward model for AfD.
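One plausible instantiation of demonstration-based reward modeling is sketched below; it assumes the reward model is trained to score demonstration responses above model-generated samples, which is an assumption for illustration rather than the paper's exact algorithm.
```python
# Hedged sketch: train a reward model to prefer demonstration data over
# model samples (an assumed instantiation, not the paper's algorithm).
import torch
import torch.nn as nn

EMB_DIM = 32  # illustrative embedding size for (prompt, response) pairs

reward_model = nn.Sequential(nn.Linear(EMB_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

for step in range(100):
    demo_emb = torch.randn(16, EMB_DIM)    # embeddings of demonstration data
    sample_emb = torch.randn(16, EMB_DIM)  # embeddings of model-generated data
    # Logistic loss: demonstrations should receive higher reward than samples.
    margin = reward_model(demo_emb) - reward_model(sample_emb)
    loss = nn.functional.softplus(-margin).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```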
arXiv Detail & Related papers (2024-05-24T15:13:53Z)
- Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems [82.92678837778358]
Preference-based methods have demonstrated substantial success in empirical applications such as InstructGPT.
We show how human bias and uncertainty in feedback models can affect the theoretical guarantees of these approaches.
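For context, the sketch below writes out the standard Bradley-Terry preference model that underlies such methods, with explicit rationality and noise terms; this parameterization is an illustrative assumption, not the paper's exact feedback model.
```python
# Bradley-Terry preference model with bounded rationality and random flips
# (illustrative parameterization, not the paper's exact model).
import math

def preference_prob(reward_a: float, reward_b: float,
                    rationality: float = 1.0, flip_prob: float = 0.0) -> float:
    """P(human prefers A over B) under biased, noisy feedback."""
    bt = 1.0 / (1.0 + math.exp(-rationality * (reward_a - reward_b)))
    # With probability flip_prob the feedback is uniformly random.
    return (1.0 - flip_prob) * bt + flip_prob * 0.5

# As rationality -> 0 or flip_prob -> 1, feedback carries no information,
# which is why bias and uncertainty can erode theoretical guarantees.
print(preference_prob(1.0, 0.0, rationality=5.0, flip_prob=0.1))
```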
arXiv Detail & Related papers (2023-07-24T17:50:24Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observations of its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Versatile Inverse Reinforcement Learning via Cumulative Rewards [22.56145954060092]
Inverse Reinforcement Learning infers a reward function from expert demonstrations, aiming to encode the behavior and intentions of the expert.
We propose a novel method for Inverse Reinforcement Learning that overcomes limitations of prior approaches by formulating the recovered reward as a sum of iteratively trained discriminators.
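A minimal sketch of that construction follows, assuming each round trains a discriminator to separate expert states from current-policy states and the recovered reward sums the discriminators' scores; details differ from the paper.
```python
# Sketch of "reward as a sum of iteratively trained discriminators"
# (illustrative assumptions; not the paper's exact procedure).
import torch
import torch.nn as nn

STATE_DIM = 4
discriminators = []

for round_idx in range(3):
    d = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(), nn.Linear(32, 1))
    opt = torch.optim.Adam(d.parameters(), lr=1e-3)
    for step in range(50):
        expert_s = torch.randn(32, STATE_DIM) + 1.0  # stand-in expert states
        policy_s = torch.randn(32, STATE_DIM)        # stand-in policy states
        # Logistic loss: score expert states above policy states.
        loss = (nn.functional.softplus(-d(expert_s)).mean()
                + nn.functional.softplus(d(policy_s)).mean())
        opt.zero_grad(); loss.backward(); opt.step()
    discriminators.append(d)
    # (In the full method, the policy would be updated against the current
    # cumulative reward before the next discriminator is trained.)

def recovered_reward(state: torch.Tensor) -> torch.Tensor:
    return sum(d(state) for d in discriminators)
```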
arXiv Detail & Related papers (2021-11-15T10:49:15Z)
- Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference [71.11416263370823]
We propose a generative inverse reinforcement learning approach for modelling user behavioral preferences.
Our model can automatically learn rewards from users' actions based on a discriminative actor-critic network and a Wasserstein GAN.
arXiv Detail & Related papers (2021-05-03T13:14:25Z)
- Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification [133.20816939521941]
In the standard Markov decision process formalism, users specify tasks by writing down a reward function.
In many scenarios, the user is unable to describe the task in words or numbers, but can readily provide examples of what the world would look like if the task were solved.
Motivated by this observation, we derive a control algorithm that aims to visit states that have a high probability of leading to successful outcomes, given only examples of successful outcome states.
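An illustrative, simplified sketch of this example-based idea follows: a classifier is trained to predict whether a state-action pair eventually leads to a successful outcome, using example success states as positives and a bootstrapped target at the next step in place of a reward. This is an assumption-laden simplification, not the paper's exact update.
```python
# Simplified recursive-classification sketch for example-based control
# (illustrative assumptions; not the paper's exact algorithm).
import torch
import torch.nn as nn

S, A, GAMMA = 4, 2, 0.9
clf = nn.Sequential(nn.Linear(S + A, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)

for step in range(100):
    success_s = torch.randn(32, S) + 2.0  # example success states (stand-ins)
    s, a = torch.randn(32, S), torch.randn(32, A)
    next_s, next_a = torch.randn(32, S), torch.randn(32, A)
    # Positives: success examples (paired with arbitrary actions here).
    pos = clf(torch.cat([success_s, torch.randn(32, A)], -1))
    # Recursive target: discounted success probability at the next step.
    with torch.no_grad():
        target = GAMMA * clf(torch.cat([next_s, next_a], -1))
    cur = clf(torch.cat([s, a], -1))
    loss = (nn.functional.binary_cross_entropy(pos, torch.ones_like(pos))
            + nn.functional.binary_cross_entropy(cur, target))
    opt.zero_grad(); loss.backward(); opt.step()
```
A policy can then be trained to maximize the classifier's predicted success probability, with no reward function ever written down.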
arXiv Detail & Related papers (2021-03-23T16:19:55Z)
- Inverse Reinforcement Learning via Matching of Optimality Profiles [2.561053769852449]
We propose an algorithm that learns a reward function from demonstrations of suboptimal or heterogeneous performance.
We show that our method is capable of learning reward functions such that policies trained to optimize them outperform the demonstrations used for fitting the reward functions.
arXiv Detail & Related papers (2020-11-18T13:23:43Z)
- Reinforcement Learning with Supervision from Noisy Demonstrations [38.00968774243178]
We propose a novel framework to adaptively learn the policy by jointly interacting with the environment and exploiting the expert demonstrations.
Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations.
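One plausible way to combine environment interaction with noisy demonstrations is sketched below: an imitation term with per-demonstration weights is added to a policy-gradient loss, so that unreliable demonstrations can be down-weighted. This is an assumed instantiation for illustration, not the paper's algorithm.
```python
# Hedged sketch: RL loss plus weighted imitation of noisy demonstrations
# (illustrative assumption, not the paper's method).
import torch
import torch.nn as nn

S, A = 4, 3
policy = nn.Sequential(nn.Linear(S, 32), nn.ReLU(), nn.Linear(32, A))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

demo_states = torch.randn(64, S)
demo_actions = torch.randint(0, A, (64,))
demo_weights = torch.ones(64)  # would be adapted over training in a full method

for step in range(100):
    # Stand-in RL term: REINFORCE-style loss on environment transitions.
    env_s = torch.randn(32, S)
    logp = torch.log_softmax(policy(env_s), -1)
    acts = torch.multinomial(logp.exp(), 1).squeeze(-1)
    returns = torch.randn(32)  # placeholder returns from the environment
    rl_loss = -(returns * logp[torch.arange(32), acts]).mean()
    # Weighted imitation term on the (possibly noisy) demonstrations.
    demo_logp = torch.log_softmax(policy(demo_states), -1)
    il_loss = -(demo_weights * demo_logp[torch.arange(64), demo_actions]).mean()
    loss = rl_loss + 0.5 * il_loss
    opt.zero_grad(); loss.backward(); opt.step()
```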
arXiv Detail & Related papers (2020-06-14T06:03:06Z)