Inversely Learning Transferable Rewards via Abstracted States
- URL: http://arxiv.org/abs/2501.01669v2
- Date: Mon, 03 Feb 2025 23:08:17 GMT
- Title: Inversely Learning Transferable Rewards via Abstracted States
- Authors: Yikang Gui, Prashant Doshi
- Abstract summary: Inverse reinforcement learning (IRL) has progressed significantly toward accurately learning the underlying rewards in both discrete and continuous domains from behavior data.
In the context of robotic applications, this helps integrate robots into processing lines involving new tasks without programming from scratch.
We introduce a method to inversely learn an abstract reward function from behavior trajectories in two or more differing instances of a domain.
- Score: 4.5456862813416565
- Abstract: Inverse reinforcement learning (IRL) has progressed significantly toward accurately learning the underlying rewards in both discrete and continuous domains from behavior data. The next advance is to learn *intrinsic* preferences in ways that produce useful behavior in settings or tasks which are different but aligned with the observed ones. In the context of robotic applications, this helps integrate robots into processing lines involving new tasks (with shared intrinsic preferences) without programming from scratch. We introduce a method to inversely learn an abstract reward function from behavior trajectories in two or more differing instances of a domain. The abstract reward function is then used to learn task behavior in another separate instance of the domain. This step offers evidence of its transferability and validates its correctness. We evaluate the method on trajectories in tasks from multiple domains in OpenAI's Gym testbed and AssistiveGym and show that the learned abstract reward functions can successfully learn task behaviors in instances of the respective domains, which have not been seen previously.
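As a concrete illustration, here is a minimal sketch of the pipeline the abstract describes: states are mapped through an abstraction into instance-agnostic features, a reward is fit on those features from trajectories drawn from several instances, and the resulting function can then drive learning in an unseen instance. All names and the simple feature-matching update are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def abstract_state(s):
    # Hypothetical abstraction: map an instance-specific state to
    # instance-agnostic features (e.g., distance to goal, contact flag).
    return np.asarray([s["dist_to_goal"], s["contact"]], dtype=float)

def fit_abstract_reward(expert_trajs, baseline_trajs, lr=0.05, epochs=200):
    """Toy feature-matching IRL on abstracted states: raise the linear
    reward w @ phi(s) on expert states, lower it on baseline states."""
    w = np.zeros(2)
    for _ in range(epochs):
        for s in (x for t in expert_trajs for x in t):
            w += lr * abstract_state(s)
        for s in (x for t in baseline_trajs for x in t):
            w -= lr * abstract_state(s)
    return lambda s: float(w @ abstract_state(s))

# expert_trajs may mix trajectories from two or more *different* instances
# of the domain; because the learned reward only sees abstracted states,
# it can be handed to any RL algorithm in a previously unseen instance.
```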
Related papers
- On Learning Informative Trajectory Embeddings for Imitation, Classification and Regression [19.01804572722833]
In real-world sequential decision making tasks, learning from observed state-action trajectories is critical for tasks like imitation, classification, and clustering.
We propose a novel method for embedding state-action trajectories into a latent space that captures the skills and competencies in the dynamic underlying decision-making processes.
arXiv Detail & Related papers (2025-01-16T06:52:58Z) - Adaptive Language-Guided Abstraction from Contrastive Explanations [53.48583372522492]
It is necessary to determine which features of the environment are relevant before determining how these features should be used to compute reward.
End-to-end methods for joint feature and reward learning often yield brittle reward functions that are sensitive to spurious state features.
This paper describes a method named ALGAE, which alternates between using language models to iteratively identify human-meaningful features and assigning reward weights to those features.
arXiv Detail & Related papers (2024-09-12T16:51:58Z) - A Pattern Language for Machine Learning Tasks [0.0]
We view objective functions as constraints on the behaviour of learners.
We develop a formal graphical language that allows us to separate the core tasks of a behaviour from its implementation details.
As proof-of-concept, we design a novel task that enables converting classifiers into generative models, which we call "manipulators".
arXiv Detail & Related papers (2024-07-02T16:50:27Z) - A Generalized Acquisition Function for Preference-based Reward Learning [12.158619866176487]
Preference-based reward learning is a popular technique for teaching robots and autonomous systems how a human user wants them to perform a task.
Previous works have shown that actively synthesizing preference queries to maximize information gain about the reward function parameters improves data efficiency.
We show that it is possible to optimize for learning the reward function up to a behavioral equivalence class, such as inducing the same ranking over behaviors, distribution over choices, or other related definitions of what makes two rewards similar.
arXiv Detail & Related papers (2024-03-09T20:32:17Z) - Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
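The core reward computation lends itself to a short sketch: the reward is the negative distance between the current observation and a goal image in a learned embedding space. The sketch below assumes a pretrained embedding network `phi` (trained with a time-contrastive objective on the human videos); the function name and tensor shapes are illustrative.

```python
import torch

def embedding_reward(phi, state_image, goal_image):
    """Reward = negative distance to the goal in the embedding space.
    phi: a pretrained torch.nn.Module mapping images to embeddings
    (assumed given; training it is the paper's contribution)."""
    with torch.no_grad():
        z_state = phi(state_image.unsqueeze(0))  # add batch dimension
        z_goal = phi(goal_image.unsqueeze(0))
    return -torch.norm(z_state - z_goal, dim=-1).item()
```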
arXiv Detail & Related papers (2022-11-16T16:26:48Z) - Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observations of its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z) - Reward Shaping with Dynamic Trajectory Aggregation [7.6146285961466]
Potential-based reward shaping is a basic method for enriching rewards.
An existing method, SARSA-RS, learns the potential function during training.
We propose a trajectory aggregation method that uses subgoal series.
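For context, potential-based shaping adds F(s, s') = γΦ(s') − Φ(s) to the environment reward, a form known to leave optimal policies unchanged (Ng et al., 1999). The sketch below shows that formula, with the potential derived from a subgoal-based state aggregation; the aggregation itself is reduced to an assumed lookup and is not the paper's exact construction.

```python
GAMMA = 0.99  # discount factor

def shaped_reward(r_env, potential, s, s_next):
    """Potential-based shaping: add F(s, s') = gamma * Phi(s') - Phi(s).
    This additive term does not change which policies are optimal."""
    return r_env + GAMMA * potential(s_next) - potential(s)

def make_subgoal_potential(subgoal_index, value_per_subgoal=1.0):
    """Illustrative potential over aggregated states: states are grouped by
    the last subgoal achieved (subgoal_index is an assumed stand-in), and
    the potential grows with progress along the subgoal series."""
    return lambda s: value_per_subgoal * subgoal_index(s)
```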
arXiv Detail & Related papers (2021-04-13T13:07:48Z) - Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification [133.20816939521941]
In the standard Markov decision process formalism, users specify tasks by writing down a reward function.
In many scenarios, the user is unable to describe the task in words or numbers, but can readily provide examples of what the world would look like if the task were solved.
Motivated by this observation, we derive a control algorithm that aims to visit states that have a high probability of leading to successful outcomes, given only examples of successful outcome states.
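A minimal sketch of this idea: train a classifier to separate success-example states from states the policy visits, and use its output as a reward signal. The recursive, bootstrapped update that gives the actual algorithm its name is omitted, and all names and dimensions below are illustrative assumptions.

```python
import torch
import torch.nn as nn

OBS_DIM = 8  # assumed observation size
classifier = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def classifier_step(success_states, policy_states):
    """One update: success-example states are labelled 1, states visited
    by the current policy are labelled 0."""
    pos = classifier(success_states)
    neg = classifier(policy_states)
    loss = bce(pos, torch.ones_like(pos)) + bce(neg, torch.zeros_like(neg))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def example_based_reward(state):
    # Estimated probability that the state leads to a successful outcome,
    # usable as a reward by a standard RL algorithm.
    return torch.sigmoid(classifier(state)).item()
```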
arXiv Detail & Related papers (2021-03-23T16:19:55Z) - PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm called *inverse temporal difference* learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called ΨΦ-learning.
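Successor features make the reward weights explicit: with Q(s, a) = ψ(s, a) · w and r = φ · w, inferring a reward from demonstrations reduces to fitting w. The sketch below is a toy, perceptron-style stand-in for that inverse step (the real ITD objective is more involved); `psi` and `demos` are assumed inputs, not the paper's API.

```python
import numpy as np

def recover_reward_weights(psi, demos, n_actions, dim, lr=0.1, epochs=100):
    """Fit reward weights w so that Q(s, a) = psi(s, a) @ w ranks each
    demonstrated action above the alternatives.
    psi: callable (state, action) -> feature vector of length dim
    demos: iterable of (state, demonstrated_action) pairs"""
    w = np.zeros(dim)
    for _ in range(epochs):
        for s, a_demo in demos:
            q = np.array([psi(s, a) @ w for a in range(n_actions)])
            a_best = int(np.argmax(q))
            if a_best != a_demo:
                # Nudge w toward the demonstrated action's features.
                w += lr * (psi(s, a_demo) - psi(s, a_best))
    return w  # w parameterizes the recovered reward estimate
```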
arXiv Detail & Related papers (2021-02-24T21:12:09Z) - COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning [78.13740204156858]
We show that we can reuse prior data to extend new skills simply through dynamic programming.
We demonstrate the effectiveness of our approach by chaining together several behaviors seen in prior datasets for solving a new task.
We train our policies in an end-to-end fashion, mapping high-dimensional image observations to low-level robot control commands.
arXiv Detail & Related papers (2020-10-27T17:57:29Z) - Pitfalls of learning a reward function online [28.2272248328398]
We consider a continual ("one life") learning approach where the agent both learns the reward function and optimises for it at the same time.
This comes with a number of pitfalls, such as the agent deliberately steering the learning process in a direction that benefits it.
We show that an uninfluenceable process is automatically unriggable, and if the set of possible environments is sufficiently rich, the converse is true too.
arXiv Detail & Related papers (2020-04-28T16:58:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.