Replacing Rewards with Examples: Example-Based Policy Search via
Recursive Classification
- URL: http://arxiv.org/abs/2103.12656v1
- Date: Tue, 23 Mar 2021 16:19:55 GMT
- Title: Replacing Rewards with Examples: Example-Based Policy Search via
Recursive Classification
- Authors: Benjamin Eysenbach, Sergey Levine, and Ruslan Salakhutdinov
- Abstract summary: In the standard Markov decision process formalism, users specify tasks by writing down a reward function.
In many scenarios, the user is unable to describe the task in words or numbers, but can readily provide examples of what the world would look like if the task were solved.
Motivated by this observation, we derive a control algorithm that aims to visit states that have a high probability of leading to successful outcomes, given only examples of successful outcome states.
- Score: 133.20816939521941
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the standard Markov decision process formalism, users specify tasks by
writing down a reward function. However, in many scenarios, the user is unable
to describe the task in words or numbers, but can readily provide examples of
what the world would look like if the task were solved. Motivated by this
observation, we derive a control algorithm from first principles that aims to
visit states that have a high probability of leading to successful outcomes,
given only examples of successful outcome states. Prior work has approached
similar problem settings in a two-stage process, first learning an auxiliary
reward function and then optimizing this reward function using another
reinforcement learning algorithm. In contrast, we derive a method based on
recursive classification that eschews auxiliary reward functions and instead
directly learns a value function from transitions and successful outcomes. Our
method therefore requires fewer hyperparameters to tune and lines of code to
debug. We show that our method satisfies a new data-driven Bellman equation,
where examples take the place of the typical reward function term. Experiments
show that our approach outperforms prior methods that learn explicit reward
functions.
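To make the recursive-classification idea concrete, below is a minimal PyTorch sketch of one classifier update, written from the abstract above rather than from the authors' released code. It assumes a classifier C(s, a) in (0, 1) that estimates the probability that the future contains a success example, a `policy` callable mapping states to actions, and a target copy of the classifier for bootstrapping; the (1 - gamma) weight on success examples and the soft label gamma*w / (gamma*w + 1) on transitions are my reading of the recursive Bellman-style update, so treat them as illustrative rather than definitive.

```python
# A minimal sketch of a recursive-classification update, written from the
# abstract above rather than from the authors' released code. The MLP sizes,
# the `policy` callable, and the exact loss weights are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

GAMMA = 0.99  # discount factor (assumed)

class SuccessClassifier(nn.Module):
    """C(s, a) in (0, 1): probability that the future contains a success example."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(torch.cat([s, a], dim=-1))).squeeze(-1)

def recursive_classification_loss(classifier, target_classifier, policy,
                                  success_states, s, a, s_next):
    """Success examples supply the positive labels (taking the place of a
    reward term); ordinary transitions get a bootstrapped soft label from
    the classifier's own prediction at the next state-action pair."""
    # Term 1: classify (s*, a ~ pi(.|s*)) at success-example states as positive.
    a_star = policy(success_states)
    c_pos = classifier(success_states, a_star)
    loss_pos = (1.0 - GAMMA) * F.binary_cross_entropy(c_pos, torch.ones_like(c_pos))

    # Term 2: soft label gamma*w / (gamma*w + 1), where w is the odds ratio of
    # the target classifier at the next state-action (no gradient through it).
    with torch.no_grad():
        a_next = policy(s_next)
        c_next = target_classifier(s_next, a_next)
        w = c_next / (1.0 - c_next).clamp(min=1e-6)
        soft_label = GAMMA * w / (GAMMA * w + 1.0)
        weight = GAMMA * w + 1.0
    c_pred = classifier(s, a)
    per_example = F.binary_cross_entropy(c_pred, soft_label, reduction="none")
    loss_trans = (weight * per_example).mean()

    return loss_pos + loss_trans
```

The point the sketch tries to show is that success examples enter exactly where a reward term normally would: they are the only source of positive labels, while ordinary transitions are labeled by the classifier's own prediction at the next state.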
Related papers
- Walking the Values in Bayesian Inverse Reinforcement Learning [66.68997022043075]
A key challenge in Bayesian IRL is bridging the computational gap between the hypothesis space of possible rewards and the likelihood.
We propose ValueWalk, a new Markov chain Monte Carlo method that addresses this gap.
arXiv Detail & Related papers (2024-07-15T17:59:52Z)
- A Generalized Acquisition Function for Preference-based Reward Learning [12.158619866176487]
Preference-based reward learning is a popular technique for teaching robots and autonomous systems how a human user wants them to perform a task.
Previous works have shown that actively synthesizing preference queries to maximize information gain about the reward function parameters improves data efficiency.
We show that it is possible to optimize for learning the reward function up to a behavioral equivalence class, such as inducing the same ranking over behaviors, distribution over choices, or other related definitions of what makes two rewards similar.
arXiv Detail & Related papers (2024-03-09T20:32:17Z)
- STARC: A General Framework For Quantifying Differences Between Reward Functions [55.33869271912095]
We provide a class of pseudometrics on the space of all reward functions that we call STARC metrics.
We show that STARC metrics induce both an upper and a lower bound on worst-case regret.
We also identify a number of issues with reward metrics proposed by earlier works.
arXiv Detail & Related papers (2023-09-26T20:31:19Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observations of its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Preprocessing Reward Functions for Interpretability [2.538209532048867]
We propose exploiting the intrinsic structure of reward functions by first preprocessing them into simpler but equivalent reward functions.
Our empirical evaluation shows that preprocessed rewards are often significantly easier to understand than the original reward.
arXiv Detail & Related papers (2022-03-25T10:19:35Z)
- Invariance in Policy Optimisation and Partial Identifiability in Reward Learning [67.4640841144101]
We characterise the partial identifiability of the reward function given popular reward learning data sources.
We also analyse the impact of this partial identifiability for several downstream tasks, such as policy optimisation.
arXiv Detail & Related papers (2022-03-14T20:19:15Z)
- Potential-based Reward Shaping in Sokoban [5.563631490799427]
We study whether we can use a search algorithm (A*) to automatically generate a potential function for reward shaping in Sokoban.
Results showed that learning with a shaped reward function is faster than learning from scratch.
Results indicate that distance functions could be suitable potential functions for Sokoban.
arXiv Detail & Related papers (2021-09-10T06:28:09Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Reward Shaping with Dynamic Trajectory Aggregation [7.6146285961466]
Potential-based reward shaping is a basic method for enriching rewards (a minimal shaping sketch follows this list).
SARSA-RS learns the potential function during training.
We propose a trajectory aggregation that uses subgoal series.
arXiv Detail & Related papers (2021-04-13T13:07:48Z)
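For the two shaping papers above, the shared mechanism is potential-based reward shaping. The sketch below is my own illustration of that mechanism, not code from either paper: the environment reward is augmented with gamma * Phi(s') - Phi(s) for some potential function Phi, a transformation known to preserve the set of optimal policies (Ng, Harada, and Russell, 1999). The choice of Phi as an A*-derived distance or an aggregated-trajectory potential is an assumption used only in the comments.

```python
# My own illustration of potential-based reward shaping (not code from either
# paper above): add gamma * Phi(s') - Phi(s) to the environment reward. The
# `potential` callable is an assumption; e.g. a negated A*/distance-to-goal
# heuristic for Sokoban, or an aggregated-trajectory potential as in SARSA-RS.
GAMMA = 0.99  # discount factor (assumed)

def shaped_reward(reward: float, potential, s, s_next, done: bool) -> float:
    """Potential-based shaping preserves the set of optimal policies
    (Ng, Harada, and Russell, 1999)."""
    phi_next = 0.0 if done else potential(s_next)  # zero potential at terminal states
    return reward + GAMMA * phi_next - potential(s)
```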