STARC: A General Framework For Quantifying Differences Between Reward
Functions
- URL: http://arxiv.org/abs/2309.15257v2
- Date: Mon, 11 Mar 2024 16:29:17 GMT
- Title: STARC: A General Framework For Quantifying Differences Between Reward
Functions
- Authors: Joar Skalse, Lucy Farnik, Sumeet Ramesh Motwani, Erik Jenner, Adam
Gleave, Alessandro Abate
- Abstract summary: We provide a class of pseudometrics on the space of all reward functions that we call STARC metrics.
We show that STARC metrics induce both an upper and a lower bound on worst-case regret.
We also identify a number of issues with reward metrics proposed by earlier works.
- Score: 55.33869271912095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In order to solve a task using reinforcement learning, it is necessary to
first formalise the goal of that task as a reward function. However, for many
real-world tasks, it is very difficult to manually specify a reward function
that never incentivises undesirable behaviour. As a result, it is increasingly
popular to use \emph{reward learning algorithms}, which attempt to \emph{learn}
a reward function from data. However, the theoretical foundations of reward
learning are not yet well-developed. In particular, it is typically not known
when a given reward learning algorithm will, with high probability, learn a
reward function that is safe to optimise. This means that reward learning
algorithms generally must be evaluated empirically, which is expensive, and
that their failure modes are difficult to anticipate in advance. One of the
roadblocks to deriving better theoretical guarantees is the lack of good
methods for quantifying the difference between reward functions. In this paper
we provide a solution to this problem, in the form of a class of pseudometrics
on the space of all reward functions that we call STARC (STAndardised Reward
Comparison) metrics. We show that STARC metrics induce both an upper and a
lower bound on worst-case regret, which implies that our metrics are tight, and
that any metric with the same properties must be bilipschitz equivalent to
ours. Moreover, we also identify a number of issues with reward metrics
proposed by earlier works. Finally, we evaluate our metrics empirically, to
demonstrate their practical efficacy. STARC metrics can be used to make both
theoretical and empirical analysis of reward learning algorithms both easier
and more principled.
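For intuition, below is a minimal sketch of how a STARC-style pseudometric might be computed for tabular reward functions, following the three-step recipe suggested by the abstract: canonicalise each reward so that rewards differing only by potential-based shaping coincide, normalise so that positive rescaling has no effect, and then take the distance between the standardised rewards. (Two pseudometrics d1 and d2 are bilipschitz equivalent when there are constants 0 < l <= u with l*d1 <= d2 <= u*d1.) The particular canonicalisation, the choice of L2 norms, and all names below (e.g. starc_like_distance) are illustrative assumptions, not the exact construction analysed in the paper.

```python
import numpy as np

def starc_like_distance(r1, r2, P, gamma=0.9):
    """Illustrative sketch of a STARC-style pseudometric between two tabular
    rewards r1, r2 (arrays of shape (S, A, S)), given transition probabilities
    P (shape (S, A, S)) and discount gamma. The canonicalisation used here
    (subtracting a value-function baseline under a fixed uniform policy) is an
    assumption for illustration, not necessarily the paper's choice."""
    def canonicalise(r):
        n_states, n_actions, _ = r.shape
        pi = np.full((n_states, n_actions), 1.0 / n_actions)  # fixed uniform policy
        # Expected one-step reward per state under pi: E[r(s, A, S')].
        r_pi = np.einsum('sa,sat,sat->s', pi, P, r)
        # State-to-state transition matrix under pi.
        P_pi = np.einsum('sa,sat->st', pi, P)
        # State values V of r under pi: (I - gamma * P_pi) V = r_pi.
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Adding gamma*V(s') - V(s) cancels any potential-based shaping term,
        # so shaped variants of the same reward map to the same point.
        return r + gamma * V[None, None, :] - V[:, None, None]

    def normalise(r):
        norm = np.linalg.norm(r)
        return r if norm == 0.0 else r / norm

    # Distance between the standardised rewards (L2 norm of the difference).
    return np.linalg.norm(normalise(canonicalise(r1)) - normalise(canonicalise(r2)))


# Example: a potential-shaped, rescaled copy of a reward sits at distance ~0
# from the original, while an unrelated random reward does not.
rng = np.random.default_rng(0)
S, A = 5, 3
P = rng.random((S, A, S))
P /= P.sum(axis=-1, keepdims=True)        # make transitions row-stochastic
r = rng.normal(size=(S, A, S))
phi = rng.normal(size=S)                  # arbitrary potential function
# Shaping must use the same discount (0.9) as the metric for exact invariance.
shaped = 2.0 * (r + 0.9 * phi[None, None, :] - phi[:, None, None])
print(starc_like_distance(r, shaped, P))                      # approximately 0
print(starc_like_distance(r, rng.normal(size=(S, A, S)), P))  # clearly > 0
```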
Related papers
- Walking the Values in Bayesian Inverse Reinforcement Learning [66.68997022043075]
A key challenge in Bayesian IRL is bridging the computational gap between the hypothesis space of possible rewards and the likelihood.
We propose ValueWalk, a new Markov chain Monte Carlo method that addresses this challenge.
arXiv Detail & Related papers (2024-07-15T17:59:52Z)
- Uncertainty-Aware Reward-Free Exploration with General Function Approximation [69.27868448449755]
In this paper, we propose a reward-free reinforcement learning algorithm called GFA-RFE.
The key idea behind our algorithm is an uncertainty-aware intrinsic reward for exploring the environment.
Experimental results show that GFA-RFE matches or outperforms state-of-the-art unsupervised RL algorithms.
arXiv Detail & Related papers (2024-06-24T01:37:18Z)
- Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning [26.067411894141863]
An appropriate reward function is of paramount importance in specifying a task in reinforcement learning (RL).
Human-in-the-loop (HiL) RL allows humans to communicate complex goals to the RL agent by providing various types of feedback.
We provide an active-learning-based RL algorithm that first explores the environment without specifying a reward function.
arXiv Detail & Related papers (2023-04-18T12:36:09Z)
- Preprocessing Reward Functions for Interpretability [2.538209532048867]
We propose exploiting the intrinsic structure of reward functions by first preprocessing them into simpler but equivalent reward functions.
Our empirical evaluation shows that preprocessed rewards are often significantly easier to understand than the original reward.
arXiv Detail & Related papers (2022-03-25T10:19:35Z)
- Dynamics-Aware Comparison of Learned Reward Functions [21.159457412742356]
The ability to learn reward functions plays an important role in enabling the deployment of intelligent agents in the real world.
Reward functions are typically compared by considering the behavior of optimized policies, but this approach conflates deficiencies in the reward function with those of the policy search algorithm used to optimize it.
We propose the Dynamics-Aware Reward Distance (DARD), a new reward pseudometric.
arXiv Detail & Related papers (2022-01-25T03:48:00Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification [133.20816939521941]
In the standard Markov decision process formalism, users specify tasks by writing down a reward function.
In many scenarios, the user is unable to describe the task in words or numbers, but can readily provide examples of what the world would look like if the task were solved.
Motivated by this observation, we derive a control algorithm that aims to visit states that have a high probability of leading to successful outcomes, given only examples of successful outcome states.
arXiv Detail & Related papers (2021-03-23T16:19:55Z)
- Understanding Learned Reward Functions [6.714172005695389]
We investigate techniques for interpreting learned reward functions.
In particular, we apply saliency methods to identify failure modes and predict the robustness of reward functions.
We find that learned reward functions often implement surprising algorithms that rely on contingent aspects of the environment.
arXiv Detail & Related papers (2020-12-10T18:19:48Z)
- Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how, under a more standard notion of low inherent Bellman error, typically employed in least-squares value-iteration-style algorithms, one can provide strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.