Partial Identifiability in Inverse Reinforcement Learning For Agents With Non-Exponential Discounting
- URL: http://arxiv.org/abs/2412.11155v1
- Date: Sun, 15 Dec 2024 11:08:58 GMT
- Title: Partial Identifiability in Inverse Reinforcement Learning For Agents With Non-Exponential Discounting
- Authors: Joar Skalse, Alessandro Abate
- Abstract summary: Inverse reinforcement learning (IRL) aims to infer an agent's preferences from observing their behaviour. One of the central difficulties in IRL is that multiple preferences may lead to the same observed behaviour. We show that IRL is generally unable to infer enough information about $R$ to identify the correct optimal policy.
- Score: 64.13583792391783
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The aim of inverse reinforcement learning (IRL) is to infer an agent's preferences from observing their behaviour. Usually, preferences are modelled as a reward function, $R$, and behaviour is modelled as a policy, $\pi$. One of the central difficulties in IRL is that multiple preferences may lead to the same observed behaviour. That is, $R$ is typically underdetermined by $\pi$, which means that $R$ is only partially identifiable. Recent work has characterised the extent of this partial identifiability for different types of agents, including optimal and Boltzmann-rational agents. However, work so far has only considered agents that discount future reward exponentially: this is a serious limitation, especially given that extensive work in the behavioural sciences suggests that humans are better modelled as discounting hyperbolically. In this work, we newly characterise partial identifiability in IRL for agents with non-exponential discounting: our results are particularly relevant for hyperbolic discounting, but they also apply more generally to agents that use other types of (non-exponential) discounting. Notably, we show that IRL is generally unable to infer enough information about $R$ to identify the correct optimal policy, which entails that IRL alone can be insufficient to adequately characterise the preferences of such agents.
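To make the contrast between discounting schemes concrete, below is a minimal sketch (not taken from the paper; the reward amounts, delays, and discount parameters are illustrative assumptions). It shows the standard observation that a hyperbolic discounter's ranking of a smaller-sooner versus a larger-later reward can reverse as time passes, whereas an exponential discounter's ranking cannot, since the ratio of exponential discount factors depends only on the difference in delays.

```python
# Illustrative sketch: preference reversal under hyperbolic vs. exponential
# discounting. All numbers below are assumptions chosen for illustration.

def exponential(delay, gamma=0.8):
    """Exponential discount factor: gamma ** delay."""
    return gamma ** delay

def hyperbolic(delay, k=1.0):
    """Hyperbolic discount factor: 1 / (1 + k * delay)."""
    return 1.0 / (1.0 + k * delay)

# Two options: a smaller reward arriving sooner vs. a larger reward later.
smaller, t_small = 10.0, 4   # reward of 10 at time step 4
larger, t_large = 15.0, 6    # reward of 15 at time step 6

for now in (0, 3):  # evaluate from time 0 and again from time 3
    for name, d in (("exponential", exponential), ("hyperbolic", hyperbolic)):
        v_small = smaller * d(t_small - now)
        v_large = larger * d(t_large - now)
        choice = "larger-later" if v_large > v_small else "smaller-sooner"
        print(f"t={now} {name:>11}: small={v_small:.2f} large={v_large:.2f} -> {choice}")

# The hyperbolic agent prefers larger-later at t=0 but switches to
# smaller-sooner at t=3 (a preference reversal); the exponential agent's
# ranking is the same at both times.
```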
Related papers
- Partial Identifiability and Misspecification in Inverse Reinforcement Learning [64.13583792391783]
The aim of Inverse Reinforcement Learning is to infer a reward function $R$ from a policy $\pi$.
This paper provides a comprehensive analysis of partial identifiability and misspecification in IRL.
arXiv Detail & Related papers (2024-11-24T18:35:46Z)
- Robust Preference Optimization through Reward Model Distillation [68.65844394615702]
Direct Preference Optimization (DPO) is a popular offline alignment method that trains a policy directly on preference data.
We analyze this phenomenon and use distillation to get a better proxy for the true preference distribution over generation pairs.
Our results show that distilling from such a family of reward models leads to improved robustness to distribution shift in preference annotations.
arXiv Detail & Related papers (2024-05-29T17:39:48Z)
- Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification [72.08225446179783]
Inverse reinforcement learning aims to infer an agent's preferences from their behaviour.
To do this, we need a behavioural model of how $\pi$ relates to $R$.
We analyse how sensitive the IRL problem is to misspecification of the behavioural model.
arXiv Detail & Related papers (2024-03-11T16:09:39Z)
- Misspecification in Inverse Reinforcement Learning [80.91536434292328]
The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function $R$ from a policy $\pi$.
One of the primary motivations behind IRL is to infer human preferences from human behaviour.
However, the behavioural models used by IRL are at best approximations of actual human behaviour; this means that they are misspecified, which raises the worry that they might lead to unsound inferences if applied to real-world data.
arXiv Detail & Related papers (2022-12-06T18:21:47Z)
- Asymptotic Statistical Analysis of $f$-divergence GAN [13.587087960403199]
Generative Adversarial Networks (GANs) have achieved great success in data generation.
We consider the statistical behavior of the general $f$-divergence formulation of GAN.
The resulting estimation method is referred to as Adversarial Gradient Estimation (AGE).
arXiv Detail & Related papers (2022-09-14T18:08:37Z)
- Maximizing Information Gain in Partially Observable Environments via Prediction Reward [64.24528565312463]
This paper tackles the challenge of using belief-based rewards for a deep RL agent.
We derive the exact error between negative entropy and the expected prediction reward.
This insight provides theoretical motivation for several fields using prediction rewards.
arXiv Detail & Related papers (2020-05-11T08:13:49Z)
- Bounded Incentives in Manipulating the Probabilistic Serial Rule [8.309903898123526]
Probabilistic Serial is not incentive-compatible.
A substantial utility gain from strategic behaviour would give self-interested agents an incentive to manipulate the mechanism.
We show that the incentive ratio of the mechanism is $\frac{3}{2}$.
arXiv Detail & Related papers (2020-01-28T23:53:37Z)