Performance of Bounded-Rational Agents With the Ability to Self-Modify
- URL: http://arxiv.org/abs/2011.06275v2
- Date: Mon, 18 Jan 2021 09:55:26 GMT
- Title: Performance of Bounded-Rational Agents With the Ability to Self-Modify
- Authors: Jakub Tětek, Marek Sklenka, Tomáš Gavenčiak
- Abstract summary: Self-modification of agents embedded in complex environments is hard to avoid.
It has been argued that intelligent agents have an incentive to avoid modifying their utility function so that their future instances work towards the same goals.
We show that this result is no longer true for agents with bounded rationality.
- Score: 1.933681537640272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-modification of agents embedded in complex environments is hard to
avoid, whether it happens via direct means (e.g. own code modification) or
indirectly (e.g. influencing the operator, exploiting bugs or the environment).
It has been argued that intelligent agents have an incentive to avoid modifying
their utility function so that their future instances work towards the same
goals.
Everitt et al. (2016) formally show that providing an option to self-modify
is harmless for perfectly rational agents. We show that this result is no
longer true for agents with bounded rationality. In such agents,
self-modification may cause exponential deterioration in performance and
gradual misalignment of a previously aligned agent. We investigate how the size
of this effect depends on the type and magnitude of imperfections in the
agent's rationality (1-4 below). We also discuss model assumptions and the
wider problem and framing space.
We examine four ways in which an agent can be bounded-rational: it either (1)
doesn't always choose the optimal action, (2) is not perfectly aligned with
human values, (3) has an inaccurate model of the environment, or (4) uses the
wrong temporal discounting factor. We show that while in cases (2)-(4) the
misalignment caused by the agent's imperfection does not increase over time,
in case (1) the misalignment may grow exponentially.
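As a toy illustration of case (1) (a simplification assumed here for exposition, not the paper's formal model): if at each timestep a self-modifying agent preserves its original utility function only with probability 1 - eps, and any misaligned self-modification persists, then the probability of remaining aligned decays geometrically with time:

```python
# Toy model (an assumption for illustration, not the paper's setup):
# at each step the agent keeps its original utility function with
# probability 1 - eps; a misaligned self-modification is permanent.

def aligned_probability(eps: float, t: int) -> float:
    """Probability the agent is still aligned after t self-modification
    opportunities, under the simplifying assumptions above."""
    return (1.0 - eps) ** t

# Even a small per-step error rate compounds exponentially:
for t in (0, 10, 100, 1000):
    print(t, aligned_probability(0.01, t))
```

This is why a one-shot guarantee about rational agents does not transfer to bounded-rational ones: small per-step deviations accumulate multiplicatively over the agent's lifetime.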
Related papers
- Agents Need Not Know Their Purpose [0.0]
This paper describes oblivious agents: agents architected in such a way that their effective utility function is an aggregation of hidden sub-functions.
We show that an oblivious agent, behaving rationally, constructs an internal approximation of designers' intentions.
arXiv Detail & Related papers (2024-02-15T06:15:46Z)
- On the Convergence of Bounded Agents [80.67035535522777]
The first view says that a bounded agent has converged when the minimal number of states needed to describe the agent's future behavior cannot decrease.
The second view says that a bounded agent has converged just when the agent's performance only changes if the agent's internal state changes.
arXiv Detail & Related papers (2023-07-20T17:27:29Z)
- Decision-Making Among Bounded Rational Agents [5.24482648010213]
We introduce the concept of bounded rationality from an information-theoretic view into the game-theoretic framework.
This allows the robots to reason about other agents' sub-optimal behaviors and act accordingly under their computational constraints.
We demonstrate that the resulting framework allows the robots to reason about different levels of rational behavior in other agents and to compute a reasonable strategy under their own computational constraints.
arXiv Detail & Related papers (2022-10-17T00:29:24Z)
- On Avoiding Power-Seeking by Artificial Intelligence [93.9264437334683]
We do not know how to align a very intelligent AI agent's behavior with human interests.
I investigate whether we can build smart AI agents which have limited impact on the world, and which do not autonomously seek power.
arXiv Detail & Related papers (2022-06-23T16:56:21Z)
- Formalizing the Problem of Side Effect Regularization [81.97441214404247]
We propose a formal criterion for side effect regularization via the assistance game framework.
In these games, the agent solves a partially observable Markov decision process.
We show that this POMDP is solved by trading off the proxy reward with the agent's ability to achieve a range of future tasks.
arXiv Detail & Related papers (2022-06-23T16:36:13Z)
- Heterogeneous-Agent Trajectory Forecasting Incorporating Class Uncertainty [54.88405167739227]
We present HAICU, a method for heterogeneous-agent trajectory forecasting that explicitly incorporates agents' class probabilities.
We additionally present PUP, a new challenging real-world autonomous driving dataset.
We demonstrate that incorporating class probabilities in trajectory forecasting significantly improves performance in the face of uncertainty.
arXiv Detail & Related papers (2021-04-26T10:28:34Z)
- Empirically Verifying Hypotheses Using Reinforcement Learning [58.09414653169534]
This paper formulates hypothesis verification as an RL problem.
We aim to build an agent that, given a hypothesis about the dynamics of the world, can take actions to generate observations which can help predict whether the hypothesis is true or false.
arXiv Detail & Related papers (2020-06-29T01:01:10Z)
- Pessimism About Unknown Unknowns Inspires Conservatism [24.085795452335145]
We define an idealized Bayesian reinforcement learner which follows a policy that maximizes the worst-case expected reward over a set of world-models.
A scalar parameter tunes the agent's pessimism by changing the size of the set of world-models taken into account.
Since pessimism discourages exploration, at each timestep, the agent may defer to a mentor, who may be a human or some known-safe policy.
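The worst-case policy described above can be sketched as follows (a minimal illustration with hypothetical toy models; the paper's construction is an idealized Bayesian learner, not this simplification):

```python
# Minimal sketch of pessimistic action selection (a simplification):
# each world-model in the agent's pessimism set assigns an expected
# reward to each action; the agent picks the action whose worst-case
# expected reward across models is highest.

def pessimistic_action(expected_reward_by_model, actions):
    """expected_reward_by_model: list of dicts mapping action -> expected
    reward, one dict per world-model in the pessimism set."""
    return max(actions,
               key=lambda a: min(m[a] for m in expected_reward_by_model))

models = [
    {"explore": 1.0, "stay": 0.5},   # benign model: exploring pays off
    {"explore": -2.0, "stay": 0.4},  # model where exploring is catastrophic
]
# Pessimism favours "stay": its worst case (0.4) beats explore's (-2.0).
print(pessimistic_action(models, ["explore", "stay"]))
```

Enlarging the model set can only lower each action's worst-case value, which is how the scalar pessimism parameter trades expected reward for caution.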
arXiv Detail & Related papers (2020-06-15T20:46:33Z)
- Distributing entanglement with separable states: assessment of encoding and decoding imperfections [55.41644538483948]
Entanglement can be distributed using a carrier which is always separable from the rest of the systems involved.
We consider the effect of incoherent dynamics acting alongside imperfect unitary interactions.
We show that entanglement gain is possible even with substantial unitary errors.
arXiv Detail & Related papers (2020-02-11T15:25:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.