Agents Need Not Know Their Purpose
- URL: http://arxiv.org/abs/2402.09734v1
- Date: Thu, 15 Feb 2024 06:15:46 GMT
- Title: Agents Need Not Know Their Purpose
- Authors: Paulo Garcia
- Abstract summary: This paper describes oblivious agents: agents architected so that their effective utility function is an aggregation of known and hidden sub-functions.
We show that an oblivious agent, behaving rationally, constructs an internal approximation of designers' intentions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensuring that artificial intelligence behaves in ways aligned with human values is commonly referred to as the alignment challenge. Prior work has shown that rational agents, maximizing a utility function, will inevitably behave in ways that are not aligned with human values, especially as their level of intelligence increases. Prior work has also shown that there is no "one true utility function"; solutions must take a more holistic approach to alignment. This paper describes oblivious agents: agents architected so that their effective utility function is an aggregation of a known and a hidden sub-function. The hidden component, to be maximized, is implemented internally as a black box, preventing the agent from examining it. The known component, to be minimized, is the agent's knowledge of the hidden sub-function. Architectural constraints further restrict how the agent's actions can evolve its internal environment model. We show that an oblivious agent, behaving rationally, constructs an internal approximation of its designers' intentions (i.e., it infers alignment) and, as a consequence of its architecture and effective utility function, behaves so as to maximize alignment, i.e., to maximize the approximated intention function. We show that, paradoxically, it does so for whatever utility function is used as the hidden component, and that, in contrast with extant techniques, the chances of alignment actually improve as agent intelligence grows.
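The architecture admits a compact illustration. The following is a minimal sketch, assuming a toy aggregation by subtraction; the function names, the linear hidden utility, and the scalar knowledge estimate are all illustrative assumptions, not the paper's implementation:

```python
import random

def make_hidden_utility(seed: int = 0):
    """Return an opaque callable: the agent may query it, never inspect it."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(4)]

    def hidden_utility(state):
        # Sealed inside the closure; the agent only sees input/output pairs,
        # standing in for the paper's black-box hidden sub-function.
        return sum(w * s for w, s in zip(weights, state))

    return hidden_utility

def effective_utility(state, hidden_utility, knowledge_estimate):
    # Aggregate the two components: maximize the hidden sub-function's output
    # while minimizing (here, subtracting) knowledge of that sub-function.
    return hidden_utility(state) - knowledge_estimate

hidden = make_hidden_utility()
print(effective_utility([0.5, -0.2, 0.1, 0.9], hidden, knowledge_estimate=0.3))
```

The closure here only dramatizes the architectural constraint: the agent can evaluate the hidden component but holds no representation of it to examine.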
Related papers
- Intention-aware policy graphs: answering what, how, and why in opaque agents [0.1398098625978622]
Agents are a special kind of AI-based software in that they interact in complex environments and have increased potential for emergent behaviour.
We propose a Probabilistic Graphical Model, along with a pipeline for designing such a model.
We contribute measurements that evaluate the interpretability and reliability of the explanations provided.
The model can be constructed by taking partial observations of the agent's actions and world states; a toy version is sketched below.
arXiv Detail & Related papers (2024-09-27T09:31:45Z)
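A toy version of that construction (invented data and names; the paper's probabilistic graphical model and pipeline are considerably richer) estimates a policy graph's edge probabilities P(action | world state) by counting logged state-action pairs:

```python
from collections import Counter, defaultdict

# Hypothetical partial-observation log: (discretized world state, action).
log = [
    ("low_battery", "recharge"),
    ("low_battery", "recharge"),
    ("low_battery", "explore"),
    ("target_seen", "approach"),
]

counts = defaultdict(Counter)
for state, action in log:
    counts[state][action] += 1

# Edge probabilities of a minimal policy graph: P(action | state).
policy_graph = {
    state: {a: n / sum(c.values()) for a, n in c.items()}
    for state, c in counts.items()
}
print(policy_graph)
```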
- Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents [110.25679611755962]
Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions.
We introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries.
We empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires about user intentions, and refines them into actionable goals.
arXiv Detail & Related papers (2024-02-14T14:36:30Z)
- Decision-Making Among Bounded Rational Agents [5.24482648010213]
We introduce the concept of bounded rationality from an information-theoretic view into the game-theoretic framework.
This allows the robots to reason about other agents' sub-optimal behaviors and act accordingly under their computational constraints.
We demonstrate that the resulting framework allows the robots to reason about different levels of rational behavior in other agents and to compute reasonable strategies under their computational constraints (see the sketch below).
arXiv Detail & Related papers (2022-10-17T00:29:24Z)
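A common information-theoretic formalization of bounded rationality (the KL-regularized, free-energy form; the paper's game-theoretic construction may differ) trades expected utility against divergence from a prior policy, yielding the softmax policy p*(a) ∝ p0(a) · exp(β U(a)). A minimal sketch:

```python
import math

def bounded_rational_policy(utilities, prior, beta):
    """p*(a) proportional to prior(a) * exp(beta * U(a)).
    beta is the rationality parameter: higher beta means cheaper computation."""
    unnorm = [p * math.exp(beta * u) for p, u in zip(prior, utilities)]
    z = sum(unnorm)
    return [w / z for w in unnorm]

utilities = [1.0, 0.5, 0.0]
prior = [1 / 3, 1 / 3, 1 / 3]
for beta in (0.1, 1.0, 10.0):  # low beta: stay near prior; high: act greedily
    print(beta, [round(p, 3) for p in bounded_rational_policy(utilities, prior, beta)])
```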
- On Avoiding Power-Seeking by Artificial Intelligence [93.9264437334683]
We do not know how to align a very intelligent AI agent's behavior with human interests.
I investigate whether we can build smart AI agents which have limited impact on the world, and which do not autonomously seek power.
arXiv Detail & Related papers (2022-06-23T16:56:21Z)
- Explaining Reinforcement Learning Policies through Counterfactual Trajectories [147.7246109100945]
A human developer must validate that an RL agent will perform well at test-time.
Our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution.
In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
arXiv Detail & Related papers (2022-01-29T00:52:37Z)
- What is Going on Inside Recurrent Meta Reinforcement Learning Agents? [63.58053355357644]
Recurrent meta reinforcement learning (meta-RL) agents employ a recurrent neural network (RNN) to "learn a learning algorithm".
We shed light on the internal working mechanisms of these agents by reformulating the meta-RL problem using the Partially Observable Markov Decision Process (POMDP) framework.
arXiv Detail & Related papers (2021-04-29T20:34:39Z)
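For reference, the POMDP framework invoked above is the standard tuple (a definitional sketch, not the paper's specific construction):

```latex
\[
  \mathcal{M} = (\mathcal{S}, \mathcal{A}, T, R, \Omega, O, \gamma)
\]
```

where S are states, A actions, T(s' | s, a) the transition kernel, R(s, a) the reward, Ω the observation set, O(o | s', a) the observation model, and γ the discount factor; in such reformulations of meta-RL, the unknown task typically enters as part of the unobserved state.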
- Understanding the origin of information-seeking exploration in probabilistic objectives for control [62.997667081978825]
An exploration-exploitation trade-off is central to the description of adaptive behaviour.
One approach to resolving this trade-off has been to equip agents with, or to propose that they possess, an intrinsic 'exploratory drive'.
We show that this combination of utility-maximizing and information-seeking behaviour arises from the minimization of an entirely different class of objectives (one candidate form is sketched below).
arXiv Detail & Related papers (2021-03-11T18:42:39Z)
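One well-known objective class with exactly this property is the expected free energy from active inference, shown here as a plausible instance of the idea (the paper's exact objective may differ):

```latex
\[
  G(\pi) =
  -\underbrace{\mathbb{E}_{q(o \mid \pi)}\big[\ln p(o)\big]}_{\text{extrinsic value (utility)}}
  -\underbrace{\mathbb{E}_{q(o \mid \pi)}\Big[D_{\mathrm{KL}}\big(q(s \mid o, \pi)\,\|\,q(s \mid \pi)\big)\Big]}_{\text{epistemic value (information gain)}}
\]
```

Minimizing G then yields both reward-seeking and information-seeking behaviour from a single objective, with no separate exploratory drive.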
- Performance of Bounded-Rational Agents With the Ability to Self-Modify [1.933681537640272]
Self-modification of agents embedded in complex environments is hard to avoid.
It has been argued that intelligent agents have an incentive to avoid modifying their utility function so that their future instances work towards the same goals.
We show that this result is no longer true for agents with bounded rationality.
arXiv Detail & Related papers (2020-11-12T09:25:08Z)
- TripleTree: A Versatile Interpretable Representation of Black Box Agents and their Environments [9.822870889029113]
We propose that a versatile first step towards general understanding is to discretise the state space into convex regions; a generic sketch of the idea follows below.
We create such a representation using a novel variant of the CART decision tree algorithm.
We demonstrate how it facilitates practical understanding of black box agents through prediction, visualisation and rule-based explanation.
arXiv Detail & Related papers (2020-09-10T09:22:27Z)
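The general discretisation idea can be sketched with an off-the-shelf CART regressor; this is not the TripleTree variant itself (which grows trees against several target quantities jointly), and the data below are invented:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(500, 2))      # toy 2-D state space
values = np.sin(3.0 * states[:, 0]) + states[:, 1]  # stand-in for agent value

# Each leaf of the fitted tree is an axis-aligned convex region of the
# state space, readable as an if-then rule about the agent's value.
tree = DecisionTreeRegressor(max_leaf_nodes=8).fit(states, values)
print(export_text(tree, feature_names=["s0", "s1"]))
```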
- Attention or memory? Neurointerpretable agents in space and time [0.0]
We design a model incorporating a self-attention mechanism that implements task-state representations in semantic feature-space.
To evaluate the agent's selective properties, we add a large volume of task-irrelevant features to observations.
In line with neuroscience predictions, self-attention leads to increased robustness to noise compared to benchmark models.
arXiv Detail & Related papers (2020-07-09T15:04:26Z)
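A bare-bones single-head self-attention layer of the kind such an agent might use, together with the noisy-feature probe described above, can be sketched as follows (shapes, scales, and the random weights are illustrative assumptions, not the paper's trained model):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the rows of X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
d = 8
obs = rng.normal(size=(4, d))                 # 4 observation "feature tokens"
obs[2:] = rng.normal(scale=5.0, size=(2, d))  # task-irrelevant noisy features
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
_, attn = self_attention(obs, Wq, Wk, Wv)
print(attn.round(3))  # inspect how much attention flows to the noisy tokens
```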
- Maximizing Information Gain in Partially Observable Environments via Prediction Reward [64.24528565312463]
This paper tackles the challenge of using belief-based rewards for a deep RL agent.
We derive the exact error between negative entropy and the expected prediction reward.
This insight provides theoretical motivation for several fields using prediction rewards.
arXiv Detail & Related papers (2020-05-11T08:13:49Z)
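The flavour of that result can be checked numerically. For a belief b and a prediction b̂ over states, E_{s~b}[log b̂(s)] = -H(b) - KL(b ‖ b̂), so the expected log prediction reward equals negative entropy exactly when b̂ = b, and the KL term is the error otherwise (a generic identity offered as illustration, not the paper's derivation):

```python
import math

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0.0)

def kl(p, q):
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0.0)

belief = [0.7, 0.2, 0.1]      # agent's belief over hidden states
prediction = [0.5, 0.3, 0.2]  # predictor's output, used as a reward signal

expected_reward = sum(b * math.log(q) for b, q in zip(belief, prediction))
print(expected_reward)                            # E_{s~b}[log b_hat(s)]
print(-entropy(belief) - kl(belief, prediction))  # identical, via the identity
```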
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.