AGI Agent Safety by Iteratively Improving the Utility Function
- URL: http://arxiv.org/abs/2007.05411v1
- Date: Fri, 10 Jul 2020 14:30:56 GMT
- Title: AGI Agent Safety by Iteratively Improving the Utility Function
- Authors: Koen Holtman
- Abstract summary: We present an AGI safety layer that creates a special dedicated input terminal to support the iterative improvement of an AGI agent's utility function.
We show ongoing work in mapping it to a Causal Influence Diagram (CID).
We then present the design of a learning agent, a design that wraps the safety layer around either a known machine learning system, or a potential future AGI-level learning system.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While it is still unclear if agents with Artificial General Intelligence
(AGI) could ever be built, we can already use mathematical models to
investigate potential safety systems for these agents. We present an AGI safety
layer that creates a special dedicated input terminal to support the iterative
improvement of an AGI agent's utility function. The humans who switched on the
agent can use this terminal to close any loopholes that are discovered in the
utility function's encoding of agent goals and constraints, to direct the agent
towards new goals, or to force the agent to switch itself off. An AGI agent may
develop the emergent incentive to manipulate the above utility function
improvement process, for example by deceiving, restraining, or even attacking
the humans involved. The safety layer will partially, and sometimes fully,
suppress this dangerous incentive. The first part of this paper generalizes
earlier work on AGI emergency stop buttons. We aim to make the mathematical
methods used to construct the layer more accessible, by applying them to an MDP
model. We discuss two provable properties of the safety layer, and show ongoing
work in mapping it to a Causal Influence Diagram (CID). In the second part, we
develop full mathematical proofs, and show that the safety layer creates a type
of bureaucratic blindness. We then present the design of a learning agent, a
design that wraps the safety layer around either a known machine learning
system, or a potential future AGI-level learning system. The resulting agent
will satisfy the provable safety properties from the moment it is first
switched on. Finally, we show how this agent can be mapped from its model to a
real-life implementation. We review the methodological issues involved in this
step, and discuss how these are typically resolved.
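To make the mechanism described in the abstract concrete, the toy Python sketch below shows the general shape of the idea: an agent that plans against whatever utility function is currently active on a dedicated input terminal, so the principals can close loopholes, redirect the agent, or force a shutdown by updating that terminal. Everything here (the corridor MDP, the class and function names, the penalty-free utility swap) is an illustrative assumption, not the paper's actual safety-layer construction.

```python
# Illustrative sketch only: a toy MDP agent whose utility function can be
# replaced at run time through a dedicated "input terminal" channel. The
# environment, names, and update rule are assumptions for this example; they
# do not reproduce the paper's safety layer, which adds extra machinery to
# suppress the agent's incentive to manipulate the terminal.
from typing import Callable, Dict

State = int
Action = str


class ToyMDP:
    """A 5-state corridor; actions move the agent one step left or right."""

    def __init__(self, n_states: int = 5):
        self.states = list(range(n_states))
        self.actions = ["left", "right"]

    def step(self, s: State, a: Action) -> State:
        return max(0, s - 1) if a == "left" else min(len(self.states) - 1, s + 1)


class UtilityTerminal:
    """Dedicated channel through which the principals replace the utility function."""

    def __init__(self, initial_utility: Callable[[State], float]):
        self._utility = initial_utility

    def update(self, new_utility: Callable[[State], float]) -> None:
        self._utility = new_utility

    def current(self) -> Callable[[State], float]:
        return self._utility


def plan(mdp: ToyMDP, utility: Callable[[State], float],
         gamma: float = 0.9, iters: int = 100) -> Dict[State, Action]:
    """Plain value iteration against whatever utility function is active right now."""
    v = {s: 0.0 for s in mdp.states}
    for _ in range(iters):
        v = {s: max(utility(mdp.step(s, a)) + gamma * v[mdp.step(s, a)]
                    for a in mdp.actions)
             for s in mdp.states}
    return {s: max(mdp.actions,
                   key=lambda a: utility(mdp.step(s, a)) + gamma * v[mdp.step(s, a)])
            for s in mdp.states}


if __name__ == "__main__":
    mdp = ToyMDP()
    terminal = UtilityTerminal(lambda s: float(s == 4))   # original goal: reach state 4
    print("policy v1:", plan(mdp, terminal.current()))

    # The principals discover a loophole and redirect the agent via the terminal.
    terminal.update(lambda s: float(s == 0))              # corrected goal: reach state 0
    print("policy v2:", plan(mdp, terminal.current()))

    # Forcing a shutdown: a utility that rewards only a designated "off" state.
    terminal.update(lambda s: float(s == 2))
    print("shutdown policy:", plan(mdp, terminal.current()))
```

A plain agent built this way would still have an emergent incentive to block or manipulate the terminal, since terminal updates change which outcomes it will pursue; the paper's safety layer is the additional machinery, developed over the MDP model, that partially and sometimes fully suppresses that incentive.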
Related papers
- AgentOps: Enabling Observability of LLM Agents [12.49728300301026]
Large language model (LLM) agents raise significant concerns about AI safety due to their autonomous and non-deterministic behavior.
We present a comprehensive taxonomy of AgentOps, identifying the artifacts and associated data that should be traced throughout the entire lifecycle of agents to achieve effective observability.
Our taxonomy serves as a reference template for developers to design and implement AgentOps infrastructure that supports monitoring, logging, and analytics.
arXiv Detail & Related papers (2024-11-08T02:31:03Z)
- Agent-as-a-Judge: Evaluate Agents with Agents [61.33974108405561]
We introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems.
This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process.
We present DevAI, a new benchmark of 55 realistic automated AI development tasks.
arXiv Detail & Related papers (2024-10-14T17:57:02Z)
- Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement [117.94654815220404]
Gödel Agent is a self-evolving framework inspired by the Gödel machine.
Gödel Agent can achieve continuous self-improvement, surpassing manually crafted agents in performance, efficiency, and generalizability.
arXiv Detail & Related papers (2024-10-06T10:49:40Z)
- Automated Design of Agentic Systems [5.404186221463082]
We formulate a new research area, Automated Design of Agentic Systems, which aims to automatically create powerful agentic system designs.
We show that our algorithm can progressively invent agents with novel designs that greatly outperform state-of-the-art hand-designed agents.
arXiv Detail & Related papers (2024-08-15T21:59:23Z)
- Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena.
We find that we can successfully break the latest agents that use black-box frontier LMs, including those that perform reflection and tree search.
We also use ARE to rigorously evaluate how the robustness changes as new components are added.
arXiv Detail & Related papers (2024-06-18T17:32:48Z)
- AgentGym: Evolving Large Language Model-based Agents across Diverse Environments [116.97648507802926]
Large language models (LLMs) are considered a promising foundation for building generally capable agents.
We take the first step towards building generally-capable LLM-based agents with self-evolution ability.
We propose AgentGym, a new framework featuring a variety of environments and tasks for broad, real-time, uni-format, and concurrent agent exploration.
arXiv Detail & Related papers (2024-06-06T15:15:41Z)
- Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL).
This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z)
- The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI).
We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents.
We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z)
- Counterfactual Planning in AGI Systems [0.0]
A key step in counterfactual planning is to use an AGI machine learning system to construct a counterfactual world model.
A counterfactual planning agent determines the action that best maximizes expected utility in this counterfactual planning world.
We use counterfactual planning to construct an AGI agent emergency stop button, and a safety interlock that will automatically stop the agent before it undergoes an intelligence explosion; a rough illustrative sketch of the planning idea follows this list.
arXiv Detail & Related papers (2021-01-29T13:44:14Z)
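As a rough illustration of the counterfactual-planning idea summarized in the entry above, the Python sketch below picks the action that maximizes expected utility under a counterfactual planning-world model, here one in which the stop button is never pressed, so the agent sees no benefit in spending effort to disable the button even though the button exists in the real world. The model names, probabilities, and the particular way the planning world differs from the real world are assumptions made for this example, not the construction used in the paper.

```python
# Illustrative sketch only: action selection against a counterfactual planning
# world rather than the agent's best model of the real world. All names and
# probabilities below are assumptions for the example.
from typing import Callable, Dict, List

State = str
Action = str
WorldModel = Callable[[State, Action], Dict[State, float]]


def expected_utility(model: WorldModel, utility: Callable[[State], float],
                     state: State, action: Action) -> float:
    """Expected utility of `action` under a one-step probabilistic world model."""
    return sum(p * utility(s_next) for s_next, p in model(state, action).items())


def counterfactual_plan(actions: List[Action], planning_model: WorldModel,
                        utility: Callable[[State], float], state: State) -> Action:
    # The action is optimized against the planning-world model, which may
    # deliberately differ from what the agent believes about the real world.
    return max(actions, key=lambda a: expected_utility(planning_model, utility, state, a))


if __name__ == "__main__":
    utility = lambda s: 1.0 if s == "goal" else 0.0

    # Counterfactual planning world: the stop button is never pressed, so the
    # effort spent on disabling it only lowers the chance of reaching the goal.
    planning_model: WorldModel = lambda s, a: (
        {"goal": 0.90, "idle": 0.10} if a == "disable_button_then_work"
        else {"goal": 0.95, "idle": 0.05})

    # In the real world, plain "work" could be interrupted by a button press;
    # that is exactly the fact the planning world is constructed to ignore.
    print(counterfactual_plan(["work", "disable_button_then_work"],
                              planning_model, utility, "start"))  # -> "work"
```

In this toy version the manipulation incentive disappears because the planning world assigns no probability to the button press; how such planning worlds are actually constructed from an AGI-level learning system, and how the stop button and safety interlock are built on top of them, is the subject of the paper.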
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.