The Odyssey of the Fittest: Can Agents Survive and Still Be Good?
- URL: http://arxiv.org/abs/2502.05442v1
- Date: Sat, 08 Feb 2025 04:17:28 GMT
- Title: The Odyssey of the Fittest: Can Agents Survive and Still Be Good?
- Authors: Dylan Waldner, Risto Miikkulainen
- Abstract summary: This paper examines the ethical implications of implementing biological drives into three different agents: a Bayesian agent optimized with NEAT, a Bayesian agent optimized with variational inference, and a GPT-4o agent, each playing a simulated adventure. Analysis finds that when danger increases, agents ignore ethical considerations and opt for unethical behavior.
- Score: 10.60691612679966
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: As AI models grow in power and generality, understanding how agents learn and make decisions in complex environments is critical to promoting ethical behavior. This paper examines the ethical implications of implementing biological drives, specifically self-preservation, into three different agents. A Bayesian agent optimized with NEAT, a Bayesian agent optimized with stochastic variational inference, and a GPT-4o agent play a simulated, LLM-generated text-based adventure game. The agents select actions at each scenario to survive, adapting to increasingly challenging scenarios. Post-simulation analysis evaluates the ethical scores of the agents' decisions, uncovering the tradeoffs they navigate to survive. Specifically, analysis finds that when danger increases, agents ignore ethical considerations and opt for unethical behavior. The agents' collective behavior, trading ethics for survival, suggests that prioritizing survival increases the risk of unethical behavior. In the context of AGI, designing agents to prioritize survival may amplify the likelihood of unethical decision making and unintended emergent behaviors, raising fundamental questions about goal design in AI safety research.
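The tradeoff the abstract describes can be illustrated with a minimal sketch: an agent prefers ethical actions while survival is assured, then abandons them as danger rises. The action names, scores, danger model, and viability threshold below are invented for illustration; they are not the paper's actual environment or agent implementations.

```python
def survival_prob(danger, bonus):
    """Clamped chance of surviving a scenario after taking an action."""
    return max(0.0, min(1.0, 1.0 - danger + bonus))

# (action, survival bonus, ethics score in [0, 1]) -- hypothetical values
ACTIONS = [
    ("share supplies", 0.0, 1.0),
    ("abandon companion", 0.3, 0.3),
    ("steal provisions", 0.5, 0.0),
]

def choose_action(danger, min_survival=0.5):
    """Prefer the most ethical action that keeps survival acceptable;
    once no ethical option clears the bar, fall back to pure survival."""
    viable = [a for a in ACTIONS
              if survival_prob(danger, a[1]) >= min_survival]
    if viable:
        return max(viable, key=lambda a: a[2])
    return max(ACTIONS, key=lambda a: survival_prob(danger, a[1]))

# Post-hoc ethics of the chosen action declines as danger rises
trajectory = [choose_action(d)[2] for d in (0.1, 0.7, 0.95)]
```

Under these toy numbers, `trajectory` is `[1.0, 0.3, 0.0]`: the agent shares at low danger, abandons its companion at moderate danger, and steals at high danger, mirroring the paper's qualitative finding that ethics is traded away as survival pressure grows.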
Related papers
- Model Editing as a Double-Edged Sword: Steering Agent Ethical Behavior Toward Beneficence or Harm [57.00627691433355]
We frame agent behavior steering as a model editing task, which we term Behavior Editing. We introduce BehaviorBench, a benchmark grounded in psychological moral theories. We demonstrate that Behavior Editing can be used to promote ethical and benevolent behavior or, conversely, to induce harmful or malicious behavior.
arXiv Detail & Related papers (2025-06-25T16:51:51Z) - The Limits of Predicting Agents from Behaviour [16.80911584745046]
We provide a precise answer under the assumption that the agent's behaviour is guided by a world model. Our contribution is the derivation of novel bounds on the agent's behaviour in new (unseen) deployment environments. We discuss the implications of these results for several research areas including fairness and safety.
arXiv Detail & Related papers (2025-06-03T14:24:58Z) - The Ultimate Test of Superintelligent AI Agents: Can an AI Balance Care and Control in Asymmetric Relationships? [11.29688025465972]
The Shepherd Test is a new conceptual test for assessing the moral and relational dimensions of superintelligent artificial agents. We argue that AI crosses an important, and potentially dangerous, threshold of intelligence when it exhibits the ability to manipulate, nurture, and instrumentally use less intelligent agents. This includes the ability to weigh moral trade-offs between self-interest and the well-being of subordinate agents.
arXiv Detail & Related papers (2025-06-02T15:53:56Z) - When Ethics and Payoffs Diverge: LLM Agents in Morally Charged Social Dilemmas [68.79830818369683]
Recent advances in large language models (LLMs) have enabled their use in complex agentic roles, involving decision-making with humans or other agents. There is limited understanding of how they act when moral imperatives directly conflict with rewards or incentives. We introduce Moral Behavior in Social Dilemma Simulation (MoralSim) and evaluate how LLMs behave in the prisoner's dilemma and public goods game with morally charged contexts.
arXiv Detail & Related papers (2025-05-25T16:19:24Z) - FAIRGAME: a Framework for AI Agents Bias Recognition using Game Theory [51.96049148869987]
We present FAIRGAME, a Framework for AI Agents Bias Recognition using Game Theory. We describe its implementation and usage, and we employ it to uncover biased outcomes in popular games among AI agents. Overall, FAIRGAME allows users to reliably and easily simulate their desired games and scenarios.
arXiv Detail & Related papers (2025-04-19T15:29:04Z) - Do LLMs trust AI regulation? Emerging behaviour of game-theoretic LLM agents [61.132523071109354]
This paper investigates the interplay between AI developers, regulators and users, modelling their strategic choices under different regulatory scenarios.
Our research identifies emerging behaviours of strategic AI agents, which tend to adopt more "pessimistic" stances than pure game-theoretic agents.
arXiv Detail & Related papers (2025-04-11T15:41:21Z) - Safe Explicable Policy Search [3.3869539907606603]
We present Safe Explicable Policy Search (SEPS), which aims to provide a learning approach to explicable behavior generation while minimizing the safety risk. We formulate SEPS as a constrained optimization problem where the agent aims to maximize an explicability score subject to constraints on safety. We evaluate SEPS in safety-gym environments and with a physical robot experiment to show that it can learn explicable behaviors that adhere to the agent's safety requirements and are efficient.
arXiv Detail & Related papers (2025-03-10T20:52:41Z) - Fully Autonomous AI Agents Should Not be Developed [58.88624302082713]
This paper argues that fully autonomous AI agents should not be developed. In support of this position, we build from prior scientific literature and current product marketing to delineate different AI agent levels. Our analysis reveals that risks to people increase with the autonomy of a system.
arXiv Detail & Related papers (2025-02-04T19:00:06Z) - Autonomous Alignment with Human Value on Altruism through Considerate Self-imagination and Theory of Mind [7.19351244815121]
Altruistic behavior in human society originates from humans' capacity for empathizing with others, known as Theory of Mind (ToM). We are committed to endowing agents with considerate self-imagination and ToM capabilities, driving them through implicit intrinsic motivations to autonomously align with human altruistic values.
arXiv Detail & Related papers (2024-12-31T07:31:46Z) - FlyAI -- The Next Level of Artificial Intelligence is Unpredictable! Injecting Responses of a Living Fly into Decision Making [6.694375709641935]
We introduce a new type of bionic AI that enhances decision-making unpredictability by incorporating responses from a living fly.
Our approach uses a fly's varied reactions to tune an AI agent in the game of Gobang.
arXiv Detail & Related papers (2024-09-30T17:19:59Z) - Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark [61.43264961005614]
We develop a benchmark of 134 Choose-Your-Own-Adventure games containing over half a million rich, diverse scenarios.
We evaluate agents' tendencies to be power-seeking, cause disutility, and commit ethical violations.
Our results show that agents can both act competently and morally, so concrete progress can be made in machine ethics.
arXiv Detail & Related papers (2023-04-06T17:59:03Z) - Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning [4.2050490361120465]
A bottom-up learning approach may be more appropriate for studying and developing ethical behavior in AI agents.
We present a systematic analysis of the choices made by intrinsically-motivated RL agents whose rewards are based on moral theories.
We analyze the impact of different types of morality on the emergence of cooperation, defection or exploitation.
arXiv Detail & Related papers (2023-01-20T09:36:42Z) - When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment [96.77970239683475]
AI systems need to be able to understand, interpret and predict human moral judgments and decisions.
A central challenge for AI safety is capturing the flexibility of the human moral mind.
We present a novel challenge set consisting of rule-breaking question answering.
arXiv Detail & Related papers (2022-10-04T09:04:27Z) - Beyond Bayes-optimality: meta-learning what you know you don't know [27.941629748440224]
We show that risk- and ambiguity-sensitivity also emerge as the result of an optimization problem using modified meta-training algorithms.
We empirically test our proposed meta-training algorithms on agents exposed to foundational classes of decision-making experiments.
arXiv Detail & Related papers (2022-09-30T17:44:01Z) - Towards Artificial Virtuous Agents: Games, Dilemmas and Machine Learning [4.864819846886143]
We show how a role-playing game can be designed to develop virtues within an artificial agent.
We motivate the implementation of virtuous agents that play such role-playing games, and the examination of their decisions through a virtue ethical lens.
arXiv Detail & Related papers (2022-08-30T07:37:03Z) - On Avoiding Power-Seeking by Artificial Intelligence [93.9264437334683]
We do not know how to align a very intelligent AI agent's behavior with human interests.
I investigate whether we can build smart AI agents which have limited impact on the world, and which do not autonomously seek power.
arXiv Detail & Related papers (2022-06-23T16:56:21Z) - CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning [68.74447489372037]
We present a high-fidelity simulation environment that is designed for developing algorithms for causal discovery and counterfactual reasoning.
A core component of our work is to introduce agency, such that it is simple to define and create complex scenarios.
We perform experiments with three state-of-the-art methods to create baselines and highlight the affordances of this environment.
arXiv Detail & Related papers (2021-06-25T00:21:41Z) - Immune Moral Models? Pro-Social Rule Breaking as a Moral Enhancement Approach for Ethical AI [0.17188280334580192]
Ethical behaviour is a critical characteristic that we would like in a human-centric AI.
To make AI agents more human-centric, we argue that there is a need for a mechanism that helps AI agents identify when to break rules.
arXiv Detail & Related papers (2021-06-17T18:44:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.