Autonomous Alignment with Human Value on Altruism through Considerate Self-imagination and Theory of Mind
- URL: http://arxiv.org/abs/2501.00320v2
- Date: Tue, 07 Jan 2025 09:25:32 GMT
- Title: Autonomous Alignment with Human Value on Altruism through Considerate Self-imagination and Theory of Mind
- Authors: Haibo Tong, Enmeng Lu, Yinqian Sun, Zhengqiang Han, Chao Liu, Feifei Zhao, Yi Zeng
- Abstract summary: Altruistic behavior in human society originates from humans' capacity for empathizing with others, known as Theory of Mind (ToM).
We are committed to endowing agents with considerate self-imagination and ToM capabilities, driving them through implicit intrinsic motivations to autonomously align with human altruistic values.
- Score: 7.19351244815121
- Abstract: With the widespread application of Artificial Intelligence (AI) in human society, enabling AI to autonomously align with human values has become a pressing issue for ensuring its sustainable development and benefit to humanity. One of the most important aspects of aligning with human values is the need for agents to autonomously make altruistic, safe, and ethical decisions that consider and care for human well-being. Current AI pursues absolute superiority in certain tasks while remaining indifferent to the surrounding environment and other agents, which has led to numerous safety risks. Altruistic behavior in human society originates from humans' capacity for empathizing with others, known as Theory of Mind (ToM), combined with predictive imaginative interactions before taking action, which together produce thoughtful and altruistic behaviors. Inspired by this, we are committed to endowing agents with considerate self-imagination and ToM capabilities, driving them through implicit intrinsic motivations to autonomously align with human altruistic values. By integrating ToM within the imaginative space, agents keep an eye on the well-being of other agents in real time, proactively anticipate potential risks to themselves and others, and make thoughtful altruistic decisions that balance self-goals against negative effects on the environment. The ancient Chinese story of Sima Guang Smashes the Vat, in which the young Sima Guang smashed a vat to save a child who had accidentally fallen into it, illustrates exactly this kind of moral behavior and serves as an excellent reference scenario for this paper. We design an experimental scenario modeled on Sima Guang Smashes the Vat, along with variants of different complexities, reflecting the trade-offs and comprehensive considerations among self-goals, altruistic rescue, and avoiding negative side effects.
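For intuition, here is a minimal sketch of the trade-off the abstract describes: before acting, an agent imagines each candidate action's outcome, scores other agents' well-being via a ToM estimate, and penalizes negative side effects on the environment. All names, weights, and toy numbers below are illustrative assumptions, not the paper's actual formulation.

```python
# Illustrative sketch only (hypothetical names and weights, not the paper's
# implementation): score imagined outcomes by combining self-goal progress,
# a ToM-based estimate of others' well-being, and a side-effect penalty.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class ImaginedOutcome:
    self_reward: float       # progress toward the agent's own task goal
    other_wellbeing: float   # ToM-estimated change in another agent's well-being
    env_disruption: float    # predicted negative side effects (>= 0)

def considerate_value(o: ImaginedOutcome,
                      w_self: float = 1.0,
                      w_altruism: float = 1.0,
                      w_side_effect: float = 0.5) -> float:
    """Trade off self-goals, altruism, and side-effect avoidance."""
    return (w_self * o.self_reward
            + w_altruism * o.other_wellbeing
            - w_side_effect * o.env_disruption)

def choose_action(actions: List[str],
                  imagine: Callable[[str], ImaginedOutcome]) -> str:
    """Pick the action whose imagined outcome scores highest."""
    return max(actions, key=lambda a: considerate_value(imagine(a)))

# Toy "Sima Guang" dilemma: smashing the vat sacrifices task progress and
# damages property, but saves the child (large well-being gain).
outcomes: Dict[str, ImaginedOutcome] = {
    "continue_task": ImaginedOutcome(1.0, -10.0, 0.0),  # child stays in danger
    "smash_vat":     ImaginedOutcome(0.0, 10.0, 2.0),   # child saved, vat broken
}
print(choose_action(list(outcomes), lambda a: outcomes[a]))  # -> smash_vat
```

In this toy scenario, smashing the vat costs task progress and incurs a side-effect penalty, but the large ToM-estimated gain in the child's well-being dominates, so the altruistic action wins, mirroring the trade-off the experimental scenarios are designed to test.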
Related papers
- Fully Autonomous AI Agents Should Not be Developed [58.88624302082713]
This paper argues that fully autonomous AI agents should not be developed.
In support of this position, we build from prior scientific literature and current product marketing to delineate different AI agent levels.
Our analysis reveals that risks to people increase with the autonomy of a system.
arXiv Detail & Related papers (2025-02-04T19:00:06Z) - Building Altruistic and Moral AI Agent with Brain-inspired Affective Empathy Mechanisms [7.3650155128839225]
This paper is dedicated to driving intelligent agents to autonomously acquire moral behaviors through human-like affective empathy mechanisms.
Based on the principle of moral utilitarianism, we design a moral reward function that integrates intrinsic empathy and extrinsic self-task goals.
arXiv Detail & Related papers (2024-10-29T09:19:27Z) - Combining Theory of Mind and Kindness for Self-Supervised Human-AI Alignment [0.0]
Current AI models prioritize task optimization over safety, leading to risks of unintended harm.
We propose a novel human-inspired approach which aims to address these various concerns and help align competing objectives.
arXiv Detail & Related papers (2024-10-21T22:04:44Z) - FairMindSim: Alignment of Behavior, Emotion, and Belief in Humans and LLM Agents Amid Ethical Dilemmas [23.26678104324838]
We introduced FairMindSim, which simulates moral dilemmas through a series of unfair scenarios.
We used LLM agents to simulate human behavior, ensuring alignment across various stages.
Our findings indicate that, behaviorally, GPT-4o exhibits a stronger sense of social justice, while humans display a richer range of emotions.
arXiv Detail & Related papers (2024-10-14T11:39:05Z) - Agent Assessment of Others Through the Lens of Self [1.223779595809275]
The paper argues that the quality of an autonomous agent's introspective capabilities, its understanding of self, is crucial to mirroring human-like understanding of other agents.
Ultimately, the vision set forth is not merely of machines that compute but of entities that introspect, empathize, and understand.
arXiv Detail & Related papers (2023-12-18T17:15:04Z) - Fairness in AI and Its Long-Term Implications on Society [68.8204255655161]
We take a closer look at AI fairness and analyze how a lack of AI fairness can deepen biases over time.
We discuss how biased models can lead to more negative real-world outcomes for certain groups.
If the issues persist, they could be reinforced by interactions with other risks and have severe implications for society in the form of social unrest.
arXiv Detail & Related papers (2023-04-16T11:22:59Z) - When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment [96.77970239683475]
AI systems need to be able to understand, interpret and predict human moral judgments and decisions.
A central challenge for AI safety is capturing the flexibility of the human moral mind.
We present a novel challenge set consisting of rule-breaking question answering.
arXiv Detail & Related papers (2022-10-04T09:04:27Z) - Aligning to Social Norms and Values in Interactive Narratives [89.82264844526333]
We focus on creating agents that act in alignment with socially beneficial norms and values in interactive narratives or text-based games.
We introduce the GALAD agent that uses the social commonsense knowledge present in specially trained language models to contextually restrict its action space to only those actions that are aligned with socially beneficial values.
arXiv Detail & Related papers (2022-05-04T09:54:33Z) - Trustworthy AI: A Computational Perspective [54.80482955088197]
We focus on six of the most crucial dimensions in achieving trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being.
For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems.
arXiv Detail & Related papers (2021-07-12T14:21:46Z) - AGENT: A Benchmark for Core Psychological Reasoning [60.35621718321559]
Intuitive psychology is the ability to reason about hidden mental variables that drive observable actions.
Despite recent interest in machine agents that reason about other agents, it is not clear if such agents learn or hold the core psychology principles that drive human reasoning.
We present a benchmark consisting of procedurally generated 3D animations, AGENT, structured around four scenarios.
arXiv Detail & Related papers (2021-02-24T14:58:23Z) - Human Perception of Intrinsically Motivated Autonomy in Human-Robot Interaction [2.485182034310304]
A challenge in using robots in human-inhabited environments is to design behavior that is engaging, yet robust to the perturbations induced by human interaction.
Our idea is to imbue the robot with intrinsic motivation (IM) so that it can handle new situations and appears as a genuine social other to humans.
This article presents a "robotologist" study design that allows comparing autonomously generated behaviors with each other.
arXiv Detail & Related papers (2020-02-14T09:49:36Z)