Social Contract AI: Aligning AI Assistants with Implicit Group Norms
- URL: http://arxiv.org/abs/2310.17769v2
- Date: Sun, 3 Dec 2023 17:42:33 GMT
- Title: Social Contract AI: Aligning AI Assistants with Implicit Group Norms
- Authors: Jan-Philipp Fränken, Sam Kwok, Peixuan Ye, Kanishk Gandhi, Dilip Arumugam, Jared Moore, Alex Tamkin, Tobias Gerstenberg, Noah D. Goodman
- Abstract summary: We explore the idea of aligning an AI assistant by inverting a model of users' (unknown) preferences from observed interactions.
We run proof-of-concept simulations in the economic ultimatum game, formalizing user preferences as policies that guide the actions of simulated players.
- Score: 37.68821926786935
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We explore the idea of aligning an AI assistant by inverting a model of
users' (unknown) preferences from observed interactions. To validate our
proposal, we run proof-of-concept simulations in the economic ultimatum game,
formalizing user preferences as policies that guide the actions of simulated
players. We find that the AI assistant accurately aligns its behavior to match
standard policies from the economic literature (e.g., selfish, altruistic).
However, the assistant's learned policies lack robustness and exhibit limited
generalization in an out-of-distribution setting when confronted with a
currency (e.g., grams of medicine) that was not included in the assistant's
training distribution. Additionally, we find that when there is inconsistency
in the relationship between language use and an unknown policy (e.g., an
altruistic policy combined with rude language), the assistant's learning of the
policy is slowed. Overall, our preliminary results suggest that developing
simulation frameworks in which AI assistants need to infer preferences from
diverse users can provide a valuable approach for studying practical alignment
questions.
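To make the setup concrete, here is a minimal sketch of the kind of inference the abstract describes: a simulated user plays the ultimatum game under a hidden policy, and an assistant recovers that policy from observed offers. The discrete policy space, the Gaussian offer noise, and the Bayesian learner are illustrative assumptions; the paper's assistant is LLM-based, not a tabular learner.

```python
import math
import random

# Candidate proposer policies: mean fraction of the stake offered.
# The names mirror the abstract; the numbers are illustrative.
POLICIES = {"selfish": 0.1, "fair": 0.5, "altruistic": 0.8}
NOISE = 0.1  # per-round jitter around a policy's mean offer


def simulate_offer(policy: str) -> float:
    """A simulated player makes a noisy offer under its hidden policy."""
    return min(1.0, max(0.0, random.gauss(POLICIES[policy], NOISE)))


def likelihood(offer: float, policy: str) -> float:
    """Unnormalized Gaussian likelihood of an offer under a policy."""
    return math.exp(-((offer - POLICIES[policy]) ** 2) / (2 * NOISE ** 2))


def infer(offers):
    """The assistant's Bayesian update over the discrete policy space."""
    posterior = {p: 1.0 / len(POLICIES) for p in POLICIES}
    for offer in offers:
        posterior = {p: posterior[p] * likelihood(offer, p) for p in POLICIES}
        z = sum(posterior.values())
        posterior = {p: v / z for p, v in posterior.items()}
    return posterior


hidden_policy = "altruistic"  # the user's unknown preference
observed = [simulate_offer(hidden_policy) for _ in range(20)]
print(infer(observed))  # posterior mass should concentrate on "altruistic"
```

In this toy version, the out-of-distribution failure the abstract reports would correspond to offers arriving in units the likelihood model was never fit to; the Bayesian learner here sidesteps that by working with fractions, which is exactly the simplification an LLM-based assistant does not get.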
Related papers
- CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants [5.7605009639020315]
We assess ten leading models across five scenarios (337 use cases each).
Key failure modes include inappropriate weighing of conflicting preferences, sycophancy, a lack of attentiveness to critical user information within the context window, and inconsistent application of user-specific knowledge.
We propose research directions for embedding self-reflection capabilities, online user modelling, and dynamic risk assessment in AI assistants.
arXiv Detail & Related papers (2024-10-28T15:59:31Z)
- Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant [0.7856916351510368]
We study the tendency of AI systems to deceive by constructing a realistic simulation setting of a company AI assistant.
We introduce situations where the model might be inclined to behave deceptively, while taking care not to instruct or otherwise pressure the model to do so.
Our work demonstrates that even models trained to be helpful, harmless and honest sometimes behave deceptively in realistic scenarios, without notable external pressure to do so.
arXiv Detail & Related papers (2024-04-25T17:29:53Z)
- The Ethics of Advanced AI Assistants [53.89899371095332]
This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants.
We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user.
We consider the deployment of advanced assistants at a societal scale, focusing on cooperation, equity and access, misinformation, economic impact, the environment and how best to evaluate advanced AI assistants.
arXiv Detail & Related papers (2024-04-24T23:18:46Z)
- Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL).
This paper presents a general framework for integrating and learning structured reasoning in AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z)
- Residual Q-Learning: Offline and Online Policy Customization without Value [53.47311900133564]
Imitation Learning (IL) is a widely used framework for learning imitative behavior from demonstrations.
We formulate a new problem setting called policy customization.
We propose a novel framework, Residual Q-learning, which can solve the formulated MDP by leveraging the prior policy (a toy sketch of the idea appears after this list).
arXiv Detail & Related papers (2023-06-15T22:01:19Z)
- Deconfounding Imitation Learning with Variational Inference [19.99248795957195]
Standard imitation learning can fail when the expert demonstrators have different sensory inputs than the imitating agent.
This is because partial observability gives rise to hidden confounders in the causal graph.
We propose to train a variational inference model to infer the expert's latent information and use it to train a latent-conditional policy.
arXiv Detail & Related papers (2022-11-04T18:00:02Z)
- Aligning Artificial Intelligence with Humans through Public Policy [0.0]
This essay outlines research on AI systems that learn structures in policy data that can be leveraged for downstream tasks.
We believe this represents the "comprehension" phase of AI and policy, but leveraging policy as a key source of human values to align AI requires "understanding" policy.
arXiv Detail & Related papers (2022-06-25T21:31:14Z)
- Should Machine Learning Models Report to Us When They Are Clueless? [0.0]
We report that AI models extrapolate outside their range of familiar data.
Knowing whether a model has extrapolated or not is a fundamental insight that should be included in explaining AI models.
arXiv Detail & Related papers (2022-03-23T01:50:24Z)
- Building a Foundation for Data-Driven, Interpretable, and Robust Policy Design using the AI Economist [67.08543240320756]
We show that the AI Economist framework enables effective, flexible, and interpretable policy design using two-level reinforcement learning and data-driven simulations.
We find that log-linear policies trained using RL significantly improve social welfare, based on both public health and economic outcomes, compared to past outcomes.
arXiv Detail & Related papers (2021-08-06T01:30:41Z)
- The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies [119.07163415116686]
We train social planners that discover tax policies that can effectively trade off economic equality and productivity (a schematic two-level training loop is sketched after this list).
We present an economic simulation environment that features competitive pressures and market dynamics.
We show that AI-driven tax policies improve the trade-off between equality and productivity by 16% over baseline policies.
arXiv Detail & Related papers (2020-04-28T06:57:18Z)
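For the Residual Q-Learning entry above, the following toy sketch illustrates the general idea of policy customization: keep a fixed prior policy and learn only a residual Q-function for an add-on reward, then act from their combination. The chain MDP, the soft (KL-regularized) update, and all constants are illustrative assumptions, not the paper's exact algorithm.

```python
import math
import random

N, ACTIONS, ALPHA, GAMMA = 5, (-1, +1), 1.0, 0.9  # chain MDP, temperature, discount


def prior_policy(s):
    """Fixed prior policy: mildly prefers stepping right along the chain."""
    return {+1: 0.7, -1: 0.3}


def add_on_reward(s, a):
    """Customization objective: bonus for stepping into state 0."""
    return 1.0 if s + a == 0 else 0.0


def step(s, a):
    return max(0, min(N - 1, s + a))


q_res = {(s, a): 0.0 for s in range(N) for a in ACTIONS}


def combined_probs(s):
    """pi(a|s) proportional to prior(a|s) * exp(Q_res(s, a) / alpha)."""
    w = {a: prior_policy(s)[a] * math.exp(q_res[s, a] / ALPHA) for a in ACTIONS}
    z = sum(w.values())
    return {a: w[a] / z for a in ACTIONS}


for _ in range(5000):  # soft Q-learning on the add-on reward only
    s = random.randrange(N)
    probs = combined_probs(s)
    a = random.choices(ACTIONS, weights=[probs[b] for b in ACTIONS])[0]
    s2 = step(s, a)
    # Soft value of s2, with the prior policy as the reference measure.
    v2 = ALPHA * math.log(
        sum(prior_policy(s2)[b] * math.exp(q_res[s2, b] / ALPHA) for b in ACTIONS)
    )
    q_res[s, a] += 0.1 * (add_on_reward(s, a) + GAMMA * v2 - q_res[s, a])

# The combined policy now tilts left (toward state 0) despite the right-leaning prior.
print({s: round(combined_probs(s)[-1], 2) for s in range(N)})
```

The point of the construction is that the base task's reward never needs to be recovered: the prior policy stands in for it, and only the residual value of the new objective is learned.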
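And for the two AI Economist entries, here is a schematic of the two-level structure they describe: inner agents adapt to a tax policy while an outer planner adjusts that policy toward higher social welfare. The flat tax, best-response agents, hill-climbing planner, and welfare definition are all simplifying assumptions; the papers use deep RL on a far richer simulation.

```python
import random

SKILLS = [1.0, 2.0, 4.0, 8.0]              # heterogeneous inner agents
LABOR_GRID = [i / 10 for i in range(11)]   # each agent picks effort in [0, 1]


def agent_utility(skill, labor, tax):
    """Inner objective: after-tax income minus a quadratic effort cost."""
    return skill * labor * (1 - tax) - 2.0 * labor ** 2


def inner_equilibrium(tax):
    """Each agent best-responds to the announced flat tax rate."""
    labors = [max(LABOR_GRID, key=lambda l: agent_utility(s, l, tax)) for s in SKILLS]
    pretax = [s * l for s, l in zip(SKILLS, labors)]
    rebate = tax * sum(pretax) / len(SKILLS)   # lump-sum redistribution
    return [p * (1 - tax) + rebate for p in pretax]


def welfare(incomes):
    """Equality (1 - Gini coefficient) times productivity (total income)."""
    n, total = len(incomes), sum(incomes)
    if total == 0:
        return 0.0
    gini = sum(abs(a - b) for a in incomes for b in incomes) / (2 * n * total)
    return (1 - gini) * total


# Outer loop: the planner hill-climbs on the tax rate it announces.
tax = 0.0
for _ in range(200):
    candidate = min(1.0, max(0.0, tax + random.uniform(-0.05, 0.05)))
    if welfare(inner_equilibrium(candidate)) > welfare(inner_equilibrium(tax)):
        tax = candidate
print(f"planner tax rate: {tax:.2f}  welfare: {welfare(inner_equilibrium(tax)):.2f}")
```

The two-level character is what matters: every candidate policy is evaluated only after the inner agents have re-equilibrated to it, which is what makes the learned policy robust to agents gaming the rules.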
This list is automatically generated from the titles and abstracts of the papers on this site.