Related papers: Aligning AI Agents via Information-Directed Sampling

Aligning AI Agents via Information-Directed Sampling

URL: http://arxiv.org/abs/2410.14807v1
Date: Fri, 18 Oct 2024 18:23:41 GMT
Title: Aligning AI Agents via Information-Directed Sampling
Authors: Hong Jun Jeon, Benjamin Van Roy,
Abstract summary: bandit alignment problem involves maximizing long-run expected reward by interacting with an environment and a human. We study these trade-offs theoretically and empirically in a toy bandit alignment problem which resembles the beta-Bernoulli bandit. We demonstrate while naive exploration algorithms which reflect current practices and even touted algorithms such as Thompson sampling both fail to provide acceptable solutions to this problem.
Score: 20.617552198581024
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The staggering feats of AI systems have brought to attention the topic of AI Alignment: aligning a "superintelligent" AI agent's actions with humanity's interests. Many existing frameworks/algorithms in alignment study the problem on a myopic horizon or study learning from human feedback in isolation, relying on the contrived assumption that the agent has already perfectly identified the environment. As a starting point to address these limitations, we define a class of bandit alignment problems as an extension of classic multi-armed bandit problems. A bandit alignment problem involves an agent tasked with maximizing long-run expected reward by interacting with an environment and a human, both involving details/preferences initially unknown to the agent. The reward of actions in the environment depends on both observed outcomes and human preferences. Furthermore, costs are associated with querying the human to learn preferences. Therefore, an effective agent ought to intelligently trade-off exploration (of the environment and human) and exploitation. We study these trade-offs theoretically and empirically in a toy bandit alignment problem which resembles the beta-Bernoulli bandit. We demonstrate while naive exploration algorithms which reflect current practices and even touted algorithms such as Thompson sampling both fail to provide acceptable solutions to this problem, information-directed sampling achieves favorable regret.

Related papers

FAIRTOPIA: Envisioning Multi-Agent Guardianship for Disrupting Unfair AI Pipelines [1.556153237434314]
AI models have become active decision makers, often acting without human supervision.<n>We envision agents as fairness guardians, since agents learn from their environment.<n>We introduce a fairness-by-design approach which embeds multi-role agents in an end-to-end (human to AI) synergetic scheme.
arXiv Detail & Related papers (2025-06-10T17:02:43Z)
Neurodivergent Influenceability as a Contingent Solution to the AI Alignment Problem [1.3905735045377272]
The AI alignment problem, which focusses on ensuring that artificial intelligence (AI) systems act according to human values, presents profound challenges.<n>With the progression from narrow AI to Artificial General Intelligence (AGI) and Superintelligence, fears about control and existential risk have escalated.<n>Here, we investigate whether embracing inevitable AI misalignment can be a contingent strategy to foster a dynamic ecosystem of competing agents.
arXiv Detail & Related papers (2025-05-05T11:33:18Z)
Among Us: A Sandbox for Agentic Deception [1.1893676124374688]
Among Us is a text-based social-deduction game environment. LLM-agents exhibit human-style deception naturally while they think, speak, and act with other agents or humans. We evaluate the effectiveness of AI safety techniques for detecting lying and deception in Among Us.
arXiv Detail & Related papers (2025-04-05T06:09:32Z)
Human aversion? Do AI Agents Judge Identity More Harshly Than Performance [0.06554326244334868]
We investigate how AI agents based on large language models assess and integrate human input. We find that the AI system systematically discounts human advice, penalizing human errors more severely than algorithmic errors.
arXiv Detail & Related papers (2025-03-31T02:05:27Z)
Towards Principled Unsupervised Multi-Agent Reinforcement Learning [49.533774397707056]
We present a scalable, decentralized, trust-region policy search algorithm to address the problem in practical settings.<n>We show that optimizing for a specific objective, namely mixture entropy, provides an excellent trade-off between tractability and performances.
arXiv Detail & Related papers (2025-02-12T12:51:36Z)
Quantifying Misalignment Between Agents: Towards a Sociotechnical Understanding of Alignment [2.619545850602691]
Recent sociotechnical approaches highlight the need to understand complex misalignment among multiple human and AI agents. We adapt a computational social science model of human contention to the alignment problem. Our model quantifies misalignment in large, diverse agent groups with potentially conflicting goals.
arXiv Detail & Related papers (2024-06-06T16:31:22Z)
Online Decision Mediation [72.80902932543474]
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior. In clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances.
arXiv Detail & Related papers (2023-10-28T05:59:43Z)
Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a reward. We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a and possibly unknown policy. We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z)
Bandit Social Learning: Exploration under Myopic Behavior [58.75758600464338]
We study social learning dynamics motivated by reviews on online platforms. Agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regards to exploration. We derive stark learning failures for any such behavior, and provide matching positive results.
arXiv Detail & Related papers (2023-02-15T01:57:57Z)
Blessing from Human-AI Interaction: Super Reinforcement Learning in Confounded Environments [19.944163846660498]
We introduce the paradigm of super reinforcement learning that takes advantage of Human-AI interaction for data driven sequential decision making. In the decision process with unmeasured confounding, the actions taken by past agents can offer valuable insights into undisclosed information. We develop several super-policy learning algorithms and systematically study their theoretical properties.
arXiv Detail & Related papers (2022-09-29T16:03:07Z)
Emergence of Novelty in Evolutionary Algorithms [0.0]
We introduce our approach to the maze problem and compare it to the previously proposed solution. We find that our solution leads to an improved performance while being significantly simpler. Building on that, we generalize the problem and apply our approach to a more advanced set of tasks, Atari Games.
arXiv Detail & Related papers (2022-06-27T13:49:41Z)
On Avoiding Power-Seeking by Artificial Intelligence [93.9264437334683]
We do not know how to align a very intelligent AI agent's behavior with human interests. I investigate whether we can build smart AI agents which have limited impact on the world, and which do not autonomously seek power.
arXiv Detail & Related papers (2022-06-23T16:56:21Z)
Incentivizing Combinatorial Bandit Exploration [87.08827496301839]
Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system. Users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations. While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users.
arXiv Detail & Related papers (2022-06-01T13:46:25Z)
End-to-End Learning and Intervention in Games [60.41921763076017]
We provide a unified framework for learning and intervention in games. We propose two approaches, respectively based on explicit and implicit differentiation. The analytical results are validated using several real-world problems.
arXiv Detail & Related papers (2020-10-26T18:39:32Z)
Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state. We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling. We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than actions.
arXiv Detail & Related papers (2020-06-15T19:24:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.