Towards Unified Alignment Between Agents, Humans, and Environment
- URL: http://arxiv.org/abs/2402.07744v2
- Date: Wed, 14 Feb 2024 18:43:54 GMT
- Title: Towards Unified Alignment Between Agents, Humans, and Environment
- Authors: Zonghan Yang, An Liu, Zijun Liu, Kaiming Liu, Fangzhou Xiong, Yile
Wang, Zeyuan Yang, Qingyuan Hu, Xinrui Chen, Zhenhe Zhang, Fuwen Luo,
Zhicheng Guo, Peng Li, Yang Liu
- Abstract summary: We introduce the principles of $\mathbf{UA}^2$, which advocate for the simultaneous alignment of agents with human intentions, environmental dynamics, and self-constraints.
We conduct proof-of-concept studies by introducing realistic features to WebShop, including user profiles to demonstrate intentions, personalized reranking for complex environmental dynamics, and runtime cost statistics to reflect self-constraints.
- Score: 24.731978646069
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid progress of foundation models has led to the
proliferation of autonomous agents, which leverage the universal capabilities of foundation
models to conduct reasoning, decision-making, and environmental interaction.
However, the efficacy of agents remains limited when operating in intricate,
realistic environments. In this work, we introduce the principles of
$\mathbf{U}$nified $\mathbf{A}$lignment for $\mathbf{A}$gents
($\mathbf{UA}^2$), which advocate for the simultaneous alignment of agents with
human intentions, environmental dynamics, and self-constraints such as the
limitation of monetary budgets. From the perspective of $\mathbf{UA}^2$, we
review the current agent research and highlight the factors neglected by
existing agent benchmarks and candidate methods. We also conduct
proof-of-concept studies by introducing realistic features to WebShop,
including user profiles to demonstrate intentions, personalized reranking for
complex environmental dynamics, and runtime cost statistics to reflect
self-constraints. We then follow the principles of $\mathbf{UA}^2$ to propose
an initial design of our agent, and benchmark its performance against several
candidate baselines in the retrofitted WebShop. The extensive experimental
results further underscore the importance of the principles of $\mathbf{UA}^2$. Our
research sheds light on the next steps of autonomous agent research with
improved general problem-solving abilities.
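As a concrete reading of the three alignment axes, below is a minimal scoring sketch, ours rather than the paper's code: task reward stands in for environmental alignment, a profile-match term for human intentions, and a hard monetary budget for the self-constraint. The helper names, the 50/50 weighting, and the budget cutoff are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class EpisodeStats:
    task_reward: float      # environment reward, e.g. purchase-goal match
    intention_match: float  # in [0, 1]: agreement with the user profile
    cost_usd: float         # accumulated runtime cost (the self-constraint)

def ua2_score(stats, budget_usd, w_intent=0.5):
    """Blend task success with intention alignment; void over-budget runs."""
    if stats.cost_usd > budget_usd:  # self-constraint violated
        return 0.0
    return (1 - w_intent) * stats.task_reward + w_intent * stats.intention_match

print(ua2_score(EpisodeStats(0.8, 0.9, 0.04), budget_usd=0.05))  # ~0.85
```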
Related papers
- Metareasoning in uncertain environments: a meta-BAMDP framework [1.0923877073891441]
Finding the right reasoning process $P$ can itself be framed as an optimization problem over the space of such processes.
This paper proposes a meta Bayes-Adaptive MDP framework to handle metareasoning in environments with unknown reward/transition distributions.
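In toy form, this framing reduces to picking the process with the best net value. The sketch below is our illustration with made-up numbers; the actual meta-BAMDP machinery additionally handles unknown reward and transition distributions, which a one-shot argmax does not.

```python
# Treat each reasoning process P as an option with an expected payoff and a
# reasoning cost, then select the P that maximizes net value.
processes = {
    "react_shallow": {"expected_reward": 0.60, "reasoning_cost": 0.05},
    "tree_search":   {"expected_reward": 0.80, "reasoning_cost": 0.30},
    "plan_and_act":  {"expected_reward": 0.75, "reasoning_cost": 0.10},
}
best = max(processes, key=lambda p: processes[p]["expected_reward"]
                                    - processes[p]["reasoning_cost"])
print(best)  # plan_and_act: 0.65 net, vs 0.55 and 0.50 for the others
```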
arXiv Detail & Related papers (2024-08-02T13:15:01Z)
- Estimating and Incentivizing Imperfect-Knowledge Agents with Hidden Rewards [4.742123770879715]
In practice, incentive providers often cannot observe the reward realizations of incentivized agents.
This paper explores a repeated adverse selection game between a self-interested learning agent and a learning principal.
We introduce an estimator whose only input is the history of the principal's incentives and the agent's choices.
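A minimal sketch of that input-output contract, ours rather than the paper's estimator: the only observables are (incentive, choice) pairs, from which one can already estimate how the agent's choice distribution shifts with the incentive.

```python
from collections import Counter, defaultdict

# Observable history: (incentive offered, choice the agent made).
history = [(0.1, "A"), (0.1, "A"), (0.1, "B"), (0.5, "B"), (0.5, "B")]

by_incentive = defaultdict(Counter)
for incentive, choice in history:
    by_incentive[incentive][choice] += 1

# Empirical estimate of P(choice | incentive), with no access to rewards.
for incentive, counts in sorted(by_incentive.items()):
    total = sum(counts.values())
    print(incentive, {c: n / total for c, n in counts.items()})
# 0.1 {'A': 0.67, 'B': 0.33} (approx.); 0.5 {'B': 1.0}
```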
arXiv Detail & Related papers (2023-08-13T08:12:01Z)
- Inverse Reinforcement Learning with the Average Reward Criterion [3.719493310637464]
We study the problem of Inverse Reinforcement Learning (IRL) with an average-reward criterion.
The goal is to recover an unknown policy and reward function given only samples of states and actions from an experienced agent.
arXiv Detail & Related papers (2023-05-24T01:12:08Z)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- Multi-Agent Neural Rewriter for Vehicle Routing with Limited Disclosure of Costs [65.23158435596518]
We solve the multi-vehicle routing problem as a team Markov game with partially observable costs.
Our multi-agent reinforcement learning approach, the so-called multi-agent Neural Rewriter, builds on the single-agent Neural Rewriter to solve the problem by iteratively rewriting solutions.
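The rewriting loop in miniature: the paper learns both the region-picking and rewriting policies, so the fixed 2-opt move below is only our stand-in for a learned rewrite step.

```python
import itertools, math

def route_cost(route, pts):
    return sum(math.dist(pts[a], pts[b]) for a, b in zip(route, route[1:]))

def rewrite_once(route, pts):
    """Try 2-opt segment reversals; return the first improving rewrite."""
    for i, j in itertools.combinations(range(1, len(route) - 1), 2):
        cand = route[:i] + route[i:j + 1][::-1] + route[j + 1:]
        if route_cost(cand, pts) < route_cost(route, pts):
            return cand
    return route  # local optimum: no improving rewrite found

pts = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}
route = [0, 2, 1, 3, 0]            # a deliberately crossed tour
while (new := rewrite_once(route, pts)) != route:
    route = new                    # iteratively rewrite the solution
print(route, route_cost(route, pts))  # [0, 1, 2, 3, 0] 4.0 (uncrossed)
```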
arXiv Detail & Related papers (2022-06-13T09:17:40Z)
- Modeling Bounded Rationality in Multi-Agent Simulations Using Rationally Inattentive Reinforcement Learning [85.86440477005523]
We study more human-like RL agents that incorporate an established model of human irrationality, the Rational Inattention (RI) model.
RIRL models the cost of cognitive information processing using mutual information.
We show that using RIRL yields a rich spectrum of new equilibrium behaviors that differ from those found under rational assumptions.
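That information cost is concrete enough to compute directly. A toy example for a tabular policy, ours rather than the RIRL code:

```python
# Mutual-information cost for a tabular policy pi(a|s) and state dist. p(s):
#   I(S;A) = sum_s p(s) sum_a pi(a|s) log(pi(a|s) / p(a)),
#   where p(a) = sum_s p(s) pi(a|s) is the marginal action distribution.
import math

p_s = [0.5, 0.5]                  # state distribution
pi = [[0.9, 0.1],                 # pi(a|s=0)
      [0.1, 0.9]]                 # pi(a|s=1)

p_a = [sum(p_s[s] * pi[s][a] for s in range(2)) for a in range(2)]
mi = sum(p_s[s] * pi[s][a] * math.log(pi[s][a] / p_a[a])
         for s in range(2) for a in range(2))
print(mi)  # ~0.368 nats: a sharper policy pays a higher information cost
```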
arXiv Detail & Related papers (2022-01-18T20:54:00Z)
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
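To show the reward-free objective in its simplest form, here is a tabular stand-in, not the paper's method, which uses kernel and neural function approximators with provable guarantees: visitation counts alone drive exploration.

```python
import random
random.seed(0)

# Ring of 10 states, no reward anywhere: explore by preferring the
# less-visited neighbor, with tiny noise to break ties.
n_states, counts, state = 10, [0] * 10, 0
for _ in range(200):
    counts[state] += 1
    nbrs = [(state - 1) % n_states, (state + 1) % n_states]
    state = min(nbrs, key=lambda s: counts[s] + random.random() * 1e-3)
print(counts)  # coverage is roughly uniform despite no reward signal
```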
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning [68.74447489372037]
We present a high-fidelity simulation environment that is designed for developing algorithms for causal discovery and counterfactual reasoning.
A core component of our work is to introduce \textit{agency}, such that it is simple to define and create complex scenarios.
We perform experiments with three state-of-the-art methods to create baselines and highlight the affordances of this environment.
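Our reading of the agency idea, rendered as a scenario config; every key below is hypothetical rather than CausalCity's actual schema:

```python
# Instead of scripting every trajectory, each vehicle gets high-level goals
# and the simulator's agency layer fills in the low-level behavior.
scenario = {
    "map": "four_way_intersection",
    "agents": [
        {"id": "car_0", "goal": "turn_left",   "style": "cautious"},
        {"id": "car_1", "goal": "go_straight", "style": "aggressive"},
    ],
    # counterfactual probe: rerun the same scenario with one factor changed
    "intervention": {"agent": "car_1", "set": {"style": "cautious"}},
}
print(scenario["intervention"])
```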
arXiv Detail & Related papers (2021-06-25T00:21:41Z)
- Iterative Feature Matching: Toward Provable Domain Generalization with Logarithmic Environments [55.24895403089543]
Domain generalization aims at performing well on unseen test environments with data from a limited number of training environments.
We present a new algorithm based on performing iterative feature matching that is guaranteed with high probability to yield a predictor that generalizes after seeing only $O(\log d_s)$ environments.
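A toy rendering of the matching step, ours; the paper's algorithm and its $O(\log d_s)$ guarantee are more involved: features whose label correlation stays stable across environments survive, spurious ones are filtered out.

```python
import random
random.seed(1)

def corr(env, j):
    """Empirical mean of x_j * y, a simple correlation proxy."""
    return sum(x[j] * y for x, y in env) / len(env)

def make_env(spurious_sign):
    data = []
    for _ in range(2000):
        y = random.choice([-1, 1])
        x0 = y + random.gauss(0, 0.1)                   # invariant feature
        x1 = spurious_sign * y + random.gauss(0, 0.1)   # spurious feature
        data.append(((x0, x1), y))
    return data

kept = {0, 1}
for env in [make_env(+1), make_env(-1)]:   # match across two environments
    kept = {j for j in kept if corr(env, j) > 0.5}
print(kept)  # {0}: only the invariant feature survives the matching
```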
arXiv Detail & Related papers (2021-06-18T04:39:19Z)
- Present-Biased Optimization [8.775878711928591]
The paper extends the framework proposed by Akerlof (1991) for studying various aspects of human behavior related to time-inconsistent planning.
We show that the ratio between the cost of the solutions computed by present-biased agents and the cost of the optimal solutions may differ significantly depending on the problem constraints.
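That cost ratio can be reproduced in a few lines. The toy below follows an Akerlof-style salience model in which today's cost is inflated by a bias factor $b$ while future costs are taken at face value; the parameter names and numbers are ours.

```python
# A present-biased agent re-plans "do it tomorrow" every day, because the
# salient cost of acting now (b * task_cost) exceeds the perceived cost of
# one more small delay plus doing the task later at face value.
b, task_cost, delay_cost, horizon = 3.0, 1.0, 0.1, 10

delays = 0
for day in range(horizon):
    act_now   = b * task_cost               # today's cost, inflated by salience
    act_later = b * delay_cost + task_cost  # small delay cost today, task later
    if act_now <= act_later:
        break
    delays += 1                             # postpone; tomorrow looks the same

realized = delays * delay_cost + task_cost  # deadline forces the task at the end
optimal  = task_cost                        # a rational agent acts on day 0
print(delays, realized / optimal)           # 10 2.0: twice the optimal cost
```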
arXiv Detail & Related papers (2020-12-29T12:40:59Z)
- Relational-Grid-World: A Novel Relational Reasoning Environment and An Agent Model for Relational Information Extraction [0.0]
Reinforcement learning (RL) agents are often designed for one particular problem, and their internal working processes are generally uninterpretable.
Statistical methods-based RL algorithms can be improved in terms of generalizability and interpretability using symbolic Artificial Intelligence (AI) tools such as logic programming.
We present a model-free RL architecture that is supported with explicit relational representations of the environmental objects.
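A minimal illustration, ours, of what an explicit relational representation buys: grid objects become typed facts, and spatial relations are derived from coordinates for the policy to reason over.

```python
# Objects as (name -> position) facts; relations derived from coordinates.
objects = {"agent": (1, 1), "key": (0, 1), "door": (3, 1)}

def left_of(a, b):
    ax, ay = objects[a]; bx, by = objects[b]
    return ay == by and ax < bx

facts = [(a, "left_of", b) for a in objects for b in objects
         if a != b and left_of(a, b)]
print(facts)
# [('agent', 'left_of', 'door'), ('key', 'left_of', 'agent'),
#  ('key', 'left_of', 'door')]
```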
arXiv Detail & Related papers (2020-07-12T11:30:48Z)