Towards Full Delegation: Designing Ideal Agentic Behaviors for Travel Planning
- URL: http://arxiv.org/abs/2411.13904v1
- Date: Thu, 21 Nov 2024 07:30:02 GMT
- Title: Towards Full Delegation: Designing Ideal Agentic Behaviors for Travel Planning
- Authors: Song Jiang, Da JU, Andrew Cohen, Sasha Mitts, Aaron Foss, Justine T Kao, Xian Li, Yuandong Tian
- Abstract summary: We propose APEC Agent Constitution, a list of criteria that an agent should follow for good agentic behaviors.
APEC-Travel is a travel planning agent that proactively extracts hidden personalized needs via multi-round dialog with travelers.
- Score: 49.34098402103427
- Abstract: How will LLM-based agents be used in the future? While much of the existing work on agents has focused on improving performance on a specific family of objective and challenging tasks, in this work we take a different perspective by thinking about full delegation: agents take over humans' routine decision-making processes and are trusted by humans to find solutions that fit people's personalized needs and adapt to ever-changing context. To achieve such a goal, the behavior of the agents, i.e., agentic behaviors, should be evaluated not only on what they achieve (i.e., outcome evaluation) but also on how they achieve it (i.e., procedure evaluation). For this, we propose the APEC Agent Constitution, a list of criteria that an agent should follow for good agentic behaviors, including Accuracy, Proactivity, Efficiency and Credibility. To verify whether APEC aligns with human preferences, we develop APEC-Travel, a travel planning agent that proactively extracts hidden personalized needs via multi-round dialog with travelers. APEC-Travel is constructed purely from synthetic data generated by Llama3.1-405B-Instruct with a diverse set of traveler personas to simulate a rich distribution of dialogs. Iteratively fine-tuned to follow the APEC Agent Constitution, APEC-Travel surpasses baselines by 20.7% on rule-based metrics and 9.1% on LLM-as-a-Judge scores across the constitution axes.
Related papers
- AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios [38.878966229688054]
We introduce AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios.
Drawing on Dramaturgical Theory, AgentSense employs a bottom-up approach to create 1,225 diverse social scenarios constructed from extensive scripts.
We analyze goals using ERG theory and conduct comprehensive experiments.
Our findings highlight that LLMs struggle with goals in complex social scenarios, especially high-level growth needs, and even GPT-4o requires improvement in private information reasoning.
arXiv Detail & Related papers (2024-10-25T07:04:16Z) - Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance [95.03771007780976]
We tackle the challenge of developing proactive agents capable of anticipating and initiating tasks without explicit human instructions.
First, we collect real-world human activities to generate proactive task predictions.
These predictions are labeled by human annotators as either accepted or rejected.
The labeled data is used to train a reward model that simulates human judgment.
arXiv Detail & Related papers (2024-10-16T08:24:09Z) - Agent-as-a-Judge: Evaluate Agents with Agents [61.33974108405561]
We introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems.
This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process.
We present DevAI, a new benchmark of 55 realistic automated AI development tasks.
arXiv Detail & Related papers (2024-10-14T17:57:02Z) - PersonaGym: Evaluating Persona Agents and LLMs [47.75926334294358]
We introduce PersonaGym, the first dynamic evaluation framework for assessing persona agents, and PersonaScore, the first automated human-aligned metric grounded in decision theory.
Our evaluation of 6 open and closed-source LLMs, using a benchmark encompassing 200 personas and 10,000 questions, reveals significant opportunities for advancement in persona agent capabilities.
arXiv Detail & Related papers (2024-07-25T22:24:45Z) - Select to Perfect: Imitating desired behavior from large multi-agent data [28.145889065013687]
Desired characteristics for AI agents can be expressed by assigning desirability scores.
We first assess the effect of each individual agent's behavior on the collective desirability score.
We propose the concept of an agent's Exchange Value, which quantifies an individual agent's contribution to the collective desirability score.
arXiv Detail & Related papers (2024-05-06T15:48:24Z) - AgentCF: Collaborative Learning with Autonomous Language Agents for
Recommender Systems [112.76941157194544]
We propose AgentCF for simulating user-item interactions in recommender systems through agent-based collaborative filtering.
We creatively consider not only users but also items as agents, and develop a collaborative learning approach that optimizes both kinds of agents together.
Overall, the optimized agents exhibit diverse interaction behaviors within our framework, including user-item, user-user, item-item, and collective interactions.
arXiv Detail & Related papers (2023-10-13T16:37:14Z) - The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI).
We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents.
We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z) - Investigating Agency of LLMs in Human-AI Collaboration Tasks [24.562034082480608]
We build on social-cognitive theory to develop a framework of features through which Agency is expressed in dialogue.
We collect a new dataset of 83 human-human collaborative interior design conversations.
arXiv Detail & Related papers (2023-05-22T08:17:14Z) - Formalizing the Problem of Side Effect Regularization [81.97441214404247]
We propose a formal criterion for side effect regularization via the assistance game framework.
In these games, the agent solves a partially observable Markov decision process.
We show that this POMDP is solved by trading off the proxy reward with the agent's ability to achieve a range of future tasks.
arXiv Detail & Related papers (2022-06-23T16:36:13Z) - "I Don't Think So": Disagreement-Based Policy Summaries for Comparing Agents [2.6270468656705765]
We propose a novel method for generating contrastive summaries that highlight the differences between agents' policies.
Our results show that the novel disagreement-based summaries lead to improved user performance compared to summaries generated using HIGHLIGHTS.
arXiv Detail & Related papers (2021-02-05T09:09:00Z)