A Survey on Complex Tasks for Goal-Directed Interactive Agents
- URL: http://arxiv.org/abs/2409.18538v1
- Date: Fri, 27 Sep 2024 08:17:53 GMT
- Title: A Survey on Complex Tasks for Goal-Directed Interactive Agents
- Authors: Mareike Hartmann, Alexander Koller,
- Abstract summary: This survey compiles relevant tasks and environments for evaluating goal-directed interactive agents.
An up-to-date compilation of relevant resources can be found on our project website.
- Score: 60.53915548970061
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Goal-directed interactive agents, which autonomously complete tasks through interactions with their environment, can assist humans in various domains of their daily lives. Recent advances in large language models (LLMs) led to a surge of new, more and more challenging tasks to evaluate such agents. To properly contextualize performance across these tasks, it is imperative to understand the different challenges they pose to agents. To this end, this survey compiles relevant tasks and environments for evaluating goal-directed interactive agents, structuring them along dimensions relevant for understanding current obstacles. An up-to-date compilation of relevant resources can be found on our project website: https://coli-saar.github.io/interactive-agents.
Related papers
- Interactive Agents to Overcome Ambiguity in Software Engineering [61.40183840499932]
AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions.
Making unwarranted assumptions and failing to ask clarifying questions can lead to suboptimal outcomes.
We study the ability of LLM agents to handle ambiguous instructions in interactive code generation settings by evaluating proprietary and open-weight models on their performance.
arXiv Detail & Related papers (2025-02-18T17:12:26Z) - TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks [52.46737975742287]
We build a self-contained environment with data that mimics a small software company environment.
We find that with the most competitive agent, 24% of the tasks can be completed autonomously.
This paints a nuanced picture on task automation with LM agents.
arXiv Detail & Related papers (2024-12-18T18:55:40Z) - ReSpAct: Harmonizing Reasoning, Speaking, and Acting Towards Building Large Language Model-Based Conversational AI Agents [11.118991548784459]
Large language model (LLM)-based agents have been increasingly used to interact with external environments.
Current frameworks do not enable these agents to work with users and interact with them to align on the details of their tasks.
This work introduces ReSpAct, a novel framework that combines the essential skills for building task-oriented "conversational" agents.
arXiv Detail & Related papers (2024-11-01T15:57:45Z) - IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents [20.460482488872145]
This paper addresses the challenges of developing interactive agents capable of understanding and executing grounded natural language instructions.
We introduce a scalable data collection tool for gathering interactive grounded language instructions within a Minecraft-like environment.
We present a Human-in-the-Loop interactive evaluation platform for qualitative analysis and comparison of agent performance.
arXiv Detail & Related papers (2024-07-12T00:07:43Z) - AntEval: Evaluation of Social Interaction Competencies in LLM-Driven
Agents [65.16893197330589]
Large Language Models (LLMs) have demonstrated their ability to replicate human behaviors across a wide range of scenarios.
However, their capability in handling complex, multi-character social interactions has yet to be fully explored.
We introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods.
arXiv Detail & Related papers (2024-01-12T11:18:00Z) - Multi-Agent Consensus Seeking via Large Language Models [6.336670103502898]
Multi-agent systems driven by large language models (LLMs) have shown promising abilities for solving complex tasks in a collaborative manner.
This work considers a fundamental problem in multi-agent collaboration: consensus seeking.
arXiv Detail & Related papers (2023-10-31T03:37:11Z) - WebArena: A Realistic Web Environment for Building Autonomous Agents [92.3291458543633]
We build an environment for language-guided agents that is highly realistic and reproducible.
We focus on agents that perform tasks on the web, and create an environment with fully functional websites from four common domains.
We release a set of benchmark tasks focusing on evaluating the functional correctness of task completions.
arXiv Detail & Related papers (2023-07-25T22:59:32Z) - Establishing Shared Query Understanding in an Open Multi-Agent System [1.2031796234206138]
We propose a method that allows to develop shared understanding between two agents for the purpose of performing a task that requires cooperation.
Our method focuses on efficiently establishing successful task-oriented communication in an open multi-agent system.
arXiv Detail & Related papers (2023-05-16T11:07:05Z) - LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task
Activities [119.88381048477854]
We introduce the LEMMA dataset to provide a single home to address missing dimensions with meticulously designed settings.
We densely annotate the atomic-actions with human-object interactions to provide ground-truths of the compositionality, scheduling, and assignment of daily activities.
We hope this effort would drive the machine vision community to examine goal-directed human activities and further study the task scheduling and assignment in the real world.
arXiv Detail & Related papers (2020-07-31T00:13:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.