Learning to Solve Complex Tasks by Talking to Agents
- URL: http://arxiv.org/abs/2110.08542v1
- Date: Sat, 16 Oct 2021 10:37:34 GMT
- Title: Learning to Solve Complex Tasks by Talking to Agents
- Authors: Tushar Khot, Kyle Richardson, Daniel Khashabi, and Ashish Sabharwal
- Abstract summary: Humans often solve complex problems by interacting with existing agents, such as AI assistants, that can solve simpler sub-tasks.
Common NLP benchmarks aim for the development of self-sufficient models for every task.
We propose a new benchmark called CommaQA that contains three kinds of complex reasoning tasks designed to be solved by "talking" to four agents with different capabilities.
- Score: 39.08818632689814
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans often solve complex problems by interacting (in natural language) with
existing agents, such as AI assistants, that can solve simpler sub-tasks. These
agents themselves can be powerful systems built using extensive resources and
privately held data. In contrast, common NLP benchmarks aim for the development
of self-sufficient models for every task. To address this gap and facilitate
research towards "green" AI systems that build upon existing agents, we
propose a new benchmark called CommaQA that contains three kinds of complex
reasoning tasks that are designed to be solved by "talking" to four agents
with different capabilities. We demonstrate that state-of-the-art black-box
models, which are unable to leverage existing agents, struggle on CommaQA
(exact match score only reaches 40 points) even when given access to the agents'
internal knowledge and gold fact supervision. On the other hand, models using
gold question decomposition supervision can indeed solve CommaQA to a high
accuracy (over 96% exact match) by learning to utilize the agents. Even these
additional supervision models, however, do not solve our compositional
generalization test set. Finally, the end goal of learning to solve complex
tasks by communicating with existing agents without relying on any
additional supervision remains unsolved, and we hope CommaQA serves as a novel
benchmark to enable the development of such systems.
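The idea of answering a complex question by following a question decomposition and "talking" to agents can be illustrated with a small sketch. This is not the CommaQA code or the authors' model: the agent names, the `#k` reference convention, the toy questions, and the helper functions below are all assumptions made up for illustration. Each step names an agent and a simple question, and later steps may refer to the answers of earlier steps.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical agent type: takes a simple question, returns a list of answers.
Agent = Callable[[str], List[str]]


def make_lookup_agent(table: Dict[str, List[str]]) -> Agent:
    """Build a toy agent that answers only the questions stored in `table`."""
    def agent(question: str) -> List[str]:
        return table.get(question, [])
    return agent


def run_decomposition(steps: List[Tuple[str, str]],
                      agents: Dict[str, Agent]) -> List[str]:
    """Run (agent_name, question_template) steps in order.

    A template may contain "#k", which is expanded once per answer
    produced by step k (an assumed convention for this sketch).
    """
    answers: List[List[str]] = []
    for agent_name, template in steps:
        questions = [template]
        # Expand higher-numbered markers first so "#1" never clobbers "#10".
        for k, prev in reversed(list(enumerate(answers, start=1))):
            marker = f"#{k}"
            expanded: List[str] = []
            for q in questions:
                if marker in q:
                    expanded.extend(q.replace(marker, a) for a in prev)
                else:
                    expanded.append(q)
            questions = expanded
        # Ask the chosen agent every expanded question, keeping unique answers.
        step_answers: List[str] = []
        for q in questions:
            for a in agents[agent_name](q):
                if a not in step_answers:
                    step_answers.append(a)
        answers.append(step_answers)
    return answers[-1] if answers else []


# Toy agents with invented knowledge.
AGENTS: Dict[str, Agent] = {
    "text": make_lookup_agent({"Who acted in MovieX?": ["Alice", "Bob"]}),
    "table": make_lookup_agent({
        "Where was Alice born?": ["Paris"],
        "Where was Bob born?": ["Oslo"],
    }),
}

# Decomposition of "Where were the actors of MovieX born?"
STEPS = [("text", "Who acted in MovieX?"), ("table", "Where was #1 born?")]

if __name__ == "__main__":
    print(run_decomposition(STEPS, AGENTS))
```

In this sketch the decomposition is supplied by hand, which corresponds to the gold decomposition supervision setting mentioned in the abstract; producing such decompositions without that supervision is the part the abstract describes as unsolved.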
Related papers
- Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning [12.80689911863731]
Sibyl is a powerful framework designed to tackle complex reasoning tasks by efficiently leveraging a minimal set of tools.
Sibyl implements a multi-agent debate-based jury to self-refine the final answers, ensuring a comprehensive and balanced approach.
Our experimental results on the GAIA benchmark test set reveal that the Sibyl agent achieves state-of-the-art performance with an average score of 34.55%.
arXiv Detail & Related papers (2024-07-15T13:45:40Z) - Adaptive In-conversation Team Building for Language Model Agents [33.03550687362213]
Leveraging multiple large language model (LLM) agents has been shown to be a promising approach for tackling complex tasks.
Our new adaptive team-building paradigm offers a flexible solution, realized through a novel agent design named Captain Agent.
arXiv Detail & Related papers (2024-05-29T18:08:37Z) - Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning [14.635361844362794]
Smurfs is a cutting-edge multi-agent framework designed to revolutionize the application of large language models.
Smurfs can enhance the model's ability to solve complex tasks at no additional cost.
arXiv Detail & Related papers (2024-05-09T17:49:04Z) - Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning [56.82041895921434]
Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities.
When used as agents for dealing with complex problems in the real world, their performance is far inferior to large commercial models such as ChatGPT and GPT-4.
arXiv Detail & Related papers (2024-03-29T03:48:12Z) - KwaiAgents: Generalized Information-seeking Agent System with Large
Language Models [33.59597020276034]
Humans excel in critical thinking, planning, reflection, and harnessing available tools to interact with and interpret the world.
Recent advancements in large language models (LLMs) suggest that machines might also possess the aforementioned human-like capabilities.
We introduce KwaiAgents, a generalized information-seeking agent system based on LLMs.
arXiv Detail & Related papers (2023-12-08T08:11:11Z) - Multi-Agent Consensus Seeking via Large Language Models [6.922356864800498]
Multi-agent systems driven by large language models (LLMs) have shown promising abilities for solving complex tasks in a collaborative manner.
This work considers a fundamental problem in multi-agent collaboration: consensus seeking.
arXiv Detail & Related papers (2023-10-31T03:37:11Z) - On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring [105.13668993076801]
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees.
We study this question in a general framework for interactive decision making with multiple agents.
We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making.
arXiv Detail & Related papers (2023-05-01T06:46:22Z) - Towards Collaborative Question Answering: A Preliminary Study [63.91687114660126]
We propose CollabQA, a novel QA task in which several expert agents coordinated by a moderator work together to answer questions that cannot be answered with any single agent alone.
We make a synthetic dataset of a large knowledge graph that can be distributed to experts.
We show that the problem can be challenging without introducing a prior on the collaboration structure, unless experts are perfect and uniform.
arXiv Detail & Related papers (2022-01-24T14:27:00Z) - On the Use and Misuse of Absorbing States in Multi-agent Reinforcement Learning [55.95253619768565]
Current MARL algorithms assume that the number of agents within a group remains fixed throughout an experiment.
In many practical problems, an agent may terminate before their teammates.
We present a novel architecture for an existing state-of-the-art MARL algorithm which uses attention instead of a fully connected layer with absorbing states.
arXiv Detail & Related papers (2021-11-10T23:45:08Z) - UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z)