SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex
Interactive Tasks
- URL: http://arxiv.org/abs/2305.17390v2
- Date: Wed, 6 Dec 2023 10:07:01 GMT
- Title: SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex
Interactive Tasks
- Authors: Bill Yuchen Lin, Yicheng Fu, Karina Yang, Faeze Brahman, Shiyu Huang,
Chandra Bhagavatula, Prithviraj Ammanabrolu, Yejin Choi, Xiang Ren
- Abstract summary: We introduce SwiftSage, a novel agent framework inspired by the dual-process theory of human cognition.
The framework comprises two primary modules: the Swift module, representing fast and intuitive thinking, and the Sage module, emulating deliberate thought processes.
In 30 tasks from the ScienceWorld benchmark, SwiftSage significantly outperforms other methods such as SayCan, ReAct, and Reflex.
- Score: 81.9962823875981
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce SwiftSage, a novel agent framework inspired by the dual-process
theory of human cognition, designed to excel in action planning for complex
interactive reasoning tasks. SwiftSage integrates the strengths of behavior
cloning and prompting large language models (LLMs) to enhance task completion
performance. The framework comprises two primary modules: the Swift module,
representing fast and intuitive thinking, and the Sage module, emulating
deliberate thought processes. The Swift module is a small encoder-decoder LM
fine-tuned on the oracle agent's action trajectories, while the Sage module
employs LLMs such as GPT-4 for subgoal planning and grounding. We develop a
heuristic method to harmoniously integrate the two modules, resulting in a more
efficient and robust problem-solving process. In 30 tasks from the ScienceWorld
benchmark, SwiftSage significantly outperforms other methods such as SayCan,
ReAct, and Reflexion, demonstrating its effectiveness in solving complex
interactive tasks.
Related papers
- DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models [42.95876831743256]
Large language models (LLMs) have demonstrated emergent capabilities across diverse reasoning tasks via Chains-of-Thought prompting.
This paper addresses the challenge of enabling LLMs to autonomously select between fast and slow inference methods.
We introduce a dynamic decision-making framework that categorizes tasks into two distinct pathways: 'Fast', designated for tasks where the LLM quickly identifies a high-confidence solution, and 'Slow', allocated for tasks that the LLM perceives as complex.
arXiv Detail & Related papers (2024-07-01T06:45:13Z) - APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts [21.819126948549766]
Large Language Models (LLMs) have become increasingly capable of handling diverse tasks with the aid of well-crafted prompts.
APPL acts as a bridge between computer programs and LLMs, allowing seamless embedding of prompts into Python functions.
arXiv Detail & Related papers (2024-06-19T02:29:59Z) - RL-GPT: Integrating Reinforcement Learning and Code-as-policy [82.1804241891039]
We introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent.
The slow agent analyzes actions suitable for coding, while the fast agent executes coding tasks.
This decomposition effectively focuses each agent on specific tasks, proving highly efficient within our pipeline.
arXiv Detail & Related papers (2024-02-29T16:07:22Z) - TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and
Agent Generation [45.028795422801764]
We propose a multi-agent framework based on dynamic Task Decomposition and Agent Generation (TDAG)
This framework dynamically decomposes complex tasks into smaller subtasks and assigns each to a specifically generated subagent.
ItineraryBench is designed to assess agents' abilities in memory, planning, and tool usage across tasks of varying complexity.
arXiv Detail & Related papers (2024-02-15T18:27:37Z) - LLMind: Orchestrating AI and IoT with LLM for Complex Task Execution [20.186752447895994]
We present LLMind, an AI agent framework that enables effective collaboration among IoT devices for executing complex tasks.
Inspired by the functional specialization theory of the brain, our framework integrates an LLM with domain-specific AI modules, enhancing its capabilities.
arXiv Detail & Related papers (2023-12-14T14:57:58Z) - Agent Lumos: Unified and Modular Training for Open-Source Language Agents [89.78556964988852]
We introduce LUMOS, one of the first frameworks for training open-source LLM-based agents.
LUMOS features a learnable, unified, and modular architecture with a planning module that learns high-level subgoal generation.
We collect large-scale, unified, and high-quality training annotations derived from diverse ground-truth reasoning rationales.
arXiv Detail & Related papers (2023-11-09T00:30:13Z) - CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules [51.82044734879657]
We propose CodeChain, a novel framework for inference that elicits modularized code generation through a chain of self-revisions.
We find that CodeChain can significantly boost both modularity as well as correctness of the generated solutions, achieving relative pass@1 improvements of 35% on APPS and 76% on CodeContests.
arXiv Detail & Related papers (2023-10-13T10:17:48Z) - Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models [31.509994889286183]
We introduce Language Agent Tree Search (LATS) -- the first general framework that synergizes the capabilities of language models (LMs) in reasoning, acting, and planning.
A key feature of our approach is the incorporation of an environment for external feedback, which offers a more deliberate and adaptive problem-solving mechanism.
LATS achieves state-of-the-art pass@1 accuracy (92.7%) for programming on HumanEval with GPT-4 and demonstrates gradient-free performance (average score of 75.9) comparable to gradient-based fine-tuning for web navigation on WebShop with GPT
arXiv Detail & Related papers (2023-10-06T17:55:11Z) - Low-code LLM: Graphical User Interface over Large Language Models [115.08718239772107]
This paper introduces a novel human-LLM interaction framework, Low-code LLM.
It incorporates six types of simple low-code visual programming interactions to achieve more controllable and stable responses.
We highlight three advantages of the low-code LLM: user-friendly interaction, controllable generation, and wide applicability.
arXiv Detail & Related papers (2023-04-17T09:27:40Z) - Decomposed Prompting: A Modular Approach for Solving Complex Tasks [55.42850359286304]
We propose Decomposed Prompting to solve complex tasks by decomposing them (via prompting) into simpler sub-tasks.
This modular structure allows each prompt to be optimized for its specific sub-task.
We show that the flexibility and modularity of Decomposed Prompting allows it to outperform prior work on few-shot prompting.
arXiv Detail & Related papers (2022-10-05T17:28:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.