Related papers: FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents

FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents

URL: http://arxiv.org/abs/2406.14884v1
Date: Fri, 21 Jun 2024 06:13:00 GMT
Title: FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents
Authors: Ruixuan Xiao, Wentao Ma, Ke Wang, Yuchuan Wu, Junbo Zhao, Haobo Wang, Fei Huang, Yongbin Li,
Abstract summary: We present FlowBench, the first benchmark for workflow-guided planning. FlowBench covers 51 different scenarios from 6 domains, with knowledge presented in diverse formats. Results indicate that current LLM agents need considerable improvements for satisfactory planning.
Score: 64.1759086221016
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LLM-based agents have emerged as promising tools, which are crafted to fulfill complex tasks by iterative planning and action. However, these agents are susceptible to undesired planning hallucinations when lacking specific knowledge for expertise-intensive tasks. To address this, preliminary attempts are made to enhance planning reliability by incorporating external workflow-related knowledge. Despite the promise, such infused knowledge is mostly disorganized and diverse in formats, lacking rigorous formalization and comprehensive comparisons. Motivated by this, we formalize different formats of workflow knowledge and present FlowBench, the first benchmark for workflow-guided planning. FlowBench covers 51 different scenarios from 6 domains, with knowledge presented in diverse formats. To assess different LLMs on FlowBench, we design a multi-tiered evaluation framework. We evaluate the efficacy of workflow knowledge across multiple formats, and the results indicate that current LLM agents need considerable improvements for satisfactory planning. We hope that our challenging benchmark can pave the way for future agent planning research.

Related papers

Cross-Task Experiential Learning on LLM-based Multi-Agent Collaboration [63.90193684394165]
We introduce multi-agent cross-task experiential learning (MAEL), a novel framework that endows LLM-driven agents with explicit cross-task learning and experience accumulation.<n>During the experiential learning phase, we quantify the quality for each step in the task-solving workflow and store the resulting rewards.<n>During inference, agents retrieve high-reward, task-relevant experiences as few-shot examples to enhance the effectiveness of each reasoning step.
arXiv Detail & Related papers (2025-05-29T07:24:37Z)
Zero-shot Robotic Manipulation with Language-guided Instruction and Formal Task Planning [16.89900521727246]
We propose an innovative language-guided symbolic task planning (LM-SymOpt) framework with optimization. It is the first expert-free planning framework since we combine the world knowledge from Large Language Models with formal reasoning. Our experimental results show that LM-SymOpt outperforms existing LLM-based planning approaches.
arXiv Detail & Related papers (2025-01-25T13:33:22Z)
Flow: Modularized Agentic Workflow Automation [53.073598156915615]
Multi-agent frameworks powered by large language models (LLMs) have demonstrated great success in automated planning and task execution. However, the effective adjustment of agentic during execution has not been well studied. In this paper, we define an activity-on-vertex (AOV) graph, which allows continuous workflow refinement by agents. Our proposed multi-agent framework achieves efficient concurrent execution of subtasks, effective goal achievement, and enhanced error tolerance.
arXiv Detail & Related papers (2025-01-14T04:35:37Z)
Forecasting Application Counts in Talent Acquisition Platforms: Harnessing Multimodal Signals using LMs [5.7623855432001445]
We discuss a novel task in the recruitment domain, namely, application count forecasting. We show that existing auto-regressive based time series forecasting methods perform poorly for this task. We propose a multimodal LM-based model which fuses job-posting metadata of various modalities through a simple encoder.
arXiv Detail & Related papers (2024-11-19T01:18:32Z)
Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorFBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures. We also present WorFEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms. We observe that the generated can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
arXiv Detail & Related papers (2024-10-10T12:41:19Z)
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks. LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning. We introduce Q*, a framework for guiding LLMs decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving planning capabilities of large language models (LLMs) We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios. We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z)
FLAP: Flow-Adhering Planning with Constrained Decoding in LLMs [7.008135803030462]
Planning is a crucial task for agents in task oriented dialogs (TODs) Planning is a crucial task for agents in task oriented dialogs (TODs)
arXiv Detail & Related papers (2024-03-09T02:27:45Z)
KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents [54.09074527006576]
Large Language Models (LLMs) have demonstrated great potential in complex reasoning tasks, yet they fall short when tackling more sophisticated challenges. This inadequacy primarily stems from the lack of built-in action knowledge in language agents. We introduce KnowAgent, a novel approach designed to enhance the planning capabilities of LLMs by incorporating explicit action knowledge.
arXiv Detail & Related papers (2024-03-05T16:39:12Z)
Understanding the planning of LLM agents: A survey [98.82513390811148]
This survey provides the first systematic view of LLM-based agents planning, covering recent works aiming to improve planning ability. Comprehensive analyses are conducted for each direction, and further challenges in the field of research are discussed.
arXiv Detail & Related papers (2024-02-05T04:25:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.