Related papers: BayesFlow: A Probability Inference Framework for Meta-Agent Assisted Workflow Generation

BayesFlow: A Probability Inference Framework for Meta-Agent Assisted Workflow Generation

URL: http://arxiv.org/abs/2601.22305v1
Date: Thu, 29 Jan 2026 20:43:20 GMT
Title: BayesFlow: A Probability Inference Framework for Meta-Agent Assisted Workflow Generation
Authors: Bo Yuan, Yun Zhou, Zhichao Xu, Kiran Ramnath, Aosong Feng, Balasubramaniam Srinivasan,
Abstract summary: We introduce textbfBayesian Generation (BWG), a sampling framework that builds step-by-step using parallel look-ahead rollouts for importance weighting.<n>We prove that, without the refiner, the weighted empirical distribution converges to the target posterior.<n>BayesFlow improves accuracy by up to 9 percentage points over SOTA workflow generation baselines and by up to 65 percentage points over zero-shot prompting.
Score: 12.637030045464693
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Automatic workflow generation is the process of automatically synthesizing sequences of LLM calls, tool invocations, and post-processing steps for complex end-to-end tasks. Most prior methods cast this task as an optimization problem with limited theoretical grounding. We propose to cast workflow generation as Bayesian inference over a posterior distribution on workflows, and introduce \textbf{Bayesian Workflow Generation (BWG)}, a sampling framework that builds workflows step-by-step using parallel look-ahead rollouts for importance weighting and a sequential in-loop refiner for pool-wide improvements. We prove that, without the refiner, the weighted empirical distribution converges to the target posterior. We instantiate BWG as \textbf{BayesFlow}, a training-free algorithm for workflow construction. Across six benchmark datasets, BayesFlow improves accuracy by up to 9 percentage points over SOTA workflow generation baselines and by up to 65 percentage points over zero-shot prompting, establishing BWG as a principled upgrade to search-based workflow design. Code will be available on https://github.com/BoYuanVisionary/BayesFlow.

Related papers

Learning to Compose for Cross-domain Agentic Workflow Generation [56.630382886594184]
We create an open-source LLM for cross-domain workflow generation.<n>We learn a compact set of reusable workflow capabilities across diverse domains.<n>Our 1-pass generator surpasses SOTA refinement baselines that consume 20 iterations.
arXiv Detail & Related papers (2026-02-11T18:27:22Z)
Do We Always Need Query-Level Workflows? Rethinking Agentic Workflow Generation for Multi-Agent Systems [72.3575737073235]
Multi-Agent Systems (MAS) solve complex tasks by coordinating multiple agents through.<n>Existing approaches generates either at task level or query level, but their relative costs and benefits remain unclear.<n>We show that query-level workflow generation is not always necessary, since a small set of top-K best task-level together already covers equivalent or even more queries.
arXiv Detail & Related papers (2026-01-16T10:05:51Z)
ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation [96.44354750396019]
ComfyGPT is a self-optimizing multi-agent system designed to generate ComfyUI based on task descriptions automatically.<n> FlowDataset is a large-scale dataset containing 13,571 workflow-description pairs.<n>FlowBench is a benchmark for evaluating workflow generation systems.
arXiv Detail & Related papers (2025-03-22T06:48:50Z)
Cognify: Supercharging Gen-AI Workflows With Hierarchical Autotuning [6.328780056857816]
gen-AI that involve multiple ML model calls, tool/API calls, data retrieval, or generic code execution are often tuned manually in an ad-hoc way.<n>AdaSeek organizes workflow tuning methods into different layers based on the user-specified total search budget.<n>Cognify improves these workflow's generation quality by up to 2.8x, reduces execution monetary cost by up to 10x, and reduces end-to-end latency by 2.7x.
arXiv Detail & Related papers (2025-02-12T01:36:27Z)
ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization [51.280919773837645]
We develop ScoreFlow, a high-performance framework for agent workflow optimization.<n>ScoreFlow incorporates Score-DPO, a novel variant of the direct preference optimization method that accounts for quantitative feedback.<n>It achieves an 8.2% improvement over existing baselines across question answering, coding, and mathematical reasoning.
arXiv Detail & Related papers (2025-02-06T18:47:49Z)
Opus: A Large Work Model for Complex Workflow Generation [0.0]
Opus is a framework for generating and optimizing tasks tailored to complex Business Process Outsourcing (BPO) use cases.<n>Our approach generates executables from Intention, defined as the alignment of Client Input, Client Output and Process Directed Context.
arXiv Detail & Related papers (2024-11-30T20:00:41Z)
WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models [105.46456444315693]
We presentLLM, a data-centric framework to enhance the capability of large language models in workflow orchestration. It first constructs a large-scale fine-tuningBench with 106,763 samples, covering 1,503 APIs from 83 applications across 28 categories. LlamaLlama demonstrates a strong capacity to orchestrate complex APIs, while also achieving notable generalization performance.
arXiv Detail & Related papers (2024-11-08T09:58:02Z)
AFlow: Automating Agentic Workflow Generation [36.61172223528231]
Large language models (LLMs) have demonstrated remarkable potential in solving complex tasks across diverse domains.<n>We introduce AFlow, an automated framework that efficiently explores this space using Monte Carlo Tree Search.<n> Empirical evaluations across six benchmark datasets demonstrate AFlow's efficacy, yielding a 5.7% average improvement over state-of-the-art baselines.
arXiv Detail & Related papers (2024-10-14T17:40:40Z)
Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorfBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures.<n>We also present WorfEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms.<n>We observe that the generated can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
arXiv Detail & Related papers (2024-10-10T12:41:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.