BayesFlow: A Probability Inference Framework for Meta-Agent Assisted Workflow Generation
- URL: http://arxiv.org/abs/2601.22305v1
- Date: Thu, 29 Jan 2026 20:43:20 GMT
- Title: BayesFlow: A Probability Inference Framework for Meta-Agent Assisted Workflow Generation
- Authors: Bo Yuan, Yun Zhou, Zhichao Xu, Kiran Ramnath, Aosong Feng, Balasubramaniam Srinivasan,
- Abstract summary: We introduce Bayesian Workflow Generation (BWG), a sampling framework that builds workflows step-by-step using parallel look-ahead rollouts for importance weighting. We prove that, without the in-loop refiner, the weighted empirical distribution converges to the target posterior. BayesFlow improves accuracy by up to 9 percentage points over SOTA workflow generation baselines and by up to 65 percentage points over zero-shot prompting.
- Score: 12.637030045464693
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic workflow generation is the process of automatically synthesizing sequences of LLM calls, tool invocations, and post-processing steps for complex end-to-end tasks. Most prior methods cast this task as an optimization problem with limited theoretical grounding. We propose to cast workflow generation as Bayesian inference over a posterior distribution on workflows, and introduce Bayesian Workflow Generation (BWG), a sampling framework that builds workflows step-by-step using parallel look-ahead rollouts for importance weighting and a sequential in-loop refiner for pool-wide improvements. We prove that, without the refiner, the weighted empirical distribution converges to the target posterior. We instantiate BWG as BayesFlow, a training-free algorithm for workflow construction. Across six benchmark datasets, BayesFlow improves accuracy by up to 9 percentage points over SOTA workflow generation baselines and by up to 65 percentage points over zero-shot prompting, establishing BWG as a principled upgrade to search-based workflow design. Code will be available at https://github.com/BoYuanVisionary/BayesFlow.
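The abstract describes BWG only at a high level. As a rough illustration of the general technique it names (sequential importance sampling with look-ahead rollouts), here is a minimal Python sketch of a particle sampler over toy workflows. Everything concrete in it is an assumption, not BayesFlow's implementation: workflows are fixed-length lists of step names, the hypothetical proposal `propose_step` is uniform and doubles as the prior p(w), and the hypothetical `likelihood` R(w) rewards agreement with a made-up `TARGET` workflow, so the target posterior is π(w) ∝ p(w)·R(w). Each partial workflow is scored by averaging R over random completions (a look-ahead value ψ), the incremental importance weight is ψ_t / ψ_{t-1}, and the pool is resampled between steps. The in-loop refiner is omitted, matching the no-refiner setting of the convergence claim; the returned normalized weights define the weighted empirical distribution.

```python
import math
import random

# Toy setup (hypothetical): a workflow is a fixed-length list of step names.
STEPS = ["plan", "solve", "verify", "format"]
TARGET = ["plan", "solve", "verify", "format"]  # hypothetical "best" workflow
LENGTH = len(TARGET)
BETA = 1.0  # hypothetical sharpness of the likelihood


def likelihood(workflow):
    """Hypothetical R(w) = exp(BETA * number of positions that match TARGET)."""
    matches = sum(a == b for a, b in zip(workflow, TARGET))
    return math.exp(BETA * matches)


def propose_step(rng):
    """Hypothetical proposal q(step | prefix); uniform, so it equals the prior p."""
    return rng.choice(STEPS)


def lookahead_value(prefix, n_rollouts, rng):
    """Monte Carlo estimate of psi(prefix) = E[R(w) | prefix] from random completions."""
    if len(prefix) == LENGTH:
        return likelihood(prefix)  # complete workflow: no rollout needed
    total = 0.0
    for _ in range(n_rollouts):
        tail = [propose_step(rng) for _ in range(LENGTH - len(prefix))]
        total += likelihood(prefix + tail)
    return total / n_rollouts


def sample_workflows(n_particles=500, n_rollouts=16, seed=0):
    """Return (pool, normalized weights) approximating pi(w) proportional to p(w) * R(w)."""
    rng = random.Random(seed)
    pool = [[] for _ in range(n_particles)]
    prev_values = [1.0] * n_particles  # psi(empty prefix): a shared constant that cancels
    weights = [1.0] * n_particles
    for t in range(LENGTH):
        # Extend every partial workflow by one proposed step.
        pool = [w + [propose_step(rng)] for w in pool]
        values = [lookahead_value(w, n_rollouts, rng) for w in pool]
        # Incremental importance weight psi_t / psi_{t-1}; p/q = 1 in this toy.
        weights = [v / pv for v, pv in zip(values, prev_values)]
        if t < LENGTH - 1:
            # Multinomial resampling concentrates the pool on promising prefixes.
            idx = rng.choices(range(n_particles), weights=weights, k=n_particles)
            pool = [list(pool[i]) for i in idx]
            prev_values = [values[i] for i in idx]
    total = sum(weights)
    return pool, [w / total for w in weights]


if __name__ == "__main__":
    pool, weights = sample_workflows()
    # Weighted estimate of P(first step == TARGET[0]); the exact value is e / (e + 3).
    p_first = sum(w for wf, w in zip(pool, weights) if wf[0] == TARGET[0])
    print(f"estimate {p_first:.3f} vs exact {math.e / (math.e + 3):.3f}")
```

Because the toy prior is uniform and the likelihood factorizes over positions, the exact posterior probability that the first step matches `TARGET[0]` is e/(e+3) ≈ 0.48, which gives a simple check on the weighted estimate. In a real meta-agent setting the proposal would be an LLM and R would presumably come from executing candidate workflows on validation tasks; both are assumptions of this sketch.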
Related papers
- Learning to Compose for Cross-domain Agentic Workflow Generation [56.630382886594184]
We create an open-source LLM for cross-domain workflow generation. We learn a compact set of reusable workflow capabilities across diverse domains. Our 1-pass generator surpasses SOTA refinement baselines that consume 20 iterations.
(arXiv, 2026-02-11T18:27:22Z)
- Do We Always Need Query-Level Workflows? Rethinking Agentic Workflow Generation for Multi-Agent Systems [72.3575737073235]
Multi-Agent Systems (MAS) solve complex tasks by coordinating multiple agents through workflows. Existing approaches generate workflows either at the task level or at the query level, but their relative costs and benefits remain unclear. We show that query-level workflow generation is not always necessary, since a small set of the top-K task-level workflows together already covers an equivalent number of queries, or even more.
(arXiv, 2026-01-16T10:05:51Z)
- ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation [96.44354750396019]
ComfyGPT is a self-optimizing multi-agent system designed to automatically generate ComfyUI workflows from task descriptions. FlowDataset is a large-scale dataset containing 13,571 workflow-description pairs. FlowBench is a benchmark for evaluating workflow generation systems.
(arXiv, 2025-03-22T06:48:50Z)
- Cognify: Supercharging Gen-AI Workflows With Hierarchical Autotuning [6.328780056857816]
Gen-AI workflows that involve multiple ML model calls, tool/API calls, data retrieval, or generic code execution are often tuned manually in an ad-hoc way. AdaSeek organizes workflow tuning methods into different layers based on the user-specified total search budget. Cognify improves these workflows' generation quality by up to 2.8x, reduces monetary execution cost by up to 10x, and reduces end-to-end latency by 2.7x.
(arXiv, 2025-02-12T01:36:27Z)
- ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization [51.280919773837645]
We develop ScoreFlow, a high-performance framework for agent workflow optimization. ScoreFlow incorporates Score-DPO, a novel variant of the direct preference optimization method that accounts for quantitative feedback. It achieves an 8.2% improvement over existing baselines across question answering, coding, and mathematical reasoning.
(arXiv, 2025-02-06T18:47:49Z)
- Opus: A Large Work Model for Complex Workflow Generation [0.0]
Opus is a framework for generating and optimizing tasks tailored to complex Business Process Outsourcing (BPO) use cases. Our approach generates executable workflows from Intention, defined as the alignment of Client Input, Client Output and Process Directed Context.
(arXiv, 2024-11-30T20:00:41Z)
- WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models [105.46456444315693]
We present WorkflowLLM, a data-centric framework to enhance the capability of large language models in workflow orchestration.
It first constructs a large-scale fine-tuning dataset, WorkflowBench, with 106,763 samples covering 1,503 APIs from 83 applications across 28 categories.
WorkflowLlama demonstrates a strong capacity to orchestrate complex APIs while also achieving notable generalization performance.
(arXiv, 2024-11-08T09:58:02Z)
- AFlow: Automating Agentic Workflow Generation [36.61172223528231]
Large language models (LLMs) have demonstrated remarkable potential in solving complex tasks across diverse domains. We introduce AFlow, an automated framework that efficiently explores the space of possible workflows using Monte Carlo Tree Search. Empirical evaluations across six benchmark datasets demonstrate AFlow's efficacy, yielding a 5.7% average improvement over state-of-the-art baselines.
(arXiv, 2024-10-14T17:40:40Z)
- Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorfBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures. We also present WorfEval, a systematic evaluation protocol utilizing subsequence and subgraph matching algorithms. We observe that the generated workflows can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
(arXiv, 2024-10-10T12:41:19Z)
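The WorfBench entry mentions that WorfEval uses subsequence matching. A minimal Python sketch of that idea for linear (chain) workflows follows: it scores a predicted step sequence against a gold one by the length of their longest common subsequence (LCS). It assumes steps are matched by exact string equality and that precision, recall and F1 are derived from the LCS length; WorfEval's actual node matching and its subgraph matching for graph-structured workflows are not reproduced, and `chain_scores` is a hypothetical name.

```python
from typing import Sequence, Tuple


def lcs_length(pred: Sequence[str], gold: Sequence[str]) -> int:
    """Length of the longest common subsequence, by O(len(pred) * len(gold)) dynamic programming."""
    dp = [[0] * (len(gold) + 1) for _ in range(len(pred) + 1)]
    for i, p in enumerate(pred, start=1):
        for j, g in enumerate(gold, start=1):
            if p == g:
                dp[i][j] = dp[i - 1][j - 1] + 1  # extend a common subsequence
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])  # skip one step on either side
    return dp[len(pred)][len(gold)]


def chain_scores(pred: Sequence[str], gold: Sequence[str]) -> Tuple[float, float, float]:
    """Hypothetical order-aware precision, recall and F1 of a predicted chain vs. a gold chain."""
    if not pred or not gold:
        return 0.0, 0.0, 0.0
    common = lcs_length(pred, gold)
    precision = common / len(pred)
    recall = common / len(gold)
    f1 = 0.0 if common == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = ["search docs", "read page", "extract answer", "format reply"]
    pred = ["search docs", "extract answer", "read page", "format reply", "double-check"]
    print(chain_scores(pred, gold))
```

In the usage example the prediction swaps two steps and appends an extra one, so the longest common subsequence has length 3, giving precision 0.6, recall 0.75 and F1 ≈ 0.67. An order-aware score like this penalizes the swapped steps, which a plain set-overlap metric would not.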