ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback
- URL: http://arxiv.org/abs/2505.17908v1
- Date: Fri, 23 May 2025 13:53:03 GMT
- Title: ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback
- Authors: Litao Guo, Xinli Xu, Luozhou Wang, Jiantao Lin, Jinsong Zhou, Zixin Zhang, Bolan Su, Ying-Cong Chen
- Abstract summary: We present ComfyMind, a collaborative AI system designed to enable robust and scalable general-purpose generation. ComfyMind introduces two core innovations: a Semantic Workflow Interface (SWI) that abstracts low-level node graphs into callable functional modules, and a Search Tree Planning mechanism. We evaluate ComfyMind on three public benchmarks: ComfyBench, GenEval, and Reason-Edit, which span generation, editing, and reasoning tasks.
- Score: 15.363560226232668
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid advancement of generative models, general-purpose generation has gained increasing attention as a promising approach to unify diverse tasks across modalities within a single system. Despite this progress, existing open-source frameworks often remain fragile and struggle to support complex real-world applications due to the lack of structured workflow planning and execution-level feedback. To address these limitations, we present ComfyMind, a collaborative AI system designed to enable robust and scalable general-purpose generation, built on the ComfyUI platform. ComfyMind introduces two core innovations: a Semantic Workflow Interface (SWI), which abstracts low-level node graphs into callable functional modules described in natural language, enabling high-level composition and reducing structural errors; and a Search Tree Planning mechanism with localized feedback execution, which models generation as a hierarchical decision process and allows adaptive correction at each stage. Together, these components improve the stability and flexibility of complex generative workflows. We evaluate ComfyMind on three public benchmarks: ComfyBench, GenEval, and Reason-Edit, which span generation, editing, and reasoning tasks. Results show that ComfyMind consistently outperforms existing open-source baselines and achieves performance comparable to GPT-Image-1. ComfyMind paves a promising path for the development of open-source general-purpose generative AI systems. Project page: https://github.com/LitaoGuo/ComfyMind
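The abstract describes the two mechanisms only at a high level. As a rough illustration of how a search-tree planner with localized execution feedback over natural-language-described workflow modules could be organized, the sketch below uses hypothetical names (SemanticModule, PlanNode, plan_and_execute) and a stubbed module-proposal step standing in for an LLM; it is not the paper's implementation or the ComfyUI API.

```python
# Hypothetical sketch only: tree-based planning with localized (reactive) feedback
# over "semantic" workflow modules. Names and control flow are illustrative,
# not the paper's implementation or the ComfyUI API.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SemanticModule:
    """A whole workflow abstracted as a callable with a natural-language description."""
    name: str
    description: str
    run: Callable[[dict], dict]  # consumes and returns a generation state


@dataclass
class PlanNode:
    state: dict
    children: list = field(default_factory=list)


def propose_modules(task: str, state: dict, library: list) -> list:
    """Stand-in for an LLM that ranks which modules could advance the task."""
    return [m for m in library if m.name not in state.get("done", [])]


def plan_and_execute(node: PlanNode, task: str, library: list, depth: int = 0):
    """Depth-first expansion of the search tree. A failed execution only prunes
    its own branch: the planner retries a sibling instead of replanning everything."""
    if node.state.get("complete"):
        return node.state
    if depth > 5:
        return None
    for module in propose_modules(task, node.state, library):
        try:
            new_state = module.run(dict(node.state))
        except RuntimeError:
            continue  # localized feedback: discard this branch, try the next candidate
        new_state["done"] = node.state.get("done", []) + [module.name]
        child = PlanNode(state=new_state)
        node.children.append(child)
        result = plan_and_execute(child, task, library, depth + 1)
        if result is not None:
            return result
    return None


# Toy usage: a single text-to-image module that marks the task complete.
t2i = SemanticModule("text_to_image", "generate an image from a prompt",
                     run=lambda s: {**s, "image": f"img({s['prompt']})", "complete": True})
print(plan_and_execute(PlanNode({"prompt": "a red bicycle"}), "make an image", [t2i]))
```

The design point mirrored here is that a failed module execution only discards its own branch of the tree; siblings at the same level are tried next rather than restarting the whole plan.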
Related papers
- Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation [65.30763239365928]
We introduce Genie Envisioner (GE), a unified world foundation platform for robotic manipulation. GE integrates policy learning, evaluation, and simulation within a single video-generative framework.
arXiv Detail & Related papers (2025-08-07T17:59:44Z)
- Assemble Your Crew: Automatic Multi-agent Communication Topology Design via Autoregressive Graph Generation [72.44384066166147]
Multi-agent systems (MAS) based on large language models (LLMs) have emerged as a powerful solution for dealing with complex problems across diverse domains. Existing approaches are fundamentally constrained by their reliance on a template graph modification paradigm with a predefined set of agents and hard-coded interaction structures. We propose ARG-Designer, a novel autoregressive model that operationalizes this paradigm by constructing the collaboration graph from scratch.
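As a loose illustration of the autoregressive construction described above, the sketch below builds a communication topology node by node and edge by edge; the stop, node, and edge decisions are random or heuristic stand-ins for the learned ARG-Designer model, and all names are hypothetical.

```python
# Illustrative sketch of autoregressive multi-agent topology generation:
# repeatedly decide whether to add an agent and which existing agents it talks to.
# The decision functions are stand-ins for a learned model, not ARG-Designer's API.
import random

AGENT_LIBRARY = ["planner", "coder", "critic", "retriever", "summarizer"]


def should_stop(task: str, agents: list) -> bool:
    """Stand-in for the model's learned stop decision."""
    return len(agents) >= 3


def pick_next_agent(task: str, agents: list) -> str:
    """Stand-in for the model's next-node distribution."""
    remaining = [a for a in AGENT_LIBRARY if a not in agents]
    return random.choice(remaining)


def pick_edges(new_agent: str, agents: list) -> list:
    """Stand-in for the model's edge (who-talks-to-whom) distribution."""
    return [(a, new_agent) for a in agents]  # connect to all existing agents


def generate_topology(task: str):
    agents, edges = [], []
    while not should_stop(task, agents):
        new_agent = pick_next_agent(task, agents)
        edges += pick_edges(new_agent, agents)
        agents.append(new_agent)
    return agents, edges


print(generate_topology("write and review a sorting function"))
```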
arXiv Detail & Related papers (2025-07-24T09:17:41Z)
- GenerationPrograms: Fine-grained Attribution with Executable Programs [72.23792263905372]
We introduce a modular generation framework, GenerationPrograms, inspired by recent advancements in "code agent" architectures. GenerationPrograms decomposes the process into two distinct stages: first, creating an executable program plan composed of modular text operations explicitly tailored to the query, and second, executing these operations following the program's specified instructions to produce the final response. Empirical evaluations demonstrate that GenerationPrograms significantly improves attribution quality at both the document level and sentence level.
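A minimal sketch of the two-stage idea, with hypothetical operation names: stage one emits a plan as plain data (so every output span can be traced back to the operations and source sentences that produced it), and stage two executes it.

```python
# Hypothetical two-stage sketch: (1) produce an executable plan of modular
# text operations, (2) run the plan. Operation names are illustrative only.
from typing import Callable


def extract(sources: dict, doc_id: str, sent_idx: int) -> str:
    """Copy a sentence verbatim from a source document (trivially attributable)."""
    return sources[doc_id][sent_idx]


def fuse(*sentences: str) -> str:
    """Combine sentences; a real system would call an LLM here."""
    return " ".join(sentences)


OPERATIONS: dict = {"extract": extract, "fuse": fuse}

# Stage 1 output: the plan is just data, so it can be inspected and attributed.
plan = [
    ("s1", "extract", {"doc_id": "doc_a", "sent_idx": 0}),
    ("s2", "extract", {"doc_id": "doc_b", "sent_idx": 2}),
    ("out", "fuse", {}),  # fuses everything produced so far
]


def execute(plan, sources):
    registry, trace = {}, {}
    for name, op, kwargs in plan:
        if op == "extract":
            registry[name] = OPERATIONS[op](sources, **kwargs)
            trace[name] = [(kwargs["doc_id"], kwargs["sent_idx"])]
        else:  # fuse all previously produced sentences
            inputs = list(registry)
            registry[name] = OPERATIONS[op](*registry.values())
            trace[name] = [src for k in inputs for src in trace[k]]
    return registry["out"], trace["out"]


sources = {"doc_a": ["Sentence A0.", "Sentence A1."],
           "doc_b": ["B0.", "B1.", "Sentence B2."]}
answer, attribution = execute(plan, sources)
print(answer)       # fused response
print(attribution)  # sentence-level provenance: [('doc_a', 0), ('doc_b', 2)]
```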
arXiv Detail & Related papers (2025-06-17T14:37:09Z)
- Aggregated Structural Representation with Large Language Models for Human-Centric Layout Generation [7.980497203230983]
We propose an Aggregation Structural Representation (ASR) module that integrates graph networks with large language models (LLMs) to preserve structural information while enhancing generative capability. A comprehensive evaluation on the RICO dataset demonstrates the strong performance of ASR, both quantitatively using mean Intersection over Union (mIoU) and qualitatively through a crowdsourced user study.
arXiv Detail & Related papers (2025-05-26T06:17:21Z)
- MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering [57.156093929365255]
A Gym-style framework for systematic reinforcement learning, evaluation, and improvement of autonomous large language model (LLM) agents. MLE-Dojo covers diverse, open-ended MLE tasks carefully curated to reflect realistic engineering scenarios. Its fully executable environment supports comprehensive agent training via both supervised fine-tuning and reinforcement learning.
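The summary mentions a Gym-style, fully executable environment. Below is a generic sketch of what such an interface for one MLE task could look like, following the usual reset/step contract; it is not MLE-Dojo's actual API, and the scoring inside step() is a placeholder for sandboxed code execution.

```python
# Illustrative Gym-style interface for a machine-learning-engineering task;
# this mirrors the standard reset/step contract but is NOT MLE-Dojo's real API.
from dataclasses import dataclass


@dataclass
class StepResult:
    observation: dict   # e.g. execution logs, validation metrics
    reward: float       # e.g. improvement over the previous best score
    terminated: bool
    info: dict


class MLETaskEnv:
    """A single MLE task exposed as an interactive environment."""

    def __init__(self, task_spec: dict):
        self.task_spec = task_spec
        self.best_score = 0.0

    def reset(self) -> dict:
        self.best_score = 0.0
        return {"task": self.task_spec, "logs": ""}

    def step(self, action: str) -> StepResult:
        """`action` is code or a command proposed by the LLM agent.
        Execution is faked here; a real environment would run it sandboxed."""
        score = min(1.0, self.best_score + 0.1)  # pretend the submission helped a bit
        reward = score - self.best_score
        self.best_score = score
        done = score >= self.task_spec.get("target", 0.9)
        return StepResult({"logs": f"ran:\n{action}", "score": score}, reward, done, {})


env = MLETaskEnv({"name": "tabular-classification", "target": 0.3})
obs = env.reset()
for _ in range(5):
    result = env.step("train_model()")  # an agent/LLM would generate this action
    if result.terminated:
        break
```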
arXiv Detail & Related papers (2025-05-12T17:35:43Z)
- Flow State: Humans Enabling AI Systems to Program Themselves [0.24578723416255752]
We introduce Pocketflow, a platform centered on Human-AI co-design. Pocketflow is a Python framework built upon a deliberately minimal yet synergistic set of core abstractions. It provides a robust, vendor-agnostic foundation with very little code that demonstrably reduces overhead.
arXiv Detail & Related papers (2025-04-03T05:25:46Z)
- InvFussion: Bridging Supervised and Zero-shot Diffusion for Inverse Problems [76.39776789410088]
This work introduces a framework that combines the strong performance of supervised approaches and the flexibility of zero-shot methods. A novel architectural design seamlessly integrates the degradation operator directly into the denoiser. Experimental results on the FFHQ and ImageNet datasets demonstrate state-of-the-art posterior-sampling performance.
arXiv Detail & Related papers (2025-04-02T12:40:57Z)
- RGL: A Graph-Centric, Modular Framework for Efficient Retrieval-Augmented Generation on Graphs [58.10503898336799]
We introduce the RAG-on-Graphs Library (RGL), a modular framework that seamlessly integrates the complete RAG pipeline. RGL addresses key challenges by supporting a variety of graph formats and integrating optimized implementations for essential components. Our evaluations demonstrate that RGL not only accelerates the prototyping process but also enhances the performance and applicability of graph-based RAG systems.
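For context on what a "complete RAG pipeline" over graphs involves, here is a generic index-retrieve-generate sketch, with lexical seeding and k-hop expansion standing in for optimized components; class and function names are illustrative, not RGL's API.

```python
# Minimal sketch of a graph-centric RAG pipeline (index -> retrieve -> generate).
# Names are illustrative; this is not RGL's real interface.
from collections import deque


class GraphIndex:
    def __init__(self, nodes: dict, edges: dict):
        self.nodes = nodes   # node_id -> text attribute
        self.edges = edges   # node_id -> list of neighbour ids

    def seed_nodes(self, query: str, k: int = 2):
        """Cheap lexical seeding; a real system would use dense embeddings."""
        scored = sorted(self.nodes,
                        key=lambda n: -sum(w in self.nodes[n].lower()
                                           for w in query.lower().split()))
        return scored[:k]

    def expand(self, seeds, hops: int = 1):
        """Collect a k-hop subgraph around the seed nodes as retrieval context."""
        seen, frontier = set(seeds), deque([(s, 0) for s in seeds])
        while frontier:
            node, d = frontier.popleft()
            if d == hops:
                continue
            for nb in self.edges.get(node, []):
                if nb not in seen:
                    seen.add(nb)
                    frontier.append((nb, d + 1))
        return seen


def generate(query: str, context: list) -> str:
    """Stand-in for an LLM call conditioned on the retrieved subgraph."""
    return f"Q: {query}\nContext:\n" + "\n".join(f"- {c}" for c in context)


index = GraphIndex(
    nodes={"a": "ComfyUI workflow nodes", "b": "image diffusion model", "c": "video model"},
    edges={"a": ["b"], "b": ["a", "c"], "c": ["b"]},
)
seeds = index.seed_nodes("diffusion image generation")
subgraph = index.expand(seeds, hops=1)
print(generate("diffusion image generation", [index.nodes[n] for n in subgraph]))
```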
arXiv Detail & Related papers (2025-03-25T03:21:48Z)
- Multi-Objective Bayesian Optimization for Networked Black-Box Systems: A Path to Greener Profits and Smarter Designs [0.0]
MOBONS is a novel Bayesian optimization-inspired algorithm that can efficiently optimize general function networks. We demonstrate the effectiveness of MOBONS through two case studies, including one related to sustainable process design.
arXiv Detail & Related papers (2025-02-19T21:49:05Z)
- ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems [80.69865295743149]
This work studies the use of LLM-based agents to design collaborative AI systems autonomously. Based on ComfyBench, we develop ComfyAgent, a framework that empowers agents to autonomously design collaborative AI systems by generating workflows. While ComfyAgent achieves a resolve rate comparable to o1-preview and significantly surpasses other agents on ComfyBench, it has resolved only 15% of creative tasks.
arXiv Detail & Related papers (2024-09-02T17:44:10Z)
- Effective Reinforcement Learning Based on Structural Information Principles [19.82391136775341]
We propose a novel and general Structural Information principles-based framework for effective Decision-Making, namely SIDM.
SIDM can be flexibly incorporated into various single-agent and multi-agent RL algorithms, enhancing their performance.
arXiv Detail & Related papers (2024-04-15T13:02:00Z)
- DepGraph: Towards Any Structural Pruning [68.40343338847664]
We study general structural pruning of arbitrary architecture like CNNs, RNNs, GNNs and Transformers.
We propose a general and fully automatic method, Dependency Graph (DepGraph), to explicitly model the dependency between layers and comprehensively group parameters for pruning.
In this work, we extensively evaluate our method on several architectures and tasks, including ResNe(X)t, DenseNet, MobileNet and Vision transformer for images, GAT for graph, DGCNN for 3D point cloud, alongside LSTM for language, and demonstrate that, even with a simple norm-based criterion, the proposed method consistently yields gratifying performances.
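The core idea named above, modeling which layers are coupled so their parameters are pruned as a group, can be illustrated with a toy coupling graph whose connected components are the pruning groups. This is a simplified stand-in, not the DepGraph / torch-pruning implementation.

```python
# Toy illustration of dependency-aware grouping for structural pruning:
# layers whose channel dimensions are coupled must be pruned together.
# Simplified stand-in, not the actual DepGraph implementation.
from collections import defaultdict, deque

# Edges say "pruning output channels of X forces pruning channels of Y".
# Residual additions couple layers symmetrically, so edges are treated as undirected.
coupling = {
    "conv1": ["bn1", "conv2"],
    "bn1": ["conv1"],
    "conv2": ["conv1", "add1"],
    "shortcut": ["add1"],
    "add1": ["conv2", "shortcut"],
    "fc": [],  # independent of the residual block above
}


def dependency_groups(coupling):
    """Connected components of the coupling graph = groups pruned jointly."""
    adjacency = defaultdict(set)
    for layer, deps in coupling.items():
        adjacency[layer]  # ensure isolated layers get an entry
        for d in deps:
            adjacency[layer].add(d)
            adjacency[d].add(layer)
    seen, groups = set(), []
    for start in adjacency:
        if start in seen:
            continue
        queue, group = deque([start]), []
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.append(node)
            for nb in adjacency[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        groups.append(sorted(group))
    return groups


print(dependency_groups(coupling))
# [['add1', 'bn1', 'conv1', 'conv2', 'shortcut'], ['fc']]
```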
arXiv Detail & Related papers (2023-01-30T14:02:33Z)
- Bayesian Prompt Learning for Image-Language Model Generalization [64.50204877434878]
We use the regularization ability of Bayesian methods to frame prompt learning as a variational inference problem.
Our approach regularizes the prompt space, reduces overfitting to the seen prompts and improves the prompt generalization on unseen prompts.
We demonstrate empirically on 15 benchmarks that Bayesian prompt learning provides an appropriate coverage of the prompt space.
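A minimal sketch of the variational view described above, assuming the common setup where learnable prompt (context) vectors are given a Gaussian posterior, sampled with the reparameterization trick, and regularized by a KL term to a standard-normal prior; dimensions and the loss weight are illustrative, and the task loss is a placeholder for the frozen image-language model.

```python
# Minimal variational-prompt sketch in PyTorch: context vectors are sampled from
# a learned Gaussian posterior; a KL term regularizes the prompt space.
# Shapes and weights are illustrative, not the paper's exact model.
import torch
import torch.nn as nn


class VariationalPrompt(nn.Module):
    def __init__(self, n_ctx: int = 4, dim: int = 512):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(n_ctx, dim))
        self.log_var = nn.Parameter(torch.zeros(n_ctx, dim))

    def forward(self):
        std = torch.exp(0.5 * self.log_var)
        eps = torch.randn_like(std)
        prompt = self.mu + eps * std  # reparameterization trick
        # KL( N(mu, sigma^2) || N(0, 1) )
        kl = 0.5 * (self.mu.pow(2) + self.log_var.exp() - 1.0 - self.log_var).sum()
        return prompt, kl


prompt_module = VariationalPrompt()
prompt, kl = prompt_module()
# The task loss would come from the frozen image-language model given `prompt`;
# a placeholder keeps the example self-contained.
task_loss = prompt.pow(2).mean()
loss = task_loss + 1e-4 * kl
loss.backward()
```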
arXiv Detail & Related papers (2022-10-05T17:05:56Z)
- Neural Transition System for End-to-End Opinion Role Labeling [13.444895891262844]
Unified opinion role labeling (ORL) aims to detect all possible opinion structures of 'opinion-holder-target' in one shot, given a text.
We propose a novel solution by revisiting the transition architecture and augmenting it with a pointer network (PointNet). The framework parses out all opinion structures in linear-time complexity and, with the aid of PointNet, handles opinion terms of arbitrary length.
arXiv Detail & Related papers (2021-10-05T12:45:59Z)
- House-GAN++: Generative Adversarial Layout Refinement Networks [37.60108582423617]
Our architecture is an integration of a graph-constrained GAN and a conditional GAN, where a previously generated layout becomes the next input constraint.
A surprising discovery of our research is that a simple non-iterative training process, dubbed component-wise GT-conditioning, is effective in learning such a generator.
arXiv Detail & Related papers (2021-03-03T18:15:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.