Related papers: PerfGuard: A Performance-Aware Agent for Visual Content Generation

URL: http://arxiv.org/abs/2601.22571v1
Date: Fri, 30 Jan 2026 05:12:19 GMT
Title: PerfGuard: A Performance-Aware Agent for Visual Content Generation
Authors: Zhipeng Chen, Zhongrui Zhang, Chao Zhang, Yifan Xu, Lan Yang, Jun Liu, Ke Li, Yi-Zhe Song
Abstract summary: PerfGuard is a performance-aware agent framework for visual content generation. It integrates tool performance boundaries into task planning and scheduling. It has advantages in tool selection accuracy, execution reliability, and alignment with user intent.
Score: 53.591105729011595
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The advancement of Large Language Model (LLM)-powered agents has enabled automated task processing through reasoning and tool invocation capabilities. However, existing frameworks often operate under the idealized assumption that tool executions are invariably successful, relying solely on textual descriptions that fail to distinguish precise performance boundaries and cannot adapt to iterative tool updates. This gap introduces uncertainty in planning and execution, particularly in domains like visual content generation (AIGC), where nuanced tool performance significantly impacts outcomes. To address this, we propose PerfGuard, a performance-aware agent framework for visual content generation that systematically models tool performance boundaries and integrates them into task planning and scheduling. Our framework introduces three core mechanisms: (1) Performance-Aware Selection Modeling (PASM), which replaces generic tool descriptions with a multi-dimensional scoring system based on fine-grained performance evaluations; (2) Adaptive Preference Update (APU), which dynamically optimizes tool selection by comparing theoretical rankings with actual execution rankings; and (3) Capability-Aligned Planning Optimization (CAPO), which guides the planner to generate subtasks aligned with performance-aware strategies. Experimental comparisons against state-of-the-art methods demonstrate PerfGuard's advantages in tool selection accuracy, execution reliability, and alignment with user intent, validating its robustness and practical utility for complex AIGC tasks. The project code is available at https://github.com/FelixChan9527/PerfGuard.
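The abstract describes the selection and update mechanisms only at a high level, so the following is a minimal sketch of the general idea rather than PerfGuard's actual implementation. It assumes each tool carries per-dimension scores in [0, 1], ranks tools by a weighted sum for the dimensions a subtask needs, and nudges scores when the observed execution ranking disagrees with the predicted one. The names `Tool`, `theoretical_ranking`, `update_scores`, the dimension labels, and the learning rate are all hypothetical and do not come from the paper or its repository.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Tool:
    """Hypothetical tool record: a name plus per-dimension scores in [0, 1]."""
    name: str
    scores: Dict[str, float]


def theoretical_ranking(tools: List[Tool], weights: Dict[str, float]) -> List[Tool]:
    """Rank tools by the weighted sum of their scores, best first."""
    def weighted(tool: Tool) -> float:
        return sum(weights.get(dim, 0.0) * s for dim, s in tool.scores.items())
    return sorted(tools, key=weighted, reverse=True)


def update_scores(tools: List[Tool], weights: Dict[str, float],
                  observed_order: List[str], lr: float = 0.1) -> None:
    """Shift scores toward the observed ranking (an assumed update rule).

    A tool that ranked higher in practice than predicted has its scores
    raised on the dimensions the subtask used; one that ranked lower has
    them reduced. Scores are clamped to [0, 1].
    """
    predicted = [t.name for t in theoretical_ranking(tools, weights)]
    for tool in tools:
        # Positive shift: the tool did better in execution than predicted.
        shift = predicted.index(tool.name) - observed_order.index(tool.name)
        for dim in tool.scores:
            if weights.get(dim, 0.0) > 0:
                new_score = tool.scores[dim] + lr * shift
                tool.scores[dim] = min(1.0, max(0.0, new_score))


# Illustrative data only: two made-up tools and one made-up subtask profile.
tools = [
    Tool("generator_a", {"text_rendering": 0.8, "photorealism": 0.6}),
    Tool("generator_b", {"text_rendering": 0.7, "photorealism": 0.9}),
]
subtask_weights = {"text_rendering": 1.0}

before = [t.name for t in theoretical_ranking(tools, subtask_weights)]
# Suppose execution feedback shows generator_b actually did better.
update_scores(tools, subtask_weights, observed_order=["generator_b", "generator_a"])
after = [t.name for t in theoretical_ranking(tools, subtask_weights)]
print(before, after)
```

Only the dimensions the subtask weighted are adjusted, so feedback on a text-rendering subtask leaves the tools' photorealism scores unchanged in this sketch.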

Related papers

ToolSelf: Unifying Task Execution and Self-Reconfiguration via Tool-Driven Intrinsic Adaptation [60.25542764389203]
Agentic systems powered by Large Language Models (LLMs) have demonstrated remarkable potential in tackling complex, long-horizon tasks. Existing approaches, relying on manual orchestration or runtime-based patches, often struggle with poor generalization and fragmented optimization. We propose ToolSelf, a novel paradigm enabling tool-driven self-readjustment.
arXiv Detail & Related papers (2026-02-08T09:27:18Z)
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning [66.24374176797075]
We introduce AdaReasoner, a family of multimodal models that learn tool use as a general reasoning skill rather than as tool-specific or explicitly supervised behavior. AdaReasoner is enabled by (i) a scalable data curation pipeline exposing models to long-horizon, multi-step tool interactions; (ii) Tool-GRPO, a reinforcement learning algorithm that prioritizes tool selection and sequencing based on end-task success; and (iii) an adaptive learning mechanism that dynamically regulates tool usage.
arXiv Detail & Related papers (2026-01-26T16:04:43Z)
Jenius Agent: Towards Experience-Driven Accuracy Optimization in Real-World Scenarios [0.9069311779417014]
This paper introduces an agent framework grounded in real-world practical experience. An end-to-end framework named Jenius-Agent has been integrated with three key optimizations. Experiments show a 20 percent improvement in task accuracy, along with a reduced token cost, response latency, and invocation failures.
arXiv Detail & Related papers (2026-01-05T07:35:12Z)
ML-Tool-Bench: Tool-Augmented Planning for ML Tasks [23.54937738755734]
We introduce a benchmark for evaluating tool-augmented machine learning agents. Our benchmark goes beyond traditional tool-use evaluation by incorporating in-memory named object management. Our approach improves over ReAct by 16.52 percentile positions, taking the median across all Kaggle challenges.
arXiv Detail & Related papers (2025-11-29T23:59:40Z)
AutoTool: Efficient Tool Selection for Large Language Model Agents [10.061664247482488]
Large Language Model (LLM) agents have emerged as powerful tools for automating complex tasks by leveraging the reasoning and decision-making abilities of LLMs. However, a major bottleneck lies in the high inference cost of tool selection, especially in approaches like ReAct that repeatedly invoke the LLM to determine which tool to use at each step. We propose AutoTool, a novel graph-based framework that bypasses repeated LLM inference by exploiting a key empirical observation: tool usage inertia.
arXiv Detail & Related papers (2025-11-18T16:41:48Z)
GAP: Graph-Based Agent Planning with Parallel Tool Use and Reinforcement Learning [20.75113227786218]
Graph-based Agent Planning (GAP) is a novel framework that explicitly models inter-task dependencies through graph-based planning. Our approach trains agent foundation models to decompose complex tasks into dependency-aware sub-task graphs. This dependency-aware orchestration achieves substantial improvements in both execution efficiency and task accuracy.
arXiv Detail & Related papers (2025-10-29T09:35:55Z)
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments [70.42705564227548]
We propose an automated environment construction pipeline for large language models (LLMs). This enables the creation of high-quality training environments that provide detailed and measurable feedback without relying on external tools. We also introduce a verifiable reward mechanism that evaluates both the precision of tool use and the completeness of task execution.
arXiv Detail & Related papers (2025-08-12T09:45:19Z)
Acting Less is Reasoning More! Teaching Model to Act Efficiently [87.28134636548705]
Tool-integrated reasoning augments large language models with the ability to invoke external tools to solve tasks. Current approaches typically optimize only for final correctness without considering the efficiency or necessity of external tool use. We propose a framework that encourages models to produce accurate answers with minimal tool calls. Our approach reduces tool calls by up to 68.3% and improves tool productivity by up to 215.4%, while maintaining comparable answer accuracy.
arXiv Detail & Related papers (2025-04-21T05:40:05Z)
ToolACE-R: Model-aware Iterative Training and Adaptive Refinement for Tool Learning [84.69651852838794]
Tool learning allows Large Language Models (LLMs) to leverage external tools for solving complex user tasks. We propose ToolACE-R, a novel framework that includes both model-aware iterative training and adaptive refinement for tool learning. We conduct extensive experiments across several benchmark datasets, showing that ToolACE-R achieves competitive performance compared to advanced API-based models.
arXiv Detail & Related papers (2025-04-02T06:38:56Z)
ControlLLM: Augment Language Models with Tools by Searching on Graphs [97.62758830255002]
We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving real-world tasks. Our framework comprises three key components: (1) a task decomposer that breaks down a complex task into clear subtasks with well-defined inputs and outputs; (2) a Thoughts-on-Graph (ToG) paradigm that searches the optimal solution path on a pre-built tool graph; and (3) an execution engine with a rich toolbox that interprets the solution path and runs the
arXiv Detail & Related papers (2023-10-26T21:57:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.