Evolving Excellence: Automated Optimization of LLM-based Agents
- URL: http://arxiv.org/abs/2512.09108v1
- Date: Tue, 09 Dec 2025 20:48:45 GMT
- Title: Evolving Excellence: Automated Optimization of LLM-based Agents
- Authors: Paul Brookes, Vardan Voskanyan, Rafail Giavrimis, Matthew Truscott, Mina Ilieva, Chrystalla Pavlou, Alexandru Staicu, Manal Adham, Will Evers-Hood, Jingzhi Gong, Kejia Zhang, Matvey Fedoseev, Vishal Sharma, Roman Bauer, Zheng Wang, Hema Nair, Wei Jie, Tianhua Xu, Aurora Constantin, Leslie Kanthan, Michail Basios
- Abstract summary: We present ARTEMIS, a no-code evolutionary optimization platform that jointly optimizes agent configurations through semantically-aware genetic operators. We evaluate ARTEMIS on four representative agent systems, including the ALE Agent for competitive programming on AtCoder Heuristic Contest, achieving a 13.6% improvement in acceptance rate. We also evaluate the MathTales-Teacher Agent, powered by a smaller open-source model (Qwen2.5-7B), on GSM8K primary-level mathematics problems, achieving a 22% accuracy improvement.
- Score: 33.81822162934331
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Agentic AI systems built on large language models (LLMs) offer significant potential for automating complex workflows, from software development to customer support. However, LLM agents often underperform due to suboptimal configurations: poorly tuned prompts, tool descriptions, and parameters that typically require weeks of manual refinement. Existing optimization methods are either too complex for general use or treat components in isolation, missing critical interdependencies. We present ARTEMIS, a no-code evolutionary optimization platform that jointly optimizes agent configurations through semantically-aware genetic operators. Given only a benchmark script and natural language goals, ARTEMIS automatically discovers configurable components, extracts performance signals from execution logs, and evolves configurations without requiring architectural modifications. We evaluate ARTEMIS on four representative agent systems: the ALE Agent for competitive programming on AtCoder Heuristic Contest, achieving a 13.6% improvement in acceptance rate; the Mini-SWE Agent for code optimization on SWE-Perf, with a statistically significant 10.1% performance gain; and the CrewAI Agent for cost-aware mathematical reasoning on Math Odyssey, achieving a statistically significant 36.9% reduction in the number of tokens required for evaluation. We also evaluate the MathTales-Teacher Agent, powered by a smaller open-source model (Qwen2.5-7B), on GSM8K primary-level mathematics problems, achieving a 22% accuracy improvement and demonstrating that ARTEMIS can optimize agents built on both commercial and local models.
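The abstract sketches a concrete loop: discover configurable components, score each candidate configuration on the user's benchmark, mine execution logs for performance signals, and apply LLM-driven ("semantically-aware") genetic operators. The paper ships no code here, so the following is only a minimal sketch of that loop under assumed interfaces; every name (AgentConfig, run_benchmark, llm_rewrite, evolve) is hypothetical rather than part of ARTEMIS.

    # Minimal sketch (hypothetical names throughout; not the ARTEMIS API) of
    # the evolutionary loop the abstract describes: score configurations on
    # the user's benchmark, keep the best, and mutate survivors with an LLM
    # acting as a semantically-aware genetic operator.
    import random
    from dataclasses import dataclass, replace

    @dataclass(frozen=True)
    class AgentConfig:
        system_prompt: str
        temperature: float

    def run_benchmark(cfg: AgentConfig) -> tuple[float, str]:
        """Placeholder: run the user-supplied benchmark script and return
        (score, execution logs)."""
        raise NotImplementedError

    def llm_rewrite(prompt: str, logs: str) -> str:
        """Placeholder: ask an LLM to rewrite `prompt`, guided by performance
        signals extracted from `logs`."""
        raise NotImplementedError

    def evolve(seed: AgentConfig, pop_size: int = 8, generations: int = 10) -> AgentConfig:
        # Start from the seed config plus random temperature perturbations.
        population = [seed] + [
            replace(seed, temperature=random.uniform(0.0, 1.0))
            for _ in range(pop_size - 1)
        ]
        for _ in range(generations):
            # Evaluate every candidate; sort best-first by benchmark score.
            results = sorted(
                ((run_benchmark(cfg), cfg) for cfg in population),
                key=lambda r: r[0][0],
                reverse=True,
            )
            survivors = [cfg for _, cfg in results[: pop_size // 2]]
            _, top_logs = results[0][0]
            # Refill the population by LLM-mutating survivors' prompts,
            # conditioning the rewrite on the best run's logs.
            children = [
                replace(cfg, system_prompt=llm_rewrite(cfg.system_prompt, top_logs))
                for cfg in survivors
            ]
            population = survivors + children
        return population[0]

A full system would, per the abstract, also mutate tool descriptions and other discovered parameters and test gains for statistical significance; the skeleton above only fixes the shape of the search.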
Related papers
- ComAgent: Multi-LLM based Agentic AI Empowered Intelligent Wireless Networks [62.031889234230725]
6G networks rely on complex cross-layer optimization. Manually translating high-level intents into mathematical formulations remains a bottleneck. We present ComAgent, a multi-LLM agentic AI framework.
arXiv Detail & Related papers (2026-01-27T13:43:59Z)
- SimuAgent: An LLM-Based Simulink Modeling Assistant Enhanced with Reinforcement Learning [3.1436750864792375]
We introduce SimuAgent, an LLM-powered modeling and simulation agent tailored for Simulink. SimuAgent replaces XML with a concise, dictionary-style Python representation, dramatically cutting token counts (a rough illustration appears after this list). A lightweight plan-execute architecture, trained in two stages, equips the agent with both low-level tool skills and high-level design reasoning.
arXiv Detail & Related papers (2026-01-08T18:10:35Z)
- Jenius Agent: Towards Experience-Driven Accuracy Optimization in Real-World Scenarios [0.9069311779417014]
This paper introduces an agent framework grounded in real-world practical experience. An end-to-end framework named Jenius-Agent has been integrated with three key optimizations. Experiments show a 20 percent improvement in task accuracy, along with reduced token cost, response latency, and invocation failures.
arXiv Detail & Related papers (2026-01-05T07:35:12Z)
- Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem [90.17610617854247]
We introduce the Agentic Learning Ecosystem (ALE), a foundational infrastructure that optimizes the production pipeline for agentic models. ALE consists of three components: ROLL, a post-training framework for weight optimization; ROCK, a sandbox environment manager for trajectory generation; and iFlow CLI, an agent framework for efficient context engineering. We release ROME, an open-source agent grounded by ALE and trained on over one million trajectories.
arXiv Detail & Related papers (2025-12-31T14:03:39Z)
- Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization [37.17893162265247]
Youtu-Agent is a framework designed for the automated generation and continuous evolution of Large Language Model (LLM) agents. Youtu-Agent features a structured configuration system that decouples execution environments, toolkits, and context management. Experiments demonstrate that Youtu-Agent achieves state-of-the-art performance on WebWalkerQA (71.47%) and GAIA (72.8%) using open-weight models.
arXiv Detail & Related papers (2025-12-31T04:17:36Z)
- AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent [80.83250816918861]
Large Reasoning Models (LRMs) like o3 and DeepSeek-R1 have achieved remarkable progress in natural language reasoning with long chain-of-thought. However, they remain computationally inefficient and struggle with accuracy when solving problems requiring complex mathematical operations. We present AgentMath, an agent framework that seamlessly integrates language models' reasoning capabilities with code interpreters' computational precision (see the loop sketched after this list).
arXiv Detail & Related papers (2025-12-23T19:57:49Z)
- Towards Efficient Agents: A Co-Design of Inference Architecture and System [66.59916327634639]
This paper presents AgentInfer, a unified framework for end-to-end agent acceleration. We decompose the problem into four synergistic components: AgentCollab, AgentSched, AgentSAM, and AgentCompress. Experiments on the BrowseComp-zh and DeepDiver benchmarks demonstrate that, through the synergistic collaboration of these methods, AgentInfer reduces ineffective token consumption by over 50%.
arXiv Detail & Related papers (2025-12-20T12:06:13Z)
- Hybrid Agentic AI and Multi-Agent Systems in Smart Manufacturing [0.0]
This paper presents a hybrid agentic AI and multi-agent framework for a prescriptive maintenance use case. The proposed framework adopts a layered architecture that consists of perception, preprocessing, analytics, and optimization layers. Specialized agents autonomously handle schema discovery, intelligent feature analysis, model selection, and prescriptive optimization. An initial proof-of-concept implementation is validated on two industrial manufacturing datasets.
arXiv Detail & Related papers (2025-11-23T03:06:23Z)
- Agent²: An Agent-Generates-Agent Framework for Reinforcement Learning Automation [5.325886106098561]
Reinforcement learning (RL) agent development traditionally requires substantial expertise and iterative effort. This paper introduces Agent², an LLM-driven agent-generates-agent framework for fully automated RL agent design. Agent² translates natural language task descriptions and environment code into executable RL solutions without human intervention.
arXiv Detail & Related papers (2025-09-16T02:14:39Z)
- SI-Agent: An Agentic Framework for Feedback-Driven Generation and Tuning of Human-Readable System Instructions for Large Language Models [0.0]
System Instructions (SIs) are pivotal for guiding Large Language Models (LLMs). Existing automated methods frequently generate non-human-readable "soft prompts," sacrificing interpretability. This paper introduces SI-Agent, a novel agentic framework designed to automatically generate and iteratively refine human-readable SIs (a minimal refinement loop is sketched after this list).
arXiv Detail & Related papers (2025-07-03T23:44:50Z)
- CompileAgent: Automated Real-World Repo-Level Compilation with Tool-Integrated LLM-based Agent System [52.048087777953064]
We propose CompileAgent, an agent framework dedicated to repo-level compilation. CompileAgent integrates five tools and a flow-based agent strategy, enabling interaction with software artifacts for compilation-instruction search and error resolution. We show that our method significantly improves the compilation success rate, with gains ranging from 10% to 71%.
arXiv Detail & Related papers (2025-05-07T08:59:14Z)
- OptimAI: Optimization from Natural Language Using LLM-Powered AI Agents [8.441638148384389]
We introduce OptimAI, a framework for solving optimization problems described in natural language. Our framework is built upon the following key roles: formulator, planner, coder, and code critic. Our approach attains 88.1% accuracy on the NLP4LP dataset and 82.3% on the Optibench dataset, reducing error rates by 58% and 52%, respectively, over prior best results.
arXiv Detail & Related papers (2025-04-23T17:45:05Z)
- Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute (TTC) scaling framework that leverages increased inference-time computation instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
- Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement [112.04307762405669]
G"odel Agent is a self-evolving framework inspired by the G"odel machine.<n>G"odel Agent can achieve continuous self-improvement, surpassing manually crafted agents in performance, efficiency, and generalizability.
arXiv Detail & Related papers (2024-10-06T10:49:40Z)
- A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration [55.35849138235116]
We propose automatically selecting a team of agents from candidates to collaborate in a dynamic communication structure toward different tasks and domains.
Specifically, we build a framework named Dynamic LLM-Powered Agent Network (DyLAN) for LLM-powered agent collaboration.
We demonstrate that DyLAN outperforms strong baselines in code generation, decision-making, general reasoning, and arithmetic reasoning tasks with moderate computational cost.
arXiv Detail & Related papers (2023-10-03T16:05:48Z)
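On the SimuAgent entry above: its token-count claim rests on swapping Simulink's XML serialization for a dictionary-style Python form. The comparison below is an invented illustration of that trade-off, not the paper's actual schema.

    # Invented example of the representation swap the SimuAgent summary
    # describes (the real schema is the paper's, not shown here).
    xml_form = """
    <Block BlockType="Gain" Name="Gain1">
      <P Name="Gain">2.5</P>
      <P Name="Position">[100, 50, 130, 80]</P>
    </Block>
    """

    # Dictionary-style equivalent: fewer tokens for an LLM to read and emit.
    dict_form = {"type": "Gain", "name": "Gain1", "gain": 2.5}

    # Crude token proxy: count whitespace-delimited chunks in each form.
    print(len(xml_form.split()), "vs", len(repr(dict_form).split()))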
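On the AgentMath entry: pairing LLM reasoning with a code interpreter is, in generic form, a loop in which the model may emit code for exact computation. The sketch below assumes a hypothetical ask_llm chat call and an invented "CODE:" convention for tool requests; it is not the paper's protocol, and a real system must sandbox the execution step.

    # Generic reason-plus-interpreter loop in the spirit of the AgentMath
    # summary. `ask_llm` is a hypothetical stand-in for any chat API.
    def ask_llm(messages: list[dict]) -> str:
        """Placeholder: return the model's next reply."""
        raise NotImplementedError

    def run_code(snippet: str) -> str:
        """Run model-emitted Python; the `result` variable holds the answer.
        A real system must isolate this instead of exec-ing in-process."""
        scope: dict = {}
        exec(snippet, {"__builtins__": {}}, scope)
        return str(scope.get("result"))

    def solve(problem: str, max_turns: int = 5) -> str:
        messages = [{"role": "user", "content": problem}]
        for _ in range(max_turns):
            reply = ask_llm(messages)
            messages.append({"role": "assistant", "content": reply})
            if reply.startswith("CODE:"):      # exact computation requested
                messages.append({"role": "tool",
                                 "content": run_code(reply[len("CODE:"):])})
            else:                              # plain-text final answer
                return reply
        return "no answer within the turn budget"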
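On the SI-Agent entry: iteratively refining a human-readable system instruction against task feedback reduces, at its simplest, to a generate-evaluate-rewrite loop. The helpers below are hypothetical, standing in for the LLM calls and task harness the paper describes.

    # Minimal generate-evaluate-rewrite loop for human-readable system
    # instructions, after the SI-Agent summary. All helpers are hypothetical.
    def generate_instruction(goal: str, critique: str | None = None) -> str:
        """Placeholder: LLM drafts (or, given a critique, rewrites) a
        human-readable system instruction for `goal`."""
        raise NotImplementedError

    def evaluate(instruction: str, tasks: list[str]) -> tuple[float, str]:
        """Placeholder: run the target LLM under `instruction` on held-out
        tasks; return (score, textual critique of the failures)."""
        raise NotImplementedError

    def refine(goal: str, tasks: list[str], rounds: int = 5) -> str:
        instruction = generate_instruction(goal)
        best, best_score = instruction, float("-inf")
        for _ in range(rounds):
            score, critique = evaluate(instruction, tasks)
            if score > best_score:
                best, best_score = instruction, score
            instruction = generate_instruction(goal, critique)
        return best

Because the artifact being optimized stays plain text throughout, the interpretability property the summary emphasizes is preserved by construction.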
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.