A Systematic Study of LLM-Based Architectures for Automated Patching
- URL: http://arxiv.org/abs/2603.01257v1
- Date: Sun, 01 Mar 2026 20:26:22 GMT
- Title: A Systematic Study of LLM-Based Architectures for Automated Patching
- Authors: Qingxiao Xu, Ze Sheng, Zhicheng Chen, Jeff Huang
- Abstract summary: We present a controlled evaluation of four large language model (LLM)-based patching paradigms. We analyze patch correctness, failure modes, token usage, and execution time across real-world vulnerability tasks. Our results reveal clear architectural trade-offs: fixed workflows are efficient but brittle, single-agent systems balance flexibility and cost, and multi-agent designs improve generalization at the expense of substantially higher overhead.
- Score: 7.9821766277253845
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have shown promise for automated patching, but their effectiveness depends strongly on how they are integrated into patching systems. While prior work explores prompting strategies and individual agent designs, the field lacks a systematic comparison of patching architectures. In this paper, we present a controlled evaluation of four LLM-based patching paradigms -- fixed workflow, single-agent system, multi-agent system, and general-purpose code agents -- using a unified benchmark and evaluation framework. We analyze patch correctness, failure modes, token usage, and execution time across real-world vulnerability tasks. Our results reveal clear architectural trade-offs: fixed workflows are efficient but brittle, single-agent systems balance flexibility and cost, and multi-agent designs improve generalization at the expense of substantially higher overhead and increased risk of reasoning drift on complex tasks. Surprisingly, general-purpose code agents achieve the strongest overall patching performance, benefiting from general-purpose tool interfaces that support effective adaptation across vulnerability types. Overall, we show that architectural design and iteration depth, rather than model capability alone, dominate the reliability and cost of LLM-based automated patching.
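The abstract describes a unified harness that runs each patching paradigm over the same vulnerability tasks while recording correctness, token usage, and execution time. A minimal sketch of such a harness is below; all names (`PatchResult`, `evaluate`, `fixed_workflow`) are illustrative assumptions, not taken from the paper's artifact.

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class PatchResult:
    """What any paradigm must return: a candidate patch plus its token cost."""
    patch: str
    tokens_used: int

@dataclass
class Metrics:
    correct: int = 0
    total: int = 0
    tokens: int = 0
    seconds: float = 0.0

def evaluate(paradigm: Callable[[str], PatchResult],
             tasks: list[str],
             is_correct: Callable[[str, str], bool]) -> Metrics:
    """Run one paradigm over all tasks behind a shared interface, collecting
    the same metrics for every architecture: correctness, token usage, and
    wall-clock time (failure modes would be logged per incorrect patch)."""
    m = Metrics()
    for task in tasks:
        start = time.perf_counter()
        result = paradigm(task)
        m.seconds += time.perf_counter() - start
        m.tokens += result.tokens_used
        m.total += 1
        if is_correct(task, result.patch):
            m.correct += 1
    return m

# Toy stand-in for a "fixed workflow" paradigm: cheap, deterministic output.
def fixed_workflow(task: str) -> PatchResult:
    return PatchResult(patch=f"fix:{task}", tokens_used=50)

metrics = evaluate(fixed_workflow,
                   tasks=["CVE-1", "CVE-2"],
                   is_correct=lambda task, patch: patch == f"fix:{task}")
print(metrics.correct, metrics.total, metrics.tokens)
```

Because every paradigm sits behind the same `Callable` interface, the comparison isolates architectural differences (fixed workflow vs. single-agent vs. multi-agent vs. general-purpose code agent) from the evaluation logic itself.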
Related papers
- Architecture-Aware Multi-Design Generation for Repository-Level Feature Addition [53.50448142467294]
RAIM is a multi-design and architecture-aware framework for repository-level feature addition. It shifts away from linear patching by generating multiple diverse implementation designs. Experiments on the NoCode-bench Verified dataset demonstrate that RAIM establishes new state-of-the-art performance.
arXiv Detail & Related papers (2026-03-02T12:50:40Z) - NEMO: Execution-Aware Optimization Modeling via Autonomous Coding Agents [41.70615840873279]
We present NEMO, a system that translates Natural-language descriptions of decision problems into formal Executable Mathematical Optimization implementations. NEMO centers on remote interaction with autonomous coding agents (ACAs), treated as a first-class abstraction analogous to API-based interaction with LLMs. Because ACAs execute within sandboxed environments, code produced by NEMO is executable by construction, allowing automated validation and repair.
arXiv Detail & Related papers (2026-01-29T07:57:23Z) - Integrating Diverse Assignment Strategies into DETRs [61.61489761918158]
Label assignment is a critical component in object detectors, particularly within DETR-style frameworks. We propose LoRA-DETR, a flexible and lightweight framework that seamlessly integrates diverse assignment strategies into any DETR-style detector.
arXiv Detail & Related papers (2026-01-14T07:28:54Z) - Multi-Agent Tool-Integrated Policy Optimization [67.12841355267678]
Large language models (LLMs) increasingly rely on multi-turn tool-integrated planning for knowledge-intensive and complex reasoning tasks. Existing implementations typically rely on a single agent, but they suffer from limited context length and noisy tool responses. No existing methods support effective reinforcement learning post-training of tool-integrated multi-agent frameworks.
arXiv Detail & Related papers (2025-10-06T10:44:04Z) - JoyAgent-JDGenie: Technical Report on the GAIA [27.025464023889853]
Large Language Models are increasingly deployed as autonomous agents for complex real-world tasks. We propose a generalist agent architecture that integrates planning and execution agents with critic model voting, a hierarchical memory system spanning working, semantic, and procedural layers, and a refined tool suite for search, code execution, and multimodal parsing.
arXiv Detail & Related papers (2025-10-01T04:41:58Z) - MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision [76.42361936804313]
We introduce MAS-ZERO, the first self-evolved, inference-time framework for automatic MAS design. MAS-ZERO employs meta-level design to iteratively generate, evaluate, and refine MAS configurations tailored to each problem instance.
arXiv Detail & Related papers (2025-05-21T00:56:09Z) - AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security [74.22452069013289]
AegisLLM is a cooperative multi-agent defense against adversarial attacks and information leakage. We show that scaling agentic reasoning systems at test time substantially enhances robustness without compromising model utility. Comprehensive evaluations across key threat scenarios, including unlearning and jailbreaking, demonstrate the effectiveness of AegisLLM.
arXiv Detail & Related papers (2025-04-29T17:36:05Z) - IMPROVE: Iterative Model Pipeline Refinement and Optimization Leveraging LLM Experts [28.9807389592324]
Large language model (LLM) agents have emerged as a promising solution to automate the workflow of machine learning. We introduce Iterative Refinement, a novel strategy for LLM-driven ML pipeline design inspired by how human ML experts iteratively refine models. By systematically updating individual components based on real training feedback, Iterative Refinement improves overall model performance.
arXiv Detail & Related papers (2025-02-25T01:52:37Z) - Towards more Contextual Agents: An extractor-Generator Optimization Framework [0.0]
Large Language Model (LLM)-based agents have demonstrated remarkable success in solving complex tasks across a wide range of general-purpose applications. However, their performance often degrades in context-specific scenarios, such as specialized industries or research domains. To address this challenge, our work introduces a systematic approach to enhance the contextual adaptability of LLM-based agents.
arXiv Detail & Related papers (2025-02-18T15:07:06Z) - An Empirical Study on LLM-based Agents for Automated Bug Fixing [8.660251517380779]
Large language models (LLMs) and LLM-based agents have been applied to fix bugs automatically. We examine six repair systems on the SWE-bench Verified benchmark for automated bug fixing.
arXiv Detail & Related papers (2024-11-15T14:19:15Z) - Multi-Agent Reinforcement Learning for Microprocessor Design Space Exploration [71.95914457415624]
Microprocessor architects are increasingly resorting to domain-specific customization in the quest for high performance and energy efficiency.
We propose an alternative formulation that leverages Multi-Agent RL (MARL) to tackle this problem.
Our evaluation shows that the MARL formulation consistently outperforms single-agent RL baselines.
arXiv Detail & Related papers (2022-11-29T17:10:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.