Confucius Code Agent: Scalable Agent Scaffolding for Real-World Codebases
- URL: http://arxiv.org/abs/2512.10398v4
- Date: Wed, 17 Dec 2025 04:22:36 GMT
- Title: Confucius Code Agent: Scalable Agent Scaffolding for Real-World Codebases
- Authors: Zhaodong Wang, Zhenting Qi, Sherman Wong, Nathan Hu, Samuel Lin, Jun Ge, Erwin Gao, Wenlin Chen, Yilun Du, Minlan Yu, Ying Zhang,
- Abstract summary: We introduce the Confucius Code Agent (CCA), a scalable software engineering agent that can operate at large-scales.<n>CCA is built on top of the Confucius SDK, an agent development platform structured around three complementary perspectives.<n>In addition, we introduce a meta-agent that automates the synthesis, evaluation, and refinement of agent configurations.
- Score: 44.366968508477235
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-world software engineering tasks require coding agents that can operate over massive repositories, sustain long-horizon sessions, and reliably coordinate complex toolchains at test time. Existing research-grade agents offer transparency but struggle when scaled to real-world workloads, while proprietary systems achieve strong practical performance but provide limited extensibility, interpretability, and controllability. We introduce the Confucius Code Agent (CCA), a scalable software engineering agent that can operate at large-scale codebases. CCA is built on top of the Confucius SDK, an agent development platform structured around three complementary perspectives: Agent Experience (AX), User Experience (UX), and Developer Experience (DX). The SDK integrates a unified orchestrator with hierarchical working memory for long-context reasoning, a persistent note-taking system for cross-session continual learning, and a modular extension system for reliable tool use. In addition, we introduce a meta-agent that automates the synthesis, evaluation, and refinement of agent configurations through a build-test-improve loop, enabling rapid adaptation to new tasks, environments, and tool stacks. Instantiated with these mechanisms, CCA demonstrates strong performance on real-world software engineering tasks. On SWE-Bench-Pro, CCA reaches a Resolve@1 of 54.3%, exceeding prior research baselines and comparing favorably to commercial results, under identical repositories, model backend, and tool access. Together, the Confucius SDK and CCA form a general, extensible, and production-grade foundation for building effective and robust coding agents, bridging the gap between research prototypes and practical large-scale deployment.
Related papers
- OpenSage: Self-programming Agent Generation Engine [56.399761469404496]
We propose OpenSage, the first agent development kit (ADK) to automatically create agents with self-generated topology and toolsets.<n>OpenSage offers effective functionality for agents to create and manage their own sub-agents and toolkits.<n>We believe OpenSage can pave the way for the next generation of agent development, shifting the focus from human-centered to AI-centered paradigms.
arXiv Detail & Related papers (2026-02-18T21:16:29Z) - A Lightweight Modular Framework for Constructing Autonomous Agents Driven by Large Language Models: Design, Implementation, and Applications in AgentForge [1.932555230783329]
Lightweight, open-source Python framework designed to democratize the construction of LLM-driven autonomous agents.<n>AgentForge introduces three key innovations: (1) a composable skill abstraction that enables fine-grained task decomposition with formally defined input-output contracts, (2) a unified backend interface supporting seamless switching between cloud-based APIs and local inference engines, and (3) a declarative YAML-based configuration system that separates agent logic from implementation details.
arXiv Detail & Related papers (2026-01-19T20:33:26Z) - ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development [72.4729759618632]
We introduce ABC-Bench, a benchmark to evaluate agentic backend coding within a realistic, executable workflow.<n>We curated 224 practical tasks spanning 8 languages and 19 frameworks from open-source repositories.<n>Our evaluation reveals that even state-of-the-art models struggle to deliver reliable performance on these holistic tasks.
arXiv Detail & Related papers (2026-01-16T08:23:52Z) - LoCoBench-Agent: An Interactive Benchmark for LLM Agents in Long-Context Software Engineering [90.84806758077536]
We introduce textbfLoCoBench-Agent, a comprehensive evaluation framework specifically designed to assess large language models (LLMs) agents in realistic, long-context software engineering.<n>Our framework extends LoCoBench's 8,000 scenarios into interactive agent environments, enabling systematic evaluation of multi-turn conversations.<n>Our framework provides agents with 8 specialized tools (file operations, search, code analysis) and evaluates them across context lengths ranging from 10K to 1M tokens.
arXiv Detail & Related papers (2025-11-17T23:57:24Z) - The OpenHands Software Agent SDK: A Composable and Extensible Foundation for Production Agents [46.254487394746725]
We present the OpenHands Software Agent SDK, a toolkit for implementing software development agents.<n>To achieve flexibility, we design a simple interface for implementing agents that requires only a few lines of code in the default case.<n>For security and reliability, it delivers seamless local-to-remote execution portability, integrated REST/WebSocket services.
arXiv Detail & Related papers (2025-11-05T18:16:44Z) - Open Agent Specification (Agent Spec): A Unified Representation for AI Agents [10.685555728094338]
We introduce Open Agent Specification (Agent Spec), a declarative language that defines AI agents and agentic.<n>Agent Spec defines a common set of components, control and data flow semantics, and schemas that allow an agent to be defined once and executed across different runtimes.
arXiv Detail & Related papers (2025-10-05T12:26:42Z) - AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications [95.42093979627703]
AgentScope supports flexible and efficient tool-based agent-environment interactions.<n>We ground agent behaviors in the ReAct paradigm and offer advanced agent-level infrastructure.<n>AgentScope also includes robust engineering support for developer-friendly experiences.
arXiv Detail & Related papers (2025-08-22T10:35:56Z) - Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training [67.895981259683]
General AI Agents are increasingly recognized as foundational frameworks for the next generation of artificial intelligence.<n>Current agent systems are either closed-source or heavily reliant on a variety of paid APIs and proprietary tools.<n>We present Cognitive Kernel-Pro, a fully open-source and (to the maximum extent) free multi-module agent framework.
arXiv Detail & Related papers (2025-08-01T08:11:31Z) - AgentMesh: A Cooperative Multi-Agent Generative AI Framework for Software Development Automation [0.0]
We propose a Python-based framework that uses multiple cooperating LLM-powered agents to automate software development tasks.<n>In AgentMesh, specialized agents - a Planner, Coder, Debugger, and Reviewer - work in concert to transform a high-level requirement into fully realized code.
arXiv Detail & Related papers (2025-07-26T10:10:02Z) - AgentFly: Extensible and Scalable Reinforcement Learning for LM Agents [25.735754822676277]
Language model (LM) agents have gained significant attention for their ability to autonomously complete tasks.<n> reinforcement learning (RL) has been explored to enhance LM's capabilities, such as reasoning and factuality.<n>We built AgentFly, a scalable and Agent-RL framework designed to empower LM agents with a variety of RL algorithms.
arXiv Detail & Related papers (2025-07-20T10:22:36Z) - State and Memory is All You Need for Robust and Reliable AI Agents [29.259008600842517]
Large language models (LLMs) have enabled powerful advances in natural language understanding and generation.<n>Yet their application to complex, real-world scientific remain limited by challenges in memory, planning, and tool integration.<n>Here, we introduce SciBORG, a modular agentic framework that allows LLM-based agents to autonomously plan, reason, and achieve robust and reliable domain-specific task execution.
arXiv Detail & Related papers (2025-06-30T02:02:35Z) - Unified Software Engineering agent as AI Software Engineer [14.733475669942276]
Large Language Model (LLM) technology has raised expectations for automated coding.<n>In this paper, we seek to understand this question by developing a Unified Software Engineering agent or USEagent.<n>We envision USEagent as the first draft of a future AI Software Engineer which can be a team member in future software development teams involving both AI and humans.
arXiv Detail & Related papers (2025-06-17T16:19:13Z) - AgentScope: A Flexible yet Robust Multi-Agent Platform [66.64116117163755]
AgentScope is a developer-centric multi-agent platform with message exchange as its core communication mechanism.
The abundant syntactic tools, built-in agents and service functions, user-friendly interfaces for application demonstration and utility monitor, zero-code programming workstation, and automatic prompt tuning mechanism significantly lower the barriers to both development and deployment.
arXiv Detail & Related papers (2024-02-21T04:11:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.