Related papers: ArchAgent: Scalable Legacy Software Architecture Recovery with LLMs

ArchAgent: Scalable Legacy Software Architecture Recovery with LLMs

URL: http://arxiv.org/abs/2601.13007v1
Date: Mon, 19 Jan 2026 12:39:05 GMT
Title: ArchAgent: Scalable Legacy Software Architecture Recovery with LLMs
Authors: Rusheng Pan, Bingcheng Mao, Tianyi Ma, Zhenhua Ling,
Abstract summary: ArchAgent is a scalable agent-based framework that combines static analysis, adaptive code segmentation, and LLM-powered synthesis.<n>It reconstructs multiview, business-aligned architectures from cross-repositorys.<n>ArchAgent introduces scalable diagram generation with contextual pruning and integrates cross-repository data to identify business-critical modules.
Score: 44.137226823695066
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recovering accurate architecture from large-scale legacy software is hindered by architectural drift, missing relations, and the limited context of Large Language Models (LLMs). We present ArchAgent, a scalable agent-based framework that combines static analysis, adaptive code segmentation, and LLM-powered synthesis to reconstruct multiview, business-aligned architectures from cross-repository codebases. ArchAgent introduces scalable diagram generation with contextual pruning and integrates cross-repository data to identify business-critical modules. Evaluations of typical large-scale GitHub projects show significant improvements over existing benchmarks. An ablation study confirms that dependency context improves the accuracy of generated architectures of production-level repositories, and a real-world case study demonstrates effective recovery of critical business logics from legacy projects. The dataset is available at https://github.com/panrusheng/arch-eval-benchmark.

Related papers

Architecture-Aware Multi-Design Generation for Repository-Level Feature Addition [53.50448142467294]
RAIM is a multi-design and architecture-aware framework for repository-level feature addition.<n>It shifts away from linear patching by generating multiple diverse implementation designs.<n>Experiments on the NoCode-bench Verified dataset demonstrate that RAIM establishes a new state-of-the-art performance.
arXiv Detail & Related papers (2026-03-02T12:50:40Z)
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development [72.4729759618632]
We introduce ABC-Bench, a benchmark to evaluate agentic backend coding within a realistic, executable workflow.<n>We curated 224 practical tasks spanning 8 languages and 19 frameworks from open-source repositories.<n>Our evaluation reveals that even state-of-the-art models struggle to deliver reliable performance on these holistic tasks.
arXiv Detail & Related papers (2026-01-16T08:23:52Z)
Towards Realistic Project-Level Code Generation via Multi-Agent Collaboration and Semantic Architecture Modeling [7.753074942497876]
We introduce CodeProjectEval, a project-level code generation dataset built from 18 real-world repositories with 12.7 files and 2,388.6 lines of code per task.<n>We propose ProjectGen, a multi-agent framework that decomposes projects into architecture design, skeleton generation, and code filling stages.<n>Experiments show that ProjectGen achieves state-of-the-art performance, passing 52/124 test cases on the small-scale project-level code generation dataset DevBench.
arXiv Detail & Related papers (2025-11-05T12:12:35Z)
LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding [55.5535016040221]
LM-Searcher is a novel framework for cross-domain neural architecture optimization.<n>Central to our approach is NCode, a universal numerical string representation for neural architectures.<n>Our dataset, encompassing a wide range of architecture-performance pairs, encourages robust and transferable learning.
arXiv Detail & Related papers (2025-09-06T09:26:39Z)
SaraCoder: Orchestrating Semantic and Structural Cues for Resource-Optimized Repository-Level Code Completion [34.41683042851225]
We propose a resource-optimized retrieval augmentation method, SaraCoder.<n>It maximizes information diversity and representativeness in a limited context window.<n>Our work proves that systematically refining retrieval results across multiple dimensions provides a new paradigm for building more accurate and resource-optimized repository-level code completion systems.
arXiv Detail & Related papers (2025-08-13T11:56:05Z)
Large Language Models are Good Relational Learners [55.40941576497973]
We introduce Rel-LLM, a novel architecture that utilizes a graph neural network (GNN)- based encoder to generate structured relational prompts for large language models (LLMs)<n>Unlike traditional text-based serialization approaches, our method preserves the inherent relational structure of databases while enabling LLMs to process and reason over complex entity relationships.
arXiv Detail & Related papers (2025-06-06T04:07:55Z)
Class-Level Code Generation from Natural Language Using Iterative, Tool-Enhanced Reasoning over Repository [4.767858874370881]
We introduce RepoClassBench, a benchmark designed to rigorously evaluate LLMs in generating class-level code within real-world repositories. RepoClassBench includes "Natural Language to Class generation" tasks across Java, Python & C# from a selection of repositories. We introduce Retrieve-Repotools-Reflect (RRR), a novel approach that equips LLMs with static analysis tools to iteratively navigate & reason about repository-level context.
arXiv Detail & Related papers (2024-04-22T03:52:54Z)
Serving Deep Learning Model in Relational Databases [70.53282490832189]
Serving deep learning (DL) models on relational data has become a critical requirement across diverse commercial and scientific domains. We highlight three pivotal paradigms: The state-of-the-art DL-centric architecture offloads DL computations to dedicated DL frameworks. The potential UDF-centric architecture encapsulates one or more tensor computations into User Defined Functions (UDFs) within the relational database management system (RDBMS)
arXiv Detail & Related papers (2023-10-07T06:01:35Z)
Operation Embeddings for Neural Architecture Search [15.033712726016255]
We propose the replacement of fixed operator encoding with learnable representations in the optimization process. Our method produces top-performing architectures that share similar operation and graph patterns.
arXiv Detail & Related papers (2021-05-11T09:17:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.