Generating Software Architecture Description from Source Code using Reverse Engineering and Large Language Model
- URL: http://arxiv.org/abs/2511.05165v1
- Date: Fri, 07 Nov 2025 11:35:46 GMT
- Title: Generating Software Architecture Description from Source Code using Reverse Engineering and Large Language Model
- Authors: Ahmad Hatahet, Christoph Knieke, Andreas Rausch
- Abstract summary: Software Architecture Descriptions (SADs) are essential for managing the inherent complexity of modern software systems. SADs are often missing, outdated, or poorly aligned with the system's actual implementation. We propose a semi-automated generation of SADs from source code by integrating reverse engineering (RE) techniques with a Large Language Model (LLM).
- Score: 2.6126272668390373
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Software Architecture Descriptions (SADs) are essential for managing the inherent complexity of modern software systems. They enable high-level architectural reasoning, guide design decisions, and facilitate effective communication among diverse stakeholders. However, in practice, SADs are often missing, outdated, or poorly aligned with the system's actual implementation. Consequently, developers are compelled to derive architectural insights directly from source code, a time-intensive process that increases cognitive load, slows new developer onboarding, and contributes to the gradual degradation of clarity over the system's lifetime. To address these issues, we propose a semi-automated generation of SADs from source code by integrating reverse engineering (RE) techniques with a Large Language Model (LLM). Our approach recovers both static and behavioral architectural views by extracting a comprehensive component diagram, filtering architecturally significant elements (core components) via prompt engineering, and generating state machine diagrams that model component behavior based on the underlying code logic using few-shot prompting. The resulting view representations offer a scalable and maintainable alternative to traditional manual architectural documentation. This methodology, demonstrated using C++ examples, highlights the potent capability of LLMs to: 1) abstract the component diagram, thereby reducing the reliance on human expert involvement, and 2) accurately represent complex software behaviors, especially when enriched with domain-specific knowledge through few-shot prompting. These findings suggest a viable path toward significantly reducing manual effort while enhancing system understanding and long-term maintainability.
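The abstract describes the pipeline only at a high level. As an illustration under our own assumptions, the Python sketch below approximates two of its steps: recovering a component diagram from C++ sources and assembling a few-shot prompt for state machine generation. The include-based dependency heuristic, the PlantUML output format, the prompt layout, and every name in it (`extract_dependencies`, `to_plantuml`, `build_state_machine_prompt`, the `core` parameter) are hypothetical and are not the authors' tooling. The LLM calls are omitted; `core` simply stands in for the list of architecturally significant components that the paper obtains through prompt engineering.

```python
from __future__ import annotations

import re

# Hypothetical sketch, not the authors' pipeline: names and heuristics are assumptions.
# Only quoted includes count as dependencies; angle-bracket includes (<vector>, ...)
# are treated as external/system headers and ignored.
INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]*"([^"]+)"', re.MULTILINE)


def component_name(path: str) -> str:
    """Hypothetical naming rule: 'src/motor/Motor.cpp' -> 'Motor'."""
    base = path.rsplit("/", 1)[-1]
    return base.rsplit(".", 1)[0]


def extract_dependencies(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each component (file stem) to the set of components it #includes."""
    deps: dict[str, set[str]] = {}
    for path, text in sources.items():
        src = component_name(path)
        edges = deps.setdefault(src, set())  # keep components that include nothing
        for header in INCLUDE_RE.findall(text):
            dst = component_name(header)
            if dst != src:  # a .cpp including its own .h is not a dependency
                edges.add(dst)
    return deps


def to_plantuml(deps: dict[str, set[str]], core: list[str] | None = None) -> str:
    """Render a PlantUML component diagram, optionally restricted to `core`."""
    # Without a core list, show every component that appears anywhere in the graph.
    nodes = set(core) if core is not None else set(deps).union(*deps.values())
    lines = ["@startuml"]
    lines += [f"component [{name}]" for name in sorted(nodes)]
    for src in sorted(nodes):
        for dst in sorted(deps.get(src, set())):
            if dst in nodes:  # drop edges to components filtered out of the view
                lines.append(f"[{src}] --> [{dst}]")
    lines.append("@enduml")
    return "\n".join(lines)


def build_state_machine_prompt(examples: list[tuple[str, str]], target_code: str) -> str:
    """Assemble a few-shot prompt: (code, diagram) pairs, then the target code."""
    parts = ["Derive a PlantUML state machine diagram from the C++ code."]
    for code, diagram in examples:
        parts.append(f"### Code\n{code}\n### State machine\n{diagram}")
    # The model is expected to continue after the final, empty diagram slot.
    parts.append(f"### Code\n{target_code}\n### State machine\n")
    return "\n\n".join(parts)


if __name__ == "__main__":
    sources = {
        "src/Controller.cpp": '#include "Controller.h"\n#include "Motor.h"\n'
                              '#include "Logger.h"\n#include <vector>\n',
        "src/Motor.cpp": '#include "Motor.h"\n#include "Logger.h"\n',
        "src/Logger.cpp": '#include "Logger.h"\n',
    }
    deps = extract_dependencies(sources)
    # `core` stands in for the LLM's selection of architecturally significant components.
    print(to_plantuml(deps, core=["Controller", "Motor"]))
```

Restricting the rendered diagram to `core` mirrors the filtering step only in spirit. A real RE tool would recover richer relations (inheritance, calls, namespaces, build targets) than quoted `#include` directives, so this heuristic is a rough stand-in for the paper's component extraction.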
Related papers
- Architecture-Aware Multi-Design Generation for Repository-Level Feature Addition [53.50448142467294]
RAIM is a multi-design and architecture-aware framework for repository-level feature addition.
It shifts away from linear patching by generating multiple diverse implementation designs.
Experiments on the NoCode-bench Verified dataset demonstrate that RAIM achieves new state-of-the-art performance.
arXiv Detail & Related papers (2026-03-02T12:50:40Z)
- VSA: Visual-Structural Alignment for UI-to-Code [29.15071743239679]
We propose VSA, a multi-stage paradigm designed to synthesize organized assets through visual-text alignment.
Our framework yields a substantial improvement in code modularity and architectural consistency over state-of-the-art benchmarks.
arXiv Detail & Related papers (2025-12-23T03:55:45Z)
- Model management to support systems engineering workflows using ontology-based knowledge graphs [0.09134244356393663]
We propose a framework to manage modelling artefacts generated from workflow executions.
Basic workflow concepts, related formalisms, and artefacts are formally defined in an ontology specified in OML.
We also developed several tools to support systems engineering during the design of workflows, their enactment, and artefact storage.
Results show that our proposal not only helped the systems engineer with fundamental difficulties like storage and versioning but also reduced the time needed to access relevant information.
arXiv Detail & Related papers (2025-12-10T12:45:16Z)
- The Role of Advanced Computer Architectures in Accelerating Artificial Intelligence Workloads [0.0]
The remarkable progress in Artificial Intelligence (AI) is fundamentally linked to a concurrent revolution in computer architecture.
As AI models, particularly Deep Neural Networks (DNNs), have grown in complexity, their massive computational demands have pushed traditional architectures to their limits.
This paper provides a structured review of this co-evolution, analyzing the architectural landscape designed to accelerate modern AI workloads.
arXiv Detail & Related papers (2025-11-13T06:26:39Z)
- Building Specialized Software-Assistant ChatBot with Graph-Based Retrieval-Augmented Generation [0.815557531820863]
We introduce a Graph-based Retrieval-Augmented Generation framework that automatically converts enterprise web applications into state-action knowledge graphs.
The framework was co-developed with the AI enterprise RAKAM, in collaboration with Lemon Learning.
arXiv Detail & Related papers (2025-11-07T14:56:45Z)
- Executable Knowledge Graphs for Replicating AI Research [65.41207324831583]
Executable Knowledge Graphs (xKG) is a modular and pluggable knowledge base that automatically integrates technical insights, code snippets, and domain-specific knowledge extracted from scientific literature.
Code will be released at https://github.com/zjunlp/xKG.
arXiv Detail & Related papers (2025-10-20T17:53:23Z)
- Data Dependency-Aware Code Generation from Enhanced UML Sequence Diagrams [54.528185120850274]
We propose a novel step-by-step code generation framework named API2Dep.
First, we introduce an enhanced Unified Modeling Language (UML) API diagram tailored for service-oriented architectures.
Second, recognizing the critical role of data flow, we introduce a dedicated data dependency inference task.
arXiv Detail & Related papers (2025-08-05T12:28:23Z)
- Large-Scale Model Enabled Semantic Communication Based on Robust Knowledge Distillation [45.347078403677216]
Large-scale models (LSMs) can be an effective framework for semantic representation and understanding.
However, their direct deployment is often hindered by high computational complexity and resource requirements.
This paper proposes a novel knowledge-distillation-based semantic communication framework.
arXiv Detail & Related papers (2025-08-04T07:47:18Z)
- Factored Agents: Decoupling In-Context Learning and Memorization for Robust Tool Use [4.437184840125514]
We propose a novel factored agent architecture designed to overcome the limitations of traditional single-agent systems in agentic AI.
Our approach decomposes the agent into two specialized components: (1) a large language model that serves as a high-level planner and in-context learner, and (2) a smaller language model which acts as a memorizer of tool format and output.
Empirical evaluations demonstrate that our factored architecture significantly improves planning accuracy and error resilience, while elucidating the inherent trade-off between in-context learning and static memorization.
arXiv Detail & Related papers (2025-03-29T01:27:11Z)
- Establishing tool support for a concept DSL [0.0]
This thesis describes Conceptual, a DSL for modeling the behavior of software systems using self-contained and highly reusable units of concepts.
The suggested strategy is then implemented with a simple compiler, allowing developers to access and utilize Alloy's existing analysis tools for program reasoning.
arXiv Detail & Related papers (2025-03-07T09:18:31Z)
- Specifications: The missing link to making the development of LLM systems an engineering discipline [65.10077876035417]
We discuss the progress the field has made so far through advances like structured outputs, process supervision, and test-time compute.
We outline several future directions for research to enable the development of modular and reliable LLM-based systems.
arXiv Detail & Related papers (2024-11-25T07:48:31Z)
- Towards Living Software Architecture Diagrams [0.0]
We propose a tool that generates architectural diagrams for a software system by analyzing its software artifacts and unifying them into a comprehensive system representation.
This representation can be manually modified while ensuring that changes are reintegrated into the diagram when it is regenerated.
arXiv Detail & Related papers (2024-07-25T12:31:52Z)
- Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI.
As the number of model parameters reaches hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios.
This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z)
- SOLO: A Single Transformer for Scalable Vision-Language Modeling [74.05173379908703]
We present SOLO, a single transformer for visiOn-Language mOdeling.
A unified single Transformer architecture, like SOLO, effectively addresses these scalability concerns in LVLMs.
In this paper, we introduce the first open-source training recipe for developing SOLO, an open-source 7B LVLM.
arXiv Detail & Related papers (2024-07-08T22:40:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.