Incremental Analysis of Legacy Applications Using Knowledge Graphs for Application Modernization
- URL: http://arxiv.org/abs/2505.06885v1
- Date: Sun, 11 May 2025 07:33:31 GMT
- Title: Incremental Analysis of Legacy Applications Using Knowledge Graphs for Application Modernization
- Authors: Saravanan Krishnan, Amith Singhee, Keerthi Narayan Raghunath, Alex Mathai, Atul Kumar, David Wenk,
- Abstract summary: Industries such as banking, telecom, and airlines often have large software systems that are several decades old. Many of these systems are written in old programming languages such as COBOL, PL/1, Assembler, etc.
- Score: 2.479446117912957
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Industries such as banking, telecom, and airlines often have large software systems that are several decades old. Many of these systems are written in old programming languages such as COBOL, PL/1, Assembler, etc. In many cases, the documentation is not updated, and those who developed/designed these systems are no longer around. Understanding these systems for either modernization or even regular maintenance has been a challenge. An extensive application may have natural boundaries based on its code dependencies and architecture. There are also other logical boundaries in an enterprise setting driven by business functions, data domains, etc. Due to these complications, the system architects generally plan their modernization across these logical boundaries in parts, thereby adopting an incremental approach for the modernization journey of the entire system. In this work, we present a software system analysis tool that allows a subject matter expert (SME) or system architect to analyze a large software system incrementally. We analyze the source code and other artifacts (such as data schema) to create a knowledge graph using a customizable ontology/schema. Entities and relations in our ontology can be defined for any combination of programming languages and platforms. Using this knowledge graph, the analyst can then define logical boundaries around dependent Entities (e.g., Programs, Transactions, Database Tables, etc.). Our tool then presents different views showcasing the dependencies from the newly defined boundary to/from the other logical groups of the system. This exercise is repeated interactively to 1) identify the Entities and groupings of interest for a modernization task and 2) understand how a change in one part of the system may affect the other parts. To validate the efficacy of our tool, we provide an initial study of our system on two client applications.
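The boundary analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' tool or ontology: the entity names (`PROG_A`, `TABLE_ACCOUNTS`, and so on), the relation names, and both helper functions are hypothetical, and every edge is treated as a dependency of the source on the target, which is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical knowledge-graph triples: (source entity, relation, target entity).
# Names are illustrative only and are not taken from the paper's ontology.
TRIPLES = [
    ("PROG_A", "calls", "PROG_B"),
    ("PROG_B", "reads", "TABLE_ACCOUNTS"),
    ("PROG_C", "writes", "TABLE_ACCOUNTS"),
    ("TXN_1", "invokes", "PROG_A"),
    ("PROG_C", "calls", "PROG_D"),
]


def boundary_dependencies(triples, group):
    """Return the edges that cross the boundary of `group`, split by direction."""
    group = set(group)
    outgoing, incoming = [], []
    for src, rel, dst in triples:
        if src in group and dst not in group:
            outgoing.append((src, rel, dst))
        elif src not in group and dst in group:
            incoming.append((src, rel, dst))
    return outgoing, incoming


def impacted_entities(triples, changed):
    """Return entities that transitively depend on `changed`.

    Assumes each edge means "source depends on target" and walks the
    edges in reverse from the changed entity.
    """
    dependents = defaultdict(set)
    for src, _rel, dst in triples:
        dependents[dst].add(src)
    seen, stack = set(), [changed]
    while stack:
        node = stack.pop()
        for dep in dependents[node]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


# A logical boundary drawn by the analyst around two programs.
group = {"PROG_A", "PROG_B"}
outgoing, incoming = boundary_dependencies(TRIPLES, group)
impacted = impacted_entities(TRIPLES, "TABLE_ACCOUNTS")
```

Here `outgoing` and `incoming` correspond to the to/from dependency views across a newly defined boundary, and `impacted` approximates the question of how a change in one part may affect the others. A real ontology would likely distinguish relation types rather than treating them uniformly.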
Related papers
- Architecture-Aware Multi-Design Generation for Repository-Level Feature Addition [53.50448142467294]
RAIM is a multi-design and architecture-aware framework for repository-level feature addition.
It shifts away from linear patching by generating multiple diverse implementation designs.
Experiments on the NoCode-bench Verified dataset demonstrate that RAIM establishes a new state-of-the-art performance.
arXiv Detail & Related papers (2026-03-02T12:50:40Z)
- GenAI for Systems: Recurring Challenges and Design Principles from Software to Silicon [62.2138479061386]
Generative AI is reshaping how computing systems are designed, optimized, and built, yet research remains fragmented across software, architecture, and chip design communities.
This paper takes a cross-stack perspective, examining how generative models are being applied from code generation and distributed runtimes through hardware design space exploration to RTL synthesis, physical layout, and verification.
arXiv Detail & Related papers (2026-02-16T22:45:33Z)
- LogicLens: Leveraging Semantic Code Graph to explore Multi Repository large systems [0.2519906683279152]
We introduce LogicLens, a reactive conversational agent that assists developers in exploring complex software systems.
We present the architecture of the system, discuss emergent behaviors, and evaluate its effectiveness on real-world multi-repository scenarios.
arXiv Detail & Related papers (2026-01-15T15:35:23Z)
- Model management to support systems engineering workflows using ontology-based knowledge graphs [0.09134244356393663]
We propose a framework to manage modelling artefacts generated from executions workflow.
Basic workflow concepts, related formalisms and artefacts are formally defined in an ontology specified in OML.
We also developed several tools to support system engineering during the design of, their enactment, and artefact storage.
Results show that our proposal not only helped the system engineer with fundamental difficulties like storage and versioning but also reduced the time needed to access relevant information.
arXiv Detail & Related papers (2025-12-10T12:45:16Z)
- A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System [56.40989626804489]
This survey provides the first holistic analysis of Large Language Models-powered software engineering.
We review over 150 recent papers and propose a taxonomy along two key dimensions: (1) Solutions, categorized into prompt-based, fine-tuning-based, and agent-based paradigms, and (2) Benchmarks, including tasks such as code generation, translation, and repair.
arXiv Detail & Related papers (2025-10-10T06:56:50Z)
- A Survey on (M)LLM-Based GUI Agents [62.57899977018417]
Graphical User Interface (GUI) Agents have emerged as a transformative paradigm in human-computer interaction.
Recent advances in large language models and multimodal learning have revolutionized GUI automation across desktop, mobile, and web platforms.
This survey identifies key technical challenges, including accurate element localization, effective knowledge retrieval, long-horizon planning, and safety-aware execution control.
arXiv Detail & Related papers (2025-03-27T17:58:31Z)
- Learning Representations for Reasoning: Generalizing Across Diverse Structures [5.031093893882575]
We aim to push the boundary of reasoning models by devising algorithms that generalize across knowledge and query structures.
Our library treats structured data as first-class citizens and removes the barrier for developing algorithms on structured data.
arXiv Detail & Related papers (2024-10-16T20:23:37Z)
- RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph [63.87660059104077]
We present RepoGraph, a plug-in module that manages a repository-level structure for modern AI software engineering solutions.
RepoGraph substantially boosts the performance of all systems, leading to a new state-of-the-art among open-source frameworks.
arXiv Detail & Related papers (2024-10-03T05:45:26Z)
- Code-Survey: An LLM-Driven Methodology for Analyzing Large-Scale Codebases [3.8153349016958074]
We introduce Code-Survey, the first LLM-driven methodology designed to explore and analyze large-scale codebases.
By carefully designing surveys, Code-Survey transforms unstructured data, such as commits, emails, into organized, structured, and analyzable datasets.
This enables quantitative analysis of complex software evolution and uncovers valuable insights related to design, implementation, maintenance, reliability, and security.
arXiv Detail & Related papers (2024-09-24T17:08:29Z)
- XMainframe: A Large Language Model for Mainframe Modernization [5.217282407759193]
Mainframe operating systems continue to support critical sectors like finance and government.
These systems are often viewed as outdated, requiring extensive maintenance and modernization.
We introduce XMainframe, a state-of-the-art large language model (LLM) specifically designed with knowledge of legacy systems and mainframes.
arXiv Detail & Related papers (2024-08-05T20:01:10Z)
- A Symbolic Computing Perspective on Software Systems [0.0]
Symbolic mathematical computing systems have served as a canary in the coal mine of software systems for more than sixty years.
All of the major symbolic mathematical computing systems include low-level code for arithmetic, memory management and other primitives, a compiler or interpreter for a bespoke programming language, a library of high level mathematical algorithms, and some form of user interface.
arXiv Detail & Related papers (2024-06-13T13:10:47Z)
- Serving Deep Learning Model in Relational Databases [70.53282490832189]
Serving deep learning (DL) models on relational data has become a critical requirement across diverse commercial and scientific domains.
We highlight three pivotal paradigms: The state-of-the-art DL-centric architecture offloads DL computations to dedicated DL frameworks.
The potential UDF-centric architecture encapsulates one or more tensor computations into User Defined Functions (UDFs) within the relational database management system (RDBMS).
arXiv Detail & Related papers (2023-10-07T06:01:35Z)
- Enhancing Architecture Frameworks by Including Modern Stakeholders and their Views/Viewpoints [48.87872564630711]
The stakeholders with data science and Machine Learning related concerns, such as data scientists and data engineers, are yet to be included in existing architecture frameworks.
We surveyed 61 subject matter experts from over 25 organizations in 10 countries.
arXiv Detail & Related papers (2023-08-09T21:54:34Z)
- A Graphical Modeling Language for Artificial Intelligence Applications in Automation Systems [69.50862982117127]
An interdisciplinary graphical modeling language that enables the modeling of an AI application as an overall system comprehensible to all disciplines does not yet exist.
This paper presents a graphical modeling language that enables consistent and understandable modeling of AI applications in automation systems at system level.
arXiv Detail & Related papers (2023-06-20T12:06:41Z)
- iWarded: A System for Benchmarking Datalog+/- Reasoning (technical report) [0.0]
iWarded is a system that can generate very large, complex, realistic reasoning settings.
We present the iWarded system and a set of novel theoretical results adopted to generate effective scenarios.
arXiv Detail & Related papers (2021-03-15T17:56:46Z)
- CoreDiag: Eliminating Redundancy in Constraint Sets [68.8204255655161]
We present a new algorithm which can be exploited for the determination of minimal cores (minimal non-redundant constraint sets).
The algorithm is especially useful for distributed knowledge engineering scenarios where the degree of redundancy can become high.
In order to show the applicability of our approach, we present an empirical study conducted with commercial configuration knowledge bases.
arXiv Detail & Related papers (2021-02-24T09:16:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.