A Process Mining-Based System For The Analysis and Prediction of Software Development Workflows
- URL: http://arxiv.org/abs/2510.25935v2
- Date: Fri, 31 Oct 2025 17:31:15 GMT
- Title: A Process Mining-Based System For The Analysis and Prediction of Software Development Workflows
- Authors: Antía Dorado, Iván Folgueira, Sofía Martín, Gonzalo Martín, Álvaro Porto, Alejandro Ramos, John Wallace,
- Abstract summary: CodeSight is an end-to-end system designed to anticipate deadline compliance in software development. It captures development and deployment data directly from GitHub, transforming it into process mining logs for detailed analysis. CodeSight employs an LSTM model that predicts remaining PR resolution times based on sequential activity traces and static features.
- Score: 33.72751145910978
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: CodeSight is an end-to-end system designed to anticipate deadline compliance in software development workflows. It captures development and deployment data directly from GitHub, transforming it into process mining logs for detailed analysis. From these logs, the system generates metrics and dashboards that provide actionable insights into PR activity patterns and workflow efficiency. Building on this structured representation, CodeSight employs an LSTM model that predicts remaining PR resolution times based on sequential activity traces and static features, enabling early identification of potential deadline breaches. In tests, the system demonstrates high precision and F1 scores in predicting deadline compliance, illustrating the value of integrating process mining with machine learning for proactive software project management.
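The pipeline in the abstract (GitHub events → process mining log → sequence model on activity prefixes → deadline-compliance flags scored with precision/F1) can be pictured with the minimal Python sketch below. It rests on stated assumptions, not on CodeSight's actual code. Each pull request is treated as one process-mining case with (case ID, activity, timestamp) events. The `Event` schema, activity names such as `review_requested`, and the helpers `build_log`, `prefix_samples`, `predicts_breach`, and `precision_f1` are all hypothetical. The LSTM itself is not reproduced. The sketch only derives the prefix/remaining-time pairs a sequence model could be trained on, then turns a predicted remaining time into a breach flag and scores flags, treating "breach" as the positive class (also an assumption).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    case_id: str         # one case per pull request (assumed schema)
    activity: str        # hypothetical labels, e.g. "opened", "merged"
    timestamp: datetime


def build_log(events):
    """Group raw events into per-PR traces, each ordered by timestamp."""
    log = {}
    for e in sorted(events, key=lambda e: (e.case_id, e.timestamp)):
        log.setdefault(e.case_id, []).append(e)
    return log


def prefix_samples(trace):
    """Return (activity prefix, remaining hours) pairs for one trace.

    The target is the time from the last event in the prefix to the final
    event of the trace; the complete trace (remaining time 0) is skipped.
    """
    end = trace[-1].timestamp
    samples = []
    for k in range(1, len(trace)):
        prefix = [e.activity for e in trace[:k]]
        remaining = (end - trace[k - 1].timestamp).total_seconds() / 3600
        samples.append((prefix, remaining))
    return samples


def predicts_breach(now, predicted_remaining_h, deadline):
    """Flag a PR as a likely deadline breach from a predicted remaining time."""
    return now + timedelta(hours=predicted_remaining_h) > deadline


def precision_f1(y_true, y_pred):
    """Precision and F1 with 'breach' (True) as the positive class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, f1


# Toy usage with made-up events for a single PR.
events = [
    Event("pr-1", "merged", datetime(2025, 1, 2, 9)),
    Event("pr-1", "opened", datetime(2025, 1, 1, 9)),
    Event("pr-1", "review_requested", datetime(2025, 1, 1, 12)),
]
log = build_log(events)
print(prefix_samples(log["pr-1"]))
# [(['opened'], 24.0), (['opened', 'review_requested'], 21.0)]
print(predicts_breach(datetime(2025, 1, 1, 12), 21.0, datetime(2025, 1, 2, 8)))
# True
print(precision_f1([True, False, True, False], [True, True, False, False]))
# (0.5, 0.5)
```

In a fuller setup the prefixes would be encoded (for example, activity one-hot vectors combined with the static PR features the abstract mentions) and fed to the sequence model; here the predicted remaining time is simply passed in by hand.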
Related papers
- Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering [4.812321790984494]
We conduct an analysis of token consumption patterns in an LLM-MA system within the Software Development Life Cycle (SDLC). We analyze execution traces from 30 software development tasks performed by the ChatDev framework using a GPT-5 reasoning model. Our preliminary findings show that the iterative Code Review stage accounts for the majority of token consumption, an average of 59.4% of tokens.
arXiv Detail & Related papers (2026-01-20T20:52:14Z)
- Agentic Predictor: Performance Prediction for Agentic Workflows via Multi-View Encoding [56.565200973244146]
Agentic Predictor is a lightweight predictor for efficient agentic workflow evaluation. By learning to approximate task success rates, Agentic Predictor enables fast and accurate selection of optimal agentic workflow configurations.
arXiv Detail & Related papers (2025-05-26T09:46:50Z)
- NLP-Based .NET CLR Event Logs Analyzer [0.0]
We present a tool for analyzing .NET CLR event logs based on a novel method inspired by Natural Language Processing (NLP). We utilize a BERT-based architecture with an enhanced tokenization process customized to event logs. Our experiments demonstrate the efficacy of our approach in compressing event sequences, detecting recurring patterns, and identifying anomalies.
arXiv Detail & Related papers (2025-02-06T17:01:38Z)
- Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorfBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures. We also present WorfEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms. We observe that the generated workflows can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
arXiv Detail & Related papers (2024-10-10T12:41:19Z)
- Event Abstraction for Enterprise Collaboration Systems to Support Social Process Mining [0.0]
One aim of Process Mining is the discovery of process models from event logs of information systems.
ECS logs come with special characteristics that have so far not been fully addressed by existing event abstraction approaches.
We aim to close this gap with a tailored ECS event abstraction approach that trains a model by comparing recorded actual user activities with the system-generated low-level traces.
arXiv Detail & Related papers (2023-08-08T17:00:30Z)
- Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming [77.38174112525168]
We present Nemo, an end-to-end interactive supervision system that improves the overall productivity of the WS learning pipeline by an average of 20% (and up to 47% in one task) compared to the prevailing WS approach.
arXiv Detail & Related papers (2022-03-02T19:57:32Z)
- Exploring the potential of flow-based programming for machine learning deployment in comparison with service-oriented architectures [8.677012233188968]
We argue that part of the reason is infrastructure that was not designed for activities around data collection and analysis.
We propose to consider flow-based programming with data streams as an alternative to commonly used service-oriented architectures for building software applications.
arXiv Detail & Related papers (2021-08-09T15:06:02Z)
- CoCoMoT: Conformance Checking of Multi-Perspective Processes via SMT (Extended Version) [62.96267257163426]
We introduce the CoCoMoT (Computing Conformance Modulo Theories) framework.
First, we show how SAT-based encodings studied in the pure control-flow setting can be lifted to our data-aware case.
Second, we introduce a novel preprocessing technique based on a notion of property-preserving clustering.
arXiv Detail & Related papers (2021-03-18T20:22:50Z)
- Predictive Process Model Monitoring using Recurrent Neural Networks [2.4029798593292706]
This paper introduces Processes-As-Movies (PAM), a technique that provides a middle ground between predictive monitoring.
It does so by capturing declarative process constraints between activities in various windows of a process execution trace.
Various recurrent neural network topologies tailored to high-dimensional input are used to model the process model evolution with windows as time steps.
arXiv Detail & Related papers (2020-11-05T13:57:33Z)
- Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specifics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)