Machine Learning Pipeline for Software Engineering: A Systematic Literature Review
- URL: http://arxiv.org/abs/2508.00045v1
- Date: Thu, 31 Jul 2025 15:37:30 GMT
- Title: Machine Learning Pipeline for Software Engineering: A Systematic Literature Review
- Authors: Samah Kansab
- Abstract summary: This systematic literature review examines state-of-the-art Machine Learning pipelines designed for software engineering (SE). Our findings show that robust preprocessing, such as SMOTE for data balancing, improves model reliability. Ensemble methods like Random Forest and Gradient Boosting dominate performance across tasks. New metrics like Best Arithmetic Mean (BAM) are emerging in niche applications.
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The rapid advancement of software development practices has introduced challenges in ensuring quality and efficiency across the software engineering (SE) lifecycle. As SE systems grow in complexity, traditional approaches often fail to scale, resulting in longer debugging times, inefficient defect detection, and resource-heavy development cycles. Machine Learning (ML) has emerged as a key solution, enabling automation in tasks such as defect prediction, code review, and release quality estimation. However, the effectiveness of ML in SE depends on the robustness of its pipeline, including data collection, preprocessing, feature engineering, algorithm selection, validation, and evaluation. This systematic literature review (SLR) examines state-of-the-art ML pipelines designed for SE, consolidating best practices, challenges, and gaps. Our findings show that robust preprocessing, such as SMOTE for data balancing and SZZ-based algorithms for feature selection, improves model reliability. Ensemble methods like Random Forest and Gradient Boosting dominate performance across tasks, while simpler models such as Naive Bayes remain valuable for efficiency and interpretability. Evaluation metrics including AUC, F1-score, and precision are most common, with new metrics like Best Arithmetic Mean (BAM) emerging in niche applications. Validation techniques such as bootstrapping are widely used to ensure model stability and generalizability. This SLR highlights the importance of well-designed ML pipelines for addressing SE challenges and provides actionable insights for researchers and practitioners seeking to optimize software quality and efficiency. By identifying gaps and trends, this study sets a foundation for advancing ML adoption and fostering innovation in increasingly complex development environments.
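To ground the pipeline stages named in the abstract, here is a minimal Python sketch that chains SMOTE balancing, a Random Forest ensemble, AUC/F1 evaluation, and bootstrap validation. The synthetic data and hyperparameters are illustrative assumptions, not values from the reviewed studies, and SMOTE requires the imbalanced-learn package.

```python
import numpy as np
from imblearn.over_sampling import SMOTE  # pip install imbalanced-learn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic, imbalanced stand-in for a defect-prediction dataset.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Preprocessing: rebalance the minority (defective) class with SMOTE.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

# Modeling: a Random Forest ensemble, a frequent top performer per the review.
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_bal, y_bal)

# Evaluation: AUC and F1, the metrics the review finds most common.
proba = clf.predict_proba(X_te)[:, 1]
print("AUC:", roc_auc_score(y_te, proba))
print("F1: ", f1_score(y_te, (proba > 0.5).astype(int)))

# Validation: bootstrap resampling of the test set to gauge metric stability.
aucs = []
for seed in range(100):
    Xb, yb = resample(X_te, y_te, random_state=seed)
    aucs.append(roc_auc_score(yb, clf.predict_proba(Xb)[:, 1]))
print(f"bootstrap AUC: {np.mean(aucs):.3f} +/- {np.std(aucs):.3f}")
```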
Related papers
- Synthetic Code Surgery: Repairing Bugs and Vulnerabilities with LLMs and Synthetic Data
This paper presents a novel methodology for enhancing Automated Program Repair (APR) through synthetic data generation using Large Language Models (LLMs). The approach addresses the scarcity of high-quality training data through a two-phase process: synthetic sample generation followed by a rigorous quality assessment. Experimental evaluation on the VulRepair test set showed statistically significant improvements in Perfect Prediction rates.
arXiv Detail & Related papers (2025-05-12T09:14:20Z) - A Systematic Literature Review of Parameter-Efficient Fine-Tuning for Large Code Models
Large Language Models (LLMs) for code require significant computational resources for training and fine-tuning. To address this, the research community has increasingly turned to Parameter-Efficient Fine-Tuning (PEFT). PEFT enables the adaptation of large models by updating only a small subset of parameters rather than the entire model. Our study synthesizes findings from 27 peer-reviewed papers, identifying patterns in configuration strategies and adaptation trade-offs.
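As one concrete instance of updating only a small subset of parameters, here is a hedged PyTorch sketch of a LoRA-style adapter; LoRA is just one of the PEFT methods such a review covers, and the layer size and rank here are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pretrained weights
            p.requires_grad_(False)
        # Low-rank factors; A starts at zero so training begins at the base model.
        self.A = nn.Parameter(torch.zeros(base.out_features, rank))
        self.B = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ (self.A @ self.B).T

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable parameters: {trainable} of {total}")  # roughly 2% of the layer
```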
arXiv Detail & Related papers (2025-04-29T16:19:25Z) - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute
We propose a unified Test-Time Compute (TTC) scaling framework that leverages increased inference-time computation instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
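The summary leaves both strategies abstract. As a generic illustration of spending extra inference-time compute, and not necessarily this paper's design, external scaling is often realized as best-of-N sampling under a verifier; the sampler and verifier below are hypothetical stand-ins.

```python
import random
from typing import Callable, List

def best_of_n(generate: Callable[[], str], score: Callable[[str], float], n: int = 8) -> str:
    """Sample n candidates and keep the highest-scoring one under a verifier."""
    candidates: List[str] = [generate() for _ in range(n)]
    return max(candidates, key=score)

# Toy stand-ins: a sampler over candidate patches and a trivial verifier.
patches = ["fix: add null check", "fix: widen type", "fix: retry on timeout"]
chosen = best_of_n(lambda: random.choice(patches), lambda p: len(p))
print(chosen)
```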
arXiv Detail & Related papers (2025-03-31T07:31:32Z) - IMPROVE: Iterative Model Pipeline Refinement and Optimization Leveraging LLM Experts
Large language model (LLM) agents have emerged as a promising solution for automating machine learning workflows. We introduce Iterative Refinement, a novel strategy for LLM-driven ML pipeline design inspired by how human ML experts iteratively refine models. By systematically updating individual components based on real training feedback, Iterative Refinement improves overall model performance.
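A toy Python sketch of that component-wise loop follows, with invented components and a stand-in validation signal rather than the IMPROVE system itself.

```python
import random

# Toy pipeline with three independently refinable components.
pipeline = {"preprocess": "none", "model": "logreg", "features": "raw"}
options = {
    "preprocess": ["none", "scale", "smote"],
    "model": ["logreg", "random_forest", "gboost"],
    "features": ["raw", "selected"],
}

def validate(p: dict) -> float:
    """Stand-in for real training feedback (e.g., validation accuracy)."""
    return sum(len(v) for v in p.values()) / 30 + random.random() * 0.05

best_score = validate(pipeline)
for _ in range(20):
    stage = random.choice(list(pipeline))                  # refine one component
    candidate = {**pipeline, stage: random.choice(options[stage])}
    score = validate(candidate)
    if score > best_score:                                 # keep only improvements
        pipeline, best_score = candidate, score
print(pipeline, round(best_score, 3))
```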
arXiv Detail & Related papers (2025-02-25T01:52:37Z) - Evaluation of Artificial Intelligence Methods for Lead Time Prediction in Non-Cycled Areas of Automotive Production
The present study examines the effectiveness of applying Artificial Intelligence methods in an automotive production environment. Data structures are analyzed to identify contextual features, which are then preprocessed using one-hot encoding. The research demonstrates that AI methods can be effectively applied to highly variable production data, adding business value.
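As a concrete illustration of that preprocessing step, here is a minimal pandas sketch; the column names are invented for illustration.

```python
import pandas as pd

# Invented contextual features for a non-cycled production step.
df = pd.DataFrame({
    "station": ["assembly", "paint", "assembly"],
    "shift": ["day", "night", "day"],
    "lead_time_h": [4.5, 6.0, 3.2],  # prediction target stays numeric
})
encoded = pd.get_dummies(df, columns=["station", "shift"])
print(encoded.columns.tolist())
# ['lead_time_h', 'station_assembly', 'station_paint', 'shift_day', 'shift_night']
```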
arXiv Detail & Related papers (2025-01-13T13:28:03Z) - Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement
The Lingma SWE-GPT series learns from and simulates real-world code submission activities.
Lingma SWE-GPT 72B resolves 30.20% of GitHub issues, marking a significant improvement in automatic issue resolution.
arXiv Detail & Related papers (2024-11-01T14:27:16Z) - Uncertainty Aware Learning for Language Model Alignment
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios.
We implement UAL in a simple fashion: adaptively setting the label-smoothing value during training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
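The adaptive smoothing rule can be stated in a few lines. Below is a hedged PyTorch sketch of the general idea, with the per-sample uncertainty left abstract; the paper's exact schedule may differ.

```python
import torch
import torch.nn.functional as F

def ual_loss(logits, targets, uncertainty, max_smooth=0.2):
    """Cross-entropy where each sample's label smoothing grows with its uncertainty."""
    n_classes = logits.size(-1)
    eps = (uncertainty.clamp(0, 1) * max_smooth).unsqueeze(-1)  # per-sample smoothing
    one_hot = F.one_hot(targets, n_classes).float()
    soft = one_hot * (1 - eps) + eps / n_classes                # soft target distribution
    return -(soft * F.log_softmax(logits, dim=-1)).sum(-1).mean()

logits = torch.randn(4, 10)
targets = torch.randint(0, 10, (4,))
uncertainty = torch.rand(4)  # e.g., from ensemble variance or predictive entropy
print(ual_loss(logits, targets, uncertainty))
```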
arXiv Detail & Related papers (2024-06-07T11:37:45Z) - Machine Learning Insides OptVerse AI Solver: Design Principles and Applications
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances using generative models that mirror the multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z) - PerfRL: A Small Language Model Framework for Efficient Code Optimization
In this paper, we introduce PerfRL, an innovative framework designed to tackle the problem of code optimization. Our framework leverages the capabilities of small language models (SLMs) and reinforcement learning (RL). Our approach achieves similar or better results compared to state-of-the-art models, with shorter training times and smaller pre-trained models.
arXiv Detail & Related papers (2023-12-09T19:50:23Z) - Using Machine Learning To Identify Software Weaknesses From Software Requirement Specifications
This research focuses on finding an efficient machine learning algorithm to identify software weaknesses from requirement specifications.
Keywords extracted using latent semantic analysis help map the CWE categories to PROMISE_exp. Naive Bayes, support vector machine (SVM), decision tree, neural network, and convolutional neural network (CNN) algorithms were tested.
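As an illustration of the keyword-extraction step, here is a small scikit-learn sketch of latent semantic analysis (TF-IDF followed by truncated SVD) over invented requirement sentences; the real study works on PROMISE_exp, which is not bundled here.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented requirement texts standing in for PROMISE_exp entries.
docs = [
    "the system shall validate user input before storage",
    "passwords must be hashed and never logged",
    "the service shall sanitize file paths in upload requests",
]
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)
lsa = TruncatedSVD(n_components=2, random_state=0).fit(X)

# Top-weighted terms per latent topic serve as candidate keywords.
terms = tfidf.get_feature_names_out()
for i, comp in enumerate(lsa.components_):
    top = comp.argsort()[-3:][::-1]
    print(f"topic {i}:", [terms[j] for j in top])
```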
arXiv Detail & Related papers (2023-08-10T13:19:10Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - Automatic Feasibility Study via Data Quality Analysis for ML: A Case-Study on Label Noise
We present Snoopy, a system that supports data scientists and machine learning engineers in performing a systematic and theoretically founded feasibility study.
We approach this problem by estimating the irreducible error of the underlying task, also known as the Bayes error rate (BER).
We demonstrate in end-to-end experiments how users are able to save substantial labeling time and monetary effort.
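For intuition on the quantity being estimated: on binary tasks the classical Cover-Hart bound relates the asymptotic 1-NN error e to the BER via BER <= e <= 2*BER*(1 - BER), so a cross-validated 1-NN error brackets the BER. A rough Python sketch under that assumption follows; Snoopy's actual estimator is more refined.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary task with injected label noise, so the BER is nonzero.
X, y = make_classification(n_samples=1000, flip_y=0.1, random_state=0)

# Cross-validated 1-NN error as a finite-sample proxy for the asymptotic rate.
e = 1 - cross_val_score(KNeighborsClassifier(1), X, y, cv=10).mean()

# Invert e <= 2*BER*(1 - BER) for the lower bound; e itself is the upper bound.
lower = (1 - np.sqrt(max(0.0, 1 - 2 * e))) / 2
print(f"1-NN error {e:.3f}  ->  BER in [{lower:.3f}, {e:.3f}]")
```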
arXiv Detail & Related papers (2020-10-16T14:21:19Z)