Build Optimization: A Systematic Literature Review
- URL: http://arxiv.org/abs/2501.11940v1
- Date: Tue, 21 Jan 2025 07:32:06 GMT
- Title: Build Optimization: A Systematic Literature Review
- Authors: Henri Aïdasso, Mohammed Sayagh, Francis Bordeleau
- Abstract summary: Continuous Integration (CI) consists of an automated build process involving continuous compilation, testing, and packaging of the software system. To better understand the literature so as to help practitioners find solutions for their problems and guide future research, we conduct a systematic review of 97 studies on build optimization published between 2006 and 2024. The identified build optimization studies focus on two main challenges: (1) long build durations, and (2) build failures.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Continuous Integration (CI) consists of an automated build process involving continuous compilation, testing, and packaging of the software system. While CI offers several advantages related to quality and time to delivery, it also presents several challenges addressed by a large body of research. To better understand the literature so as to help practitioners find solutions for their problems and guide future research, we conduct a systematic review of 97 studies on build optimization published between 2006 and 2024, which we summarize according to their goals, methodologies, used datasets, and leveraged metrics. The identified build optimization studies focus on two main challenges: (1) long build durations, and (2) build failures. To meet the first challenge, existing studies have developed a range of techniques, including predicting build outcome and duration, selective build execution, and build acceleration using caching or repairing performance smells. The causes of build failures have been the subject of several studies, leading to the development of techniques for predicting build script maintenance and automating repair. Recent studies have also focused on predicting flaky build failures caused by environmental issues. The majority of these techniques use machine learning algorithms and leverage build metrics, which we classify into five categories. Additionally, we identify eight publicly available build datasets for build optimization research.
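As an illustration of the ML-based build-outcome-prediction techniques the review surveys, the following is a minimal sketch using scikit-learn. The build metrics, their names, and the synthetic labels are all invented for illustration; real studies would use metrics mined from actual CI histories.

```python
# Minimal sketch of build-outcome prediction from build metrics,
# in the spirit of the ML-based techniques the review surveys.
# Feature names and data are invented for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 500
# Hypothetical build metrics: lines changed, files changed,
# failure rate of recent builds, and number of tests touched.
X = np.column_stack([
    rng.integers(1, 2000, n),   # lines changed
    rng.integers(1, 50, n),     # files changed
    rng.random(n),              # failure rate of last 10 builds
    rng.integers(0, 200, n),    # tests touched
])
# Synthetic label: builds with large churn AND a high recent
# failure rate are marked as failing (1), others as passing (0).
y = ((X[:, 0] > 1000) & (X[:, 2] > 0.5)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
print(f"held-out accuracy: {accuracy:.2f}")
```

Such a predictor could, for example, let a CI system skip or deprioritize builds predicted to pass, which is one of the duration-reduction strategies the review catalogs.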
Related papers
- Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time compute instead of larger models.
Our framework incorporates two complementary strategies: internal TTC and external TTC.
We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z) - Towards Build Optimization Using Digital Twins [2.8402080392117757]
This paper proposes a novel idea of developing Digital Twins of build processes to enable global and continuous improvement.
This framework offers digital shadowing functionalities, including real-time build data acquisition and continuous monitoring of build process performance metrics.
arXiv Detail & Related papers (2025-03-25T06:16:52Z) - An Exploratory Study on Build Issue Resolution Among Computer Science Students [11.795902462023756]
Computer Science (CS) students often encounter the common challenge of open-source software (OSS) failing to build on their local machines.
Despite the prevalence of build issues faced by CS students, there is a lack of studies exploring this topic.
Phase I characterized the build issues students faced, their resolution attempts, and the effectiveness of those attempts.
Phase II introduced an intervention method that emphasized key information (e.g., recommended programming language versions) to students.
arXiv Detail & Related papers (2025-02-21T20:02:45Z) - Language Models for Code Optimization: Survey, Challenges and Future Directions [7.928856221466083]
Language models (LMs) built upon deep neural networks (DNNs) have recently demonstrated breakthrough effectiveness in software engineering tasks. This study aims to provide actionable insights and references for both researchers and practitioners in this rapidly evolving field.
arXiv Detail & Related papers (2025-01-02T14:20:36Z) - Comparison of Static Analysis Architecture Recovery Tools for Microservice Applications [41.962720602828085]
This paper presents the results of a multivocal literature review with the goal of identifying architecture recovery tools for microservice applications. The best-performing tool exhibited an overall F1-score of 0.86. The possibility of combining multiple tools to increase the recovery correctness was investigated, yielding a combination of four individual tools that achieves an F1-score of 0.91.
arXiv Detail & Related papers (2024-12-11T12:46:16Z) - Chaos Engineering: A Multi-Vocal Literature Review [1.6199400106794553]
Chaos Engineering addresses these challenges by proactively testing how systems in production behave under turbulent conditions. We performed a Multivocal Literature Review (MLR) on chaos engineering to fill this research gap. We first used the selected sources to derive a unified definition of chaos engineering and to identify key capabilities, components, and adoption drivers.
arXiv Detail & Related papers (2024-12-02T11:57:24Z) - Specifications: The missing link to making the development of LLM systems an engineering discipline [65.10077876035417]
We discuss the progress the field has made so far, through advances like structured outputs, process supervision, and test-time compute.
We outline several future directions for research to enable the development of modular and reliable LLM-based systems.
arXiv Detail & Related papers (2024-11-25T07:48:31Z) - A Computational Method for Measuring "Open Codes" in Qualitative Analysis [47.358809793796624]
Open coding is an inductive qualitative process that identifies and interprets "open codes" from datasets.
We present a computational method to measure and identify potential biases from "open codes" systematically.
arXiv Detail & Related papers (2024-11-19T00:44:56Z) - What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated as compared to canonical solutions.
We develop a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z) - Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have presented impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z) - Multiobjective Optimization Analysis for Finding Infrastructure-as-Code Deployment Configurations [0.3774866290142281]
This paper focuses on a multiobjective problem related to Infrastructure-as-Code deployment configurations.
We apply nine evolutionary multiobjective algorithms to this problem.
The results obtained by each method after 10 independent runs are compared using Friedman's non-parametric test.
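The comparison procedure this abstract describes can be sketched with SciPy's implementation of Friedman's test. The per-run scores below are invented for illustration; in the study, each column would hold one algorithm's quality indicator across its 10 independent runs.

```python
# Hedged sketch of comparing algorithms across independent runs with
# Friedman's non-parametric test. Scores are invented for illustration.
from scipy.stats import friedmanchisquare

# Each list: one hypothetical algorithm's quality score per run
# (10 independent runs, paired by run index).
algo_a = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.81, 0.80, 0.79]
algo_b = [0.75, 0.74, 0.77, 0.73, 0.76, 0.72, 0.78, 0.74, 0.75, 0.73]
algo_c = [0.88, 0.86, 0.89, 0.87, 0.90, 0.85, 0.91, 0.88, 0.87, 0.86]

stat, p_value = friedmanchisquare(algo_a, algo_b, algo_c)
print(f"Friedman chi-square = {stat:.2f}, p = {p_value:.4f}")
# A small p-value indicates that at least one algorithm ranks
# consistently differently from the others across runs.
```

Because the test is rank-based, it makes no normality assumption about the scores, which is why it is a common choice for comparing stochastic metaheuristics.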
arXiv Detail & Related papers (2024-01-18T13:55:32Z) - Automatic Feature Engineering for Time Series Classification: Evaluation and Discussion [0.0]
Time Series Classification (TSC) is a crucial and challenging problem in data science and knowledge engineering.
Several tools for extracting unsupervised informative summary statistics, also known as features, from time series have been designed in recent years.
In this article, we propose a simple TSC process to evaluate the potential predictive performance of the feature sets obtained with existing feature engineering tools.
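The kind of evaluation pipeline this abstract describes (extract summary-statistic features from raw series, then score a standard classifier) can be sketched as follows. The data, the two classes, and the specific features are all illustrative, not taken from the article.

```python
# Hedged sketch of a TSC evaluation pipeline: summary-statistic
# features from raw time series, scored with a standard classifier.
# Data and features are synthetic and chosen for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, length = 200, 100
# Two synthetic classes: low-amplitude vs high-amplitude noise.
series = np.vstack([
    rng.normal(0, 1.0, (n // 2, length)),
    rng.normal(0, 3.0, (n // 2, length)),
])
labels = np.repeat([0, 1], n // 2)

# Unsupervised summary statistics as features: mean, std, min, max.
features = np.column_stack([
    series.mean(axis=1), series.std(axis=1),
    series.min(axis=1), series.max(axis=1),
])

scores = cross_val_score(LogisticRegression(max_iter=1000),
                         features, labels, cv=5)
print(f"mean CV accuracy: {scores.mean():.2f}")
```

Swapping in richer feature extractors while keeping the classifier and cross-validation fixed is what lets such a process compare feature sets on equal footing.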
arXiv Detail & Related papers (2023-08-02T10:46:42Z) - CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models [74.22729793816451]
Large Language Models (LLMs) have made significant progress in utilizing tools, but their ability is limited by API availability.
We propose CREATOR, a novel framework that enables LLMs to create their own tools using documentation and code realization.
We evaluate CREATOR on the MATH and TabMWP benchmarks, which consist of challenging math problems.
arXiv Detail & Related papers (2023-05-23T17:51:52Z) - Learning to Optimize: A Primer and A Benchmark [94.29436694770953]
Learning to optimize (L2O) is an emerging approach that leverages machine learning to develop optimization methods.
This article is poised to be the first comprehensive survey and benchmark of L2O for continuous optimization.
arXiv Detail & Related papers (2021-03-23T20:46:20Z)