Rethinking Software Misconfigurations in the Real World: An Empirical Study and Literature Analysis
- URL: http://arxiv.org/abs/2412.11121v1
- Date: Sun, 15 Dec 2024 08:53:41 GMT
- Title: Rethinking Software Misconfigurations in the Real World: An Empirical Study and Literature Analysis
- Authors: Yuhao Liu, Yingnan Zhou, Hanfeng Zhang, Zhiwei Chang, Sihan Xu, Yan Jia, Wei Wang, Zheli Liu
- Abstract summary: We conduct an empirical study on 823 real-world misconfiguration issues, based on which we propose a novel classification of the root causes of software misconfigurations. We find that research targets have shifted from fundamental software to advanced applications. Meanwhile, research on non-crash misconfigurations, such as performance degradation and security risks, has also grown significantly.
- Score: 9.88064494257381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Software misconfiguration has consistently been a major cause of software failures. Over the past two decades, much work has been done to detect and diagnose software misconfigurations. However, a gap remains between real-world misconfigurations and the literature, so it is desirable to investigate whether existing taxonomies and tools apply to real-world misconfigurations in modern software. In this paper, we conduct an empirical study on 823 real-world misconfiguration issues, based on which we propose a novel classification of the root causes of software misconfigurations: constraint violation, resource unavailability, component-dependency error, and misunderstanding of configuration effects. We then systematically review the literature on misconfiguration troubleshooting and study the research trends and the practicality of the tools and datasets in this field. We find that research targets have shifted from fundamental software to advanced applications (e.g., cloud services). Meanwhile, research on non-crash misconfigurations, such as performance degradation and security risks, has also grown significantly. Despite this progress, most studies are not reproducible because their tools and evaluation datasets are unavailable. In total, only six tools and two datasets are publicly available, and the limited adaptability of these tools restricts their practical use on real-world misconfigurations. We also summarize important challenges and offer several suggestions to facilitate research on software misconfiguration.
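To make the "constraint violation" root cause from the abstract's taxonomy concrete, here is a minimal, hypothetical sketch of a configuration validator. The option names and constraints below are invented for illustration and are not from the paper or any real tool:

```python
# Hypothetical sketch: detecting "constraint violation" misconfigurations,
# one of the four root causes in the paper's taxonomy. All option names
# and rules here are assumptions made up for this example.

def check_constraints(config):
    """Return a list of constraint violations found in `config`."""
    violations = []
    # Type/range constraint: a port must be an integer in [1, 65535].
    port = config.get("port")
    if not isinstance(port, int) or not (1 <= port <= 65535):
        violations.append("port must be an integer between 1 and 65535")
    # Dependency constraint: enabling TLS requires a certificate path.
    if config.get("tls_enabled") and not config.get("cert_file"):
        violations.append("tls_enabled requires cert_file to be set")
    return violations

# A misconfigured example: out-of-range port and a missing dependent option.
print(check_constraints({"port": 70000, "tls_enabled": True}))
```

Real misconfiguration-troubleshooting tools infer such constraints from documentation, source code, or configuration histories rather than hard-coding them as above.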
Related papers
- From Bugs to Benchmarks: A Comprehensive Survey of Software Defect Datasets [19.140541190998842]
Software defect datasets are collections of software bugs and their associated information.
Over the years, numerous software defect datasets have been developed, providing rich resources for the community.
This article provides a comprehensive survey of 132 software defect datasets.
arXiv Detail & Related papers (2025-04-24T23:07:04Z) - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models.
Our framework incorporates two complementary strategies: internal TTC and external TTC.
We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z) - Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use.
MeCo captures high-level cognitive signals in the representation space, guiding when to invoke tools.
Our experiments show that MeCo accurately detects LLMs' internal cognitive signals and significantly improves tool-use decision-making.
arXiv Detail & Related papers (2025-02-18T15:45:01Z) - Does the Tool Matter? Exploring Some Causes of Threats to Validity in Mining Software Repositories [9.539825294372786]
We use two tools to extract and analyse ten large software projects.
Despite similar trends, even simple metrics such as the number of commits and developers may differ by up to 500%.
We find that such substantial differences are often caused by minor technical details.
arXiv Detail & Related papers (2025-01-25T07:42:56Z) - A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z) - Efficacy of static analysis tools for software defect detection on open-source projects [0.0]
The study used popular analysis tools such as SonarQube, PMD, Checkstyle, and FindBugs to perform the comparison.
The study results show that SonarQube performs considerably better than all the other tools in terms of defect detection.
arXiv Detail & Related papers (2024-05-20T19:05:32Z) - Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study [72.24266814625685]
We explore the performance of large language models (LLMs) across the entire software development lifecycle with DevEval. DevEval features four programming languages, multiple domains, high-quality data collection, and carefully designed and verified metrics for each task. Empirical studies show that current LLMs, including GPT-4, fail to solve the challenges presented within DevEval.
arXiv Detail & Related papers (2024-03-13T15:13:44Z) - Deep Configuration Performance Learning: A Systematic Survey and Taxonomy [3.077531983369872]
We conduct a comprehensive review on the topic of deep learning for performance learning of software, covering 1,206 searched papers spanning six indexing services.
Our results outline key statistics, taxonomy, strengths, weaknesses, and optimal usage scenarios for techniques related to the preparation of configuration data.
We also identify the good practices and potentially problematic phenomena from the studies surveyed, together with a comprehensive summary of actionable suggestions and insights into future opportunities within the field.
arXiv Detail & Related papers (2024-03-05T21:05:16Z) - Investigating Reproducibility in Deep Learning-Based Software Fault Prediction [16.25827159504845]
With the rapid adoption of increasingly complex machine learning models, it becomes more and more difficult for scholars to reproduce the results that are reported in the literature.
This is in particular the case when the applied deep learning models and the evaluation methodology are not properly documented and when code and data are not shared.
We have conducted a systematic review of the current literature and examined the level of reproducibility of 56 research articles published between 2019 and 2022 in top-tier software engineering conferences.
arXiv Detail & Related papers (2024-02-08T13:00:18Z) - A Metadata-Based Ecosystem to Improve the FAIRness of Research Software [0.3185506103768896]
The reuse of research software is central to research efficiency and academic exchange.
We present the DataDesc ecosystem, an approach to describing data models of software interfaces with detailed and machine-actionable metadata.
arXiv Detail & Related papers (2023-06-18T19:01:08Z) - Applying Machine Learning Analysis for Software Quality Test [0.0]
It is critical to understand what triggers maintenance and whether it can be predicted.
Numerous methods of assessing the complexity of created programs may produce useful prediction models.
In this paper, machine learning is applied to the available data to calculate cumulative software failure levels.
arXiv Detail & Related papers (2023-05-16T06:10:54Z) - GLUECons: A Generic Benchmark for Learning Under Constraints [102.78051169725455]
In this work, we create a benchmark that is a collection of nine tasks in the domains of natural language processing and computer vision.
We model external knowledge as constraints, specify the sources of the constraints for each task, and implement various models that use these constraints.
arXiv Detail & Related papers (2023-02-16T16:45:36Z) - Kubric: A scalable dataset generator [73.78485189435729]
Kubric is a Python framework that interfaces with PyBullet and Blender to generate photo-realistic scenes, with rich annotations, and seamlessly scales to large jobs distributed over thousands of machines.
We demonstrate the effectiveness of Kubric by presenting a series of 13 different generated datasets for tasks ranging from studying 3D NeRF models to optical flow estimation.
arXiv Detail & Related papers (2022-03-07T18:13:59Z) - VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.