Temporal Analysis and Repair of Flaky Dockerfiles
- URL: http://arxiv.org/abs/2408.05379v1
- Date: Fri, 9 Aug 2024 23:17:56 GMT
- Title: Temporal Analysis and Repair of Flaky Dockerfiles
- Authors: Taha Shabani, Noor Nashid, Parsa Alian, Ali Mesbah
- Abstract summary: Dockerfile flakiness is characterized by inconsistent build behavior without Dockerfile or project source code changes.
We present a comprehensive taxonomy of common flakiness categories, including dependency-related errors and server connectivity issues.
We introduce FlakiDock, a tool leveraging large language models and retrieval-augmented generation techniques to automatically repair flaky Dockerfiles.
- Score: 6.518508607788089
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dockerfile flakiness, characterized by inconsistent build behavior without Dockerfile or project source code changes, poses significant challenges in Continuous Integration and Delivery (CI/CD) pipelines. This issue can lead to unreliable deployments and increased debugging efforts, yet it remains underexplored in current research. We conduct a systematic analysis of Dockerfile flakiness, presenting a comprehensive taxonomy of common flakiness categories, including dependency-related errors and server connectivity issues. Furthermore, we introduce FlakiDock, a tool leveraging large language models and retrieval-augmented generation techniques with dynamic analysis and an iterative feedback loop to automatically repair flaky Dockerfiles. Our evaluation shows that FlakiDock achieves a 73.55% repair accuracy, outperforming existing tools such as PARFUM by 12,581% and GPT-4-based prompting by 94.63%. These results underscore the effectiveness of FlakiDock in addressing Dockerfile flakiness and improving build reliability.
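The repair loop described in the abstract (build, observe the failure, retrieve similar past fixes, ask an LLM for a patch, rebuild) can be sketched as follows; `retrieve_similar_fixes` and `ask_llm_for_patch` are hypothetical stand-ins passed in by the caller, not FlakiDock's actual components:

```python
import subprocess
import tempfile
from pathlib import Path

def build_succeeds(dockerfile_text: str) -> tuple[bool, str]:
    """Try a docker build and return (success, build log)."""
    with tempfile.TemporaryDirectory() as ctx:
        Path(ctx, "Dockerfile").write_text(dockerfile_text)
        proc = subprocess.run(
            ["docker", "build", "-t", "flaky-check", ctx],
            capture_output=True, text=True,
        )
    return proc.returncode == 0, proc.stdout + proc.stderr

def repair_flaky_dockerfile(dockerfile_text, retrieve_similar_fixes,
                            ask_llm_for_patch, max_rounds=5):
    """Iteratively repair a Dockerfile: build it, and on failure feed the error
    log plus retrieved examples of similar fixes back to an LLM for a new candidate."""
    candidate = dockerfile_text
    for _ in range(max_rounds):
        ok, log = build_succeeds(candidate)
        if ok:
            return candidate
        examples = retrieve_similar_fixes(log)                   # retrieval-augmented context
        candidate = ask_llm_for_patch(candidate, log, examples)  # LLM-proposed repair
    return None  # no stable build found within the budget
```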
Related papers
- Systemic Flakiness: An Empirical Analysis of Co-Occurring Flaky Test Failures [6.824747267214373]
Flaky tests produce inconsistent outcomes without code changes.
Developers spend 1.28% of their time repairing flaky tests at a monthly cost of $2,250.
We show that flaky tests often exist in clusters, with co-occurring failures that share the same root causes, which we call systemic flakiness.
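A small illustration of the clustering intuition, grouping tests by how often they fail together across CI runs (a co-failure threshold plus connected components, not the paper's actual method):

```python
from collections import defaultdict

def cluster_flaky_tests(runs, min_cofail=3):
    """Group flaky tests whose failures co-occur, hinting at a shared root cause.

    runs: list of sets, each the names of tests that failed in one CI run
    Returns clusters (sets of test names) linked by co-failing at least min_cofail times.
    """
    cofail = defaultdict(int)
    for failed in runs:
        failed = sorted(failed)
        for i, a in enumerate(failed):
            for b in failed[i + 1:]:
                cofail[(a, b)] += 1

    # Build an undirected "fails together" graph and take connected components.
    graph = defaultdict(set)
    for (a, b), n in cofail.items():
        if n >= min_cofail:
            graph[a].add(b)
            graph[b].add(a)

    seen, clusters = set(), []
    for node in list(graph):
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur)
            stack.extend(graph[cur] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters
```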
arXiv Detail & Related papers (2025-04-23T14:51:23Z)
- Beyond Progress Measures: Theoretical Insights into the Mechanism of Grokking [50.465604300990904]
Grokking refers to the abrupt improvement in test accuracy after extended overfitting.
In this work, we investigate the grokking mechanism underlying the Transformer in the task of prime number operations.
arXiv Detail & Related papers (2025-04-04T04:42:38Z)
- Doctor: Optimizing Container Rebuild Efficiency by Instruction Re-Orchestration [11.027705516378875]
We present Doctor, a method for improving Dockerfile build efficiency through instruction re-ordering.
We developed a dependency taxonomy based on Dockerfile syntax and a historical modification analysis to prioritize frequently modified instructions.
Experiments show Doctor improves 92.75% of Dockerfiles, reducing rebuild time by an average of 26.5%, with 12.82% of files achieving over a 50% reduction.
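A toy sketch of the re-ordering idea: instructions that change often should sink toward the bottom so earlier build-cache layers stay stable, subject to dependency constraints. The names and data structures below are illustrative, not Doctor's implementation:

```python
def reorder_instructions(instructions, depends_on, change_freq):
    """Greedy layer ordering: among instructions whose prerequisites are already
    emitted, emit the least frequently modified one first, so volatile
    instructions move as late as possible.

    instructions: list of instruction ids (e.g. line indices)
    depends_on:   dict id -> set of ids that must appear earlier
    change_freq:  dict id -> historical modification count
    """
    remaining = set(instructions)
    ordered = []
    while remaining:
        ready = [i for i in remaining if depends_on.get(i, set()) <= set(ordered)]
        if not ready:  # dependency cycle: fall back to the original order
            ordered.extend(sorted(remaining, key=instructions.index))
            break
        nxt = min(ready, key=lambda i: (change_freq.get(i, 0), instructions.index(i)))
        ordered.append(nxt)
        remaining.remove(nxt)
    return ordered

# Example: a rarely changed RUN should precede a frequently changed COPY of sources.
order = reorder_instructions(
    ["FROM", "COPY_SRC", "RUN_PIP"],
    {"COPY_SRC": {"FROM"}, "RUN_PIP": {"FROM"}},
    {"COPY_SRC": 42, "RUN_PIP": 3},
)
print(order)  # ['FROM', 'RUN_PIP', 'COPY_SRC']
```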
arXiv Detail & Related papers (2025-04-02T13:53:35Z)
- Refactoring for Dockerfile Quality: A Dive into Developer Practices and Automation Potential [0.0]
This paper explores the utility and practicality of automating Dockerfile refactoring using 600 Dockerfiles from 358 open-source projects.
Our approach leads to an average reduction of 32% in image size and a 6% decrease in build duration, with improvements in understandability and maintainability observed in 77% and 91% of cases.
arXiv Detail & Related papers (2025-01-23T23:10:47Z)
- SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z)
- FineWAVE: Fine-Grained Warning Verification of Bugs for Automated Static Analysis Tools [18.927121513404924]
Automated Static Analysis Tools (ASATs) have evolved over time to assist in detecting bugs.
Previous research efforts have explored learning-based methods to validate the reported warnings.
We propose FineWAVE, a learning-based approach that verifies bug-sensitive warnings at a fine-grained granularity.
arXiv Detail & Related papers (2024-03-24T06:21:35Z)
- Empirical Analysis of Vulnerabilities Life Cycle in Golang Ecosystem [0.773844059806915]
A comprehensive investigation was undertaken to examine the life cycle of vulnerabilities in the Golang ecosystem.
It turned out that 66.10% of modules in the Golang ecosystem were affected by vulnerabilities.
Analyzing the reasons behind non-lagged and lagged vulnerabilities shows that timely releasing and indexing of patch versions could significantly enhance ecosystem security.
arXiv Detail & Related papers (2023-12-31T14:53:51Z)
- CONVERT: Contrastive Graph Clustering with Reliable Augmentation [110.46658439733106]
We propose a novel CONtrastiVe Graph ClustEring network with Reliable AugmenTation (CONVERT).
In our method, the data augmentations are processed by the proposed reversible perturb-recover network.
To further guarantee the reliability of semantics, a novel semantic loss is presented to constrain the network.
arXiv Detail & Related papers (2023-08-17T13:07:09Z)
- On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository.
We retrieve over 53k potential vulnerable clones from Maven Central.
We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z)
- Multi-Granularity Detector for Vulnerability Fixes [13.653249890867222]
We propose MiDas (Multi-Granularity Detector for Vulnerability Fixes) to identify vulnerability-fixing commits.
MiDas constructs different neural networks for each level of code change granularity, corresponding to commit-level, file-level, hunk-level, and line-level.
MiDas outperforms the current state-of-the-art baseline in terms of AUC by 4.9% and 13.7% on Java and Python-based datasets.
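A schematic of the multi-granularity combination, with each per-granularity model reduced to an opaque callable (illustrative only, not MiDas's actual architecture):

```python
def midas_style_score(change, extract, models, weights):
    """Combine per-granularity classifiers into one vulnerability-fix score.

    extract[level](change)  -> features of the change at that granularity
    models[level](features) -> probability the change is a vulnerability fix
    All names here are illustrative stand-ins.
    """
    levels = ("commit", "file", "hunk", "line")
    total = sum(weights[lv] * models[lv](extract[lv](change)) for lv in levels)
    return total / sum(weights[lv] for lv in levels)
```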
arXiv Detail & Related papers (2023-05-23T10:06:28Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many of the predictive signals in the data may come from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- DRIVE: Dockerfile Rule Mining and Violation Detection [6.510749313511299]
A Dockerfile defines a set of instructions to build Docker images, which can then be instantiated to support containerized applications.
Recent studies have revealed a considerable amount of quality issues with Dockerfiles.
We propose a novel approach to mine implicit rules and detect potential violations of such rules in Dockerfiles.
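A toy version of the mining step, learning "if instruction A appears, instruction B usually appears too" associations from parsed Dockerfiles and flagging files that break them (an illustration of the idea, not DRIVE's algorithm):

```python
from collections import Counter

def mine_rules(dockerfiles, min_support=0.8, min_count=20):
    """Mine simple association rules A -> B over instruction kinds
    (e.g. 'apt-get install' usually co-occurs with 'rm -rf /var/lib/apt/lists')."""
    has = Counter()   # files containing A
    both = Counter()  # files containing both A and B
    for instrs in dockerfiles:
        kinds = set(instrs)
        for a in kinds:
            has[a] += 1
            for b in kinds - {a}:
                both[(a, b)] += 1
    rules = {}
    for (a, b), n in both.items():
        if has[a] >= min_count and n / has[a] >= min_support:
            rules[(a, b)] = n / has[a]  # confidence of "A implies B"
    return rules

def violations(instrs, rules):
    """Report rules whose antecedent occurs in a Dockerfile but whose consequent is missing."""
    kinds = set(instrs)
    return [(a, b, conf) for (a, b), conf in rules.items() if a in kinds and b not in kinds]
```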
arXiv Detail & Related papers (2022-12-12T01:15:30Z)
- FedFA: Federated Learning with Feature Anchors to Align Features and Classifiers for Heterogeneous Data [8.677832361022809]
Federated learning allows multiple clients to collaboratively train a model without exchanging their data.
Common solutions involve an auxiliary loss to regularize weight divergence or feature inconsistency during local training.
We propose a novel framework named Federated learning with Feature Anchors (FedFA).
arXiv Detail & Related papers (2022-11-17T02:27:44Z)
- Break-It-Fix-It: Unsupervised Learning for Program Repair [90.55497679266442]
We propose a new training approach, Break-It-Fix-It (BIFI), which has two key ideas.
We use the critic to check a fixer's output on real bad inputs and add good (fixed) outputs to the training data.
Based on these ideas, we iteratively update the breaker and the fixer while using them in conjunction to generate more paired data.
BIFI outperforms existing methods, obtaining 90.5% repair accuracy on GitHub-Python and 71.7% on DeepFix.
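One BIFI round, schematically, with the critic, fixer, and breaker passed in as opaque callables and training details elided:

```python
def bifi_round(bad_examples, fixer, breaker, critic, train):
    """One Break-It-Fix-It round (schematic).

    bad_examples:  real broken programs with no paired fix
    fixer/breaker: models mapping code -> code
    critic:        returns True if code passes the check (e.g. it parses)
    train:         train(model, pairs) -> updated model, pairs are (input, target)
    """
    # 1. Run the fixer on real bad inputs; keep only outputs the critic accepts.
    fixed_pairs = [(bad, fixer(bad)) for bad in bad_examples]
    fixed_pairs = [(bad, good) for bad, good in fixed_pairs if critic(good)]

    # 2. Train the breaker on the reversed (good -> bad) pairs ...
    breaker = train(breaker, [(good, bad) for bad, good in fixed_pairs])

    # 3. ... and use it to synthesize new (bad, good) pairs from verified-good code,
    #    keeping only outputs that are genuinely broken.
    synthetic = [(breaker(good), good) for _, good in fixed_pairs]
    synthetic = [(bad, good) for bad, good in synthetic if not critic(bad)]

    # 4. Retrain the fixer on real plus synthetic paired data.
    fixer = train(fixer, fixed_pairs + synthetic)
    return fixer, breaker
```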
arXiv Detail & Related papers (2021-06-11T20:31:04Z)
- D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis [55.15995704119158]
We propose D2A, a differential analysis based approach to label issues reported by static analysis tools.
We use D2A to generate a large labeled dataset to train models for vulnerability identification.
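The differential-labeling idea can be sketched as a before/after comparison around a bug-fixing commit, with the analyzer and warning identity left abstract (a sketch of the idea, not D2A's pipeline):

```python
def label_warnings(analyze, warning_key, before_version, after_version):
    """Differential labeling of static-analysis warnings around a bug-fix commit.

    analyze(version) -> iterable of warnings for that code version
    warning_key(w)   -> stable identity for a warning (e.g. checker id + location)
    Returns (likely_true_positives, likely_false_positives) from the pre-fix run.
    """
    before = {warning_key(w): w for w in analyze(before_version)}
    after_keys = {warning_key(w) for w in analyze(after_version)}

    # Warnings present before the fix but gone afterwards were plausibly about the fixed bug.
    likely_tp = [w for k, w in before.items() if k not in after_keys]
    # Warnings that survive the fix are more likely noise for this commit.
    likely_fp = [w for k, w in before.items() if k in after_keys]
    return likely_tp, likely_fp
```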
arXiv Detail & Related papers (2021-02-16T07:46:53Z)
- RobustBench: a standardized adversarial robustness benchmark [84.50044645539305]
A key challenge in benchmarking robustness is that its evaluation is often error-prone, leading to robustness overestimation.
We evaluate adversarial robustness with AutoAttack, an ensemble of white- and black-box attacks.
We analyze the impact of robustness on the performance on distribution shifts, calibration, out-of-distribution detection, fairness, privacy leakage, smoothness, and transferability.
arXiv Detail & Related papers (2020-10-19T17:06:18Z)
- Learning perturbation sets for robust machine learning [97.6757418136662]
We use a conditional generator that defines the perturbation set over a constrained region of the latent space.
We measure the quality of our learned perturbation sets both quantitatively and qualitatively.
We leverage our learned perturbation sets to train models which are empirically and certifiably robust to adversarial image corruptions and adversarial lighting variations.
arXiv Detail & Related papers (2020-07-16T16:39:54Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
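Prediction-time batch normalization re-estimates normalization statistics from the test batch instead of the stored running averages; a minimal PyTorch sketch of that switch (illustrative, the paper's exact protocol may differ):

```python
import torch
import torch.nn as nn

def enable_prediction_time_bn(model: nn.Module) -> nn.Module:
    """Normalize with statistics of the current (test) batch instead of the
    running averages collected during training."""
    model.eval()                                   # dropout etc. stay in inference mode
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.train()                              # train-mode BN uses batch statistics
            m.track_running_stats = False          # ...and leaves the stored stats untouched
    return model

# Usage on a sufficiently large test batch:
# model = enable_prediction_time_bn(model)
# with torch.no_grad():
#     preds = model(x_batch)
```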
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.