DRIVE: Dockerfile Rule Mining and Violation Detection
- URL: http://arxiv.org/abs/2212.05648v3
- Date: Tue, 25 Jul 2023 08:11:40 GMT
- Title: DRIVE: Dockerfile Rule Mining and Violation Detection
- Authors: Yu Zhou, Weilin Zhan, Zi Li, Tingting Han, Taolue Chen, Harald Gall
- Abstract summary: A Dockerfile defines a set of instructions to build Docker images, which can then be instantiated to support containerized applications.
Recent studies have revealed a considerable amount of quality issues with Dockerfiles.
We propose a novel approach to mine implicit rules and detect potential violations of such rules in Dockerfiles.
- Score: 6.510749313511299
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A Dockerfile defines a set of instructions to build Docker images, which can
then be instantiated to support containerized applications. Recent studies have
revealed a considerable amount of quality issues with Dockerfiles. In this
paper, we propose a novel approach DRIVE (Dockerfiles Rule mIning and Violation
dEtection) to mine implicit rules and detect potential violations of such rules
in Dockerfiles. DRIVE firstly parses Dockerfiles and transforms them to an
intermediate representation. It then leverages an efficient sequential pattern
mining algorithm to extract potential patterns. With heuristic-based reduction
and moderate human intervention, potential rules are identified, which can then
be utilized to detect potential violations of Dockerfiles. DRIVE identifies 34
semantic rules and 19 syntactic rules including 9 new semantic rules which have
not been reported elsewhere. Extensive experiments on real-world Dockerfiles
demonstrate the efficacy of our approach.
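The pipeline in the abstract (parse, transform to an intermediate representation, mine sequential patterns, detect violations) can be illustrated with a toy sketch. This is not DRIVE's implementation. The token scheme (`RUN:<first two words>`), the ordered-pair miner standing in for a full sequential pattern mining algorithm, and all function names are assumptions made for illustration only. The heuristic reduction and human review steps mentioned in the abstract are omitted.

```python
from collections import Counter
from itertools import combinations


def parse_dockerfile(text):
    """Return (INSTRUCTION, arguments) pairs, joining backslash continuations."""
    instructions, pending = [], ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("\\"):
            pending += line[:-1] + " "
            continue
        parts = (pending + line).split(None, 1)
        pending = ""
        instructions.append((parts[0].upper(), parts[1] if len(parts) > 1 else ""))
    return instructions


def to_tokens(instructions):
    """Hypothetical intermediate representation: one token per instruction,
    with RUN split on '&&' and reduced to the first two words of each command."""
    tokens = []
    for keyword, args in instructions:
        if keyword == "RUN":
            for command in args.split("&&"):
                words = command.split()
                if words:
                    tokens.append("RUN:" + " ".join(words[:2]))
        else:
            tokens.append(keyword)
    return tokens


def mine_ordered_pairs(corpus, min_support):
    """Count, per file, each ordered pair (a, b) where a occurs before b,
    and keep the pairs seen in at least min_support files."""
    support = Counter()
    for tokens in corpus:
        seen = set()
        for i, j in combinations(range(len(tokens)), 2):
            if tokens[i] != tokens[j]:
                seen.add((tokens[i], tokens[j]))
        support.update(seen)
    return {pair: n for pair, n in support.items() if n >= min_support}


def find_violations(tokens, rules):
    """Report rules (a, b) where both tokens occur but a never precedes b."""
    violations = []
    for first, second in sorted(rules):
        if first in tokens and second in tokens:
            last_second = len(tokens) - 1 - tokens[::-1].index(second)
            if tokens.index(first) > last_second:
                violations.append((first, second))
    return violations


# Illustrative corpus; the file contents are made up for this example.
corpus_files = [
    'FROM ubuntu:22.04\nRUN apt-get update && apt-get install -y curl\nCMD ["curl"]',
    'FROM debian:12\nRUN apt-get update && \\\n    apt-get install -y git\nCMD ["git"]',
]
corpus = [to_tokens(parse_dockerfile(text)) for text in corpus_files]
rules = mine_ordered_pairs(corpus, min_support=2)

suspect = "FROM ubuntu:22.04\nRUN apt-get install -y curl\nRUN apt-get update\n"
print(find_violations(to_tokens(parse_dockerfile(suspect)), rules))
# prints [('RUN:apt-get update', 'RUN:apt-get install')]
```

The violation check here only flags reversed order between two tokens that are both present. Flagging a missing token would produce many false positives from incidental pairs, which is presumably one reason the approach described above filters mined patterns before treating them as rules.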
Related papers
- Toward Automated Test Generation for Dockerfiles Based on Analysis of Docker Image Layers [1.1879716317856948]
The process for building a Docker image is defined in a text file called a Dockerfile.
A Dockerfile can be considered as a kind of source code that contains instructions on how to build a Docker image.
We propose an automated test generation method for Dockerfiles based on processing results rather than processing steps.
arXiv Detail & Related papers (2025-04-25T08:02:46Z)
- Doctor: Optimizing Container Rebuild Efficiency by Instruction Re-Orchestration [11.027705516378875]
We present Doctor, a method for improving Dockerfile build efficiency through instruction re-ordering.
We developed a dependency taxonomy based on Dockerfile syntax and a historical modification analysis to prioritize frequently modified instructions.
Experiments show Doctor improves 92.75% of Dockerfiles, reducing rebuild time by an average of 26.5%, with 12.82% of files achieving over a 50% reduction.
arXiv Detail & Related papers (2025-04-02T13:53:35Z)
- Design and Implementation of Flutter based Multi-platform Docker Controller App [1.1443262816483672]
This paper focuses on developing a Flutter application for controlling Docker resources remotely.
The application uses the SSH protocol to establish a secure connection with the server and execute the commands.
An alternative approach is also explored, which involves connecting the application with the Docker engine using HTTP.
arXiv Detail & Related papers (2025-02-17T11:48:02Z)
- Refactoring for Dockerfile Quality: A Dive into Developer Practices and Automation Potential [0.0]
This paper explores the utility and practicality of automating Dockerfile refactoring using 600 Dockerfiles from 358 open-source projects.
Our approach leads to an average reduction of 32% in image size and a 6% decrease in build duration, with improvements in understandability and maintainability observed in 77% and 91% of cases.
arXiv Detail & Related papers (2025-01-23T23:10:47Z)
- EagleEye: Attention to Unveil Malicious Event Sequences from Provenance Graphs [1.3359586871482305]
Securing endpoints is challenging due to the evolving nature of threats and attacks.
With endpoint logging systems becoming mature, provenance-graph representations enable the creation of sophisticated behavior rules.
We develop and present EagleEye, a novel system that uses rich features from provenance graphs for behavior event representation.
arXiv Detail & Related papers (2024-08-17T14:48:02Z)
- Temporal Analysis and Repair of Flaky Dockerfiles [6.518508607788089]
Dockerfile flakiness is characterized by inconsistent build behavior without Dockerfile or project source code changes.
We present a comprehensive taxonomy of common flakiness categories, including dependency-related errors and server connectivity issues.
We introduce FlakiDock, a tool leveraging large language models and retrieval-augmented generation techniques to automatically repair flaky Dockerfiles.
arXiv Detail & Related papers (2024-08-09T23:17:56Z)
- GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features [68.14842693208465]
GeneralAD is an anomaly detection framework designed to operate in semantic, near-distribution, and industrial settings.
We propose a novel self-supervised anomaly generation module that employs straightforward operations like noise addition and shuffling to patch features.
We extensively evaluated our approach on ten datasets, achieving state-of-the-art results in six and on-par performance in the remaining.
arXiv Detail & Related papers (2024-07-17T09:27:41Z)
- PPIDSG: A Privacy-Preserving Image Distribution Sharing Scheme with GAN in Federated Learning [2.0507547735926424]
Federated learning (FL) has attracted growing attention since it allows for privacy-preserving collaborative training on decentralized clients.
Recent works have revealed that it still has the risk of exposing private data to adversaries.
We propose a privacy-preserving image distribution sharing scheme with GAN (PPIDSG).
arXiv Detail & Related papers (2023-12-16T08:32:29Z)
- Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection [157.18560601328534]
RichSem is a robust method to learn rich semantics from coarse locations without the need of accurate bounding boxes.
We add a semantic branch to our detector to learn these soft semantics and enhance feature representations for long-tailed object detection.
Our method achieves state-of-the-art performance without requiring complex training and testing procedures.
arXiv Detail & Related papers (2023-10-18T17:59:41Z)
- Automated Static Warning Identification via Path-based Semantic Representation [37.70518599085676]
This paper employs deep neural networks' powerful feature extraction and representation abilities to generate code semantics from control flow graph paths for warning identification.
We fine-tune the pre-trained language model to encode the path sequences and capture the semantic representations for model building.
arXiv Detail & Related papers (2023-06-27T15:46:45Z)
- Who Wrote this Code? Watermarking for Code Generation [53.24895162874416]
We propose Selective WatErmarking via Entropy Thresholding (SWEET) to detect machine-generated text.
Our experiments show that SWEET significantly improves code quality preservation while outperforming all baselines.
arXiv Detail & Related papers (2023-05-24T11:49:52Z)
- Studying the Practices of Deploying Machine Learning Projects on Docker [9.979005459305117]
Docker is a containerization service that allows for convenient deployment of websites, databases, applications' APIs, and machine learning (ML) models with a few lines of code.
We conducted an exploratory study to understand how Docker is being used to deploy ML-based projects.
arXiv Detail & Related papers (2022-06-01T18:13:30Z)
- Revisiting Consistency Regularization for Semi-supervised Change Detection in Remote Sensing Images [60.89777029184023]
We propose a semi-supervised CD model in which we formulate an unsupervised CD loss in addition to the supervised Cross-Entropy (CE) loss.
Experiments conducted on two publicly available CD datasets show that the proposed semi-supervised CD method can reach closer to the performance of supervised CD.
arXiv Detail & Related papers (2022-04-18T17:59:01Z)
- Unsupervised Object Detection with LiDAR Clues [70.73881791310495]
We present the first practical method for unsupervised object detection with the aid of LiDAR clues.
In our approach, candidate object segments based on 3D point clouds are firstly generated.
Then, an iterative segment labeling process is conducted to assign segment labels and to train a segment labeling network.
The labeling process is carefully designed so as to mitigate the issue of long-tailed and open-ended distribution.
arXiv Detail & Related papers (2020-11-25T18:59:54Z)
- Adversarial Attack on Community Detection by Hiding Individuals [68.76889102470203]
We focus on black-box attack and aim to hide targeted individuals from the detection of deep graph community detection models.
We propose an iterative learning framework that takes turns to update two modules: one working as the constrained graph generator and the other as the surrogate community detection model.
arXiv Detail & Related papers (2020-01-22T09:50:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.