Doctor: Optimizing Container Rebuild Efficiency by Instruction Re-Orchestration
- URL: http://arxiv.org/abs/2504.01742v1
- Date: Wed, 02 Apr 2025 13:53:35 GMT
- Title: Doctor: Optimizing Container Rebuild Efficiency by Instruction Re-Orchestration
- Authors: Zhiling Zhu, Tieming Chen, Chengwei Liu, Han Liu, Qijie Song, Zhengzi Xu, Yang Liu
- Abstract summary: We present Doctor, a method for improving Dockerfile build efficiency through instruction re-ordering. We developed a dependency taxonomy based on Dockerfile syntax and a historical modification analysis to prioritize frequently modified instructions. Experiments show Doctor improves 92.75% of Dockerfiles, reducing rebuild time by an average of 26.5%, with 12.82% of files achieving over a 50% reduction.
- Score: 11.027705516378875
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Containerization has revolutionized software deployment, with Docker leading the way due to its ease of use and consistent runtime environment. As Docker usage grows, optimizing Dockerfile performance, particularly by reducing rebuild time, has become essential for maintaining efficient CI/CD pipelines. However, existing optimization approaches primarily address single builds without considering the recurring rebuild costs associated with modifications and evolution, limiting long-term efficiency gains. To bridge this gap, we present Doctor, a method for improving Dockerfile build efficiency through instruction re-ordering that addresses key challenges: identifying instruction dependencies, predicting future modifications, ensuring behavioral equivalence, and managing the optimization computational complexity. We developed a comprehensive dependency taxonomy based on Dockerfile syntax and a historical modification analysis to prioritize frequently modified instructions. Using a weighted topological sorting algorithm, Doctor optimizes instruction order to minimize future rebuild time while maintaining functionality. Experiments on 2,000 GitHub repositories show that Doctor improves 92.75% of Dockerfiles, reducing rebuild time by an average of 26.5%, with 12.82% of files achieving over a 50% reduction. Notably, 86.2% of cases preserve functional similarity. These findings highlight best practices for Dockerfile management, enabling developers to enhance Docker efficiency through informed optimization strategies.
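The core mechanism described in the abstract, ordering instructions so that frequently modified ones sit as late as their dependencies allow, can be illustrated with a short sketch. The code below is not Doctor's implementation; the dependency edges, modification frequencies, and tie-breaking rule are illustrative assumptions.

```python
# Minimal sketch (assumed dependencies and frequencies, not Doctor's actual code):
# a weighted topological sort that respects instruction dependencies while
# scheduling rarely modified instructions first, so frequently modified ones end
# up as late as possible and invalidate fewer cached layers on rebuild.
import heapq
from typing import Dict, List, Tuple

def reorder_instructions(
    instructions: List[str],
    deps: List[Tuple[int, int]],     # (i, j): instruction i must precede instruction j
    change_freq: Dict[int, float],   # historical modification frequency per instruction
) -> List[str]:
    n = len(instructions)
    succ = [[] for _ in range(n)]
    indeg = [0] * n
    for i, j in deps:
        succ[i].append(j)
        indeg[j] += 1

    # Kahn's algorithm with a priority queue: among schedulable instructions,
    # emit the least frequently modified one first (original index breaks ties).
    ready = [(change_freq.get(i, 0.0), i) for i in range(n) if indeg[i] == 0]
    heapq.heapify(ready)
    order: List[int] = []
    while ready:
        _, i = heapq.heappop(ready)
        order.append(i)
        for j in succ[i]:
            indeg[j] -= 1
            if indeg[j] == 0:
                heapq.heappush(ready, (change_freq.get(j, 0.0), j))
    if len(order) != n:
        raise ValueError("dependency cycle: instructions cannot be reordered")
    return [instructions[i] for i in order]

# Toy example: the frequently edited COPY of source code moves after the
# rarely changing dependency installation, preserving its cached layer.
dockerfile = [
    "FROM python:3.12",                         # 0
    "COPY src/ /app/src",                       # 1: changes often
    "COPY requirements.txt /app/",              # 2: changes rarely
    "RUN pip install -r /app/requirements.txt", # 3: depends on 2
]
deps = [(0, 1), (0, 2), (2, 3)]
freq = {0: 0.0, 1: 0.9, 2: 0.1, 3: 0.1}
print(reorder_instructions(dockerfile, deps, freq))
```

Because Docker invalidates every layer after the first changed instruction, moving volatile instructions toward the end keeps more of the cache valid across rebuilds; the real system additionally checks behavioral equivalence before applying a reordering.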
Related papers
- Efficient Token Compression for Vision Transformer with Spatial Information Preserved [59.79302182800274]
Token compression is essential for reducing the computational and memory requirements of transformer models. We propose an efficient and hardware-compatible token compression method called Prune and Merge.
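As a rough illustration of the prune-and-merge idea (the paper's exact algorithm differs; the importance scores, similarity measure, and merge rule below are assumptions), low-importance tokens can be folded into their most similar kept token rather than discarded:

```python
# Generic prune-and-merge sketch: keep the top-k tokens by importance and merge
# each pruned token into its most similar kept token, so its information is
# averaged in rather than thrown away. Scores and similarity here are assumptions.
import numpy as np

def prune_and_merge(tokens: np.ndarray, scores: np.ndarray, keep: int) -> np.ndarray:
    """tokens: (N, D) features, scores: (N,) importance; returns (keep, D)."""
    order = np.argsort(scores)[::-1]
    kept_idx, pruned_idx = order[:keep], order[keep:]
    kept = tokens[kept_idx].copy()
    counts = np.ones(keep)
    kept_dirs = kept / np.linalg.norm(kept, axis=1, keepdims=True)
    for i in pruned_idx:
        t = tokens[i]
        j = int(np.argmax(kept_dirs @ (t / np.linalg.norm(t))))  # nearest kept token
        kept[j] = (kept[j] * counts[j] + t) / (counts[j] + 1)    # running average
        counts[j] += 1
    return kept

# Example: compress 196 ViT patch tokens to 98 using (hypothetical) attention scores.
feats, attn = np.random.randn(196, 384), np.random.rand(196)
print(prune_and_merge(feats, attn, keep=98).shape)  # (98, 384)
```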
arXiv Detail & Related papers (2025-03-30T14:23:18Z) - An LLM-based Agent for Reliable Docker Environment Configuration [9.436480907117415]
Repo2Run is an agent designed to fully automate environment configuration and generate executable Dockerfiles for arbitrary Python repositories.
We address two major challenges: (1) enabling the LLM agent to configure environments within isolated Docker containers, and (2) ensuring the successful configuration process is recorded and accurately transferred to a Dockerfile without error.
We evaluate Repo2Run on our proposed benchmark of 420 recent Python repositories with unit tests, where it achieves an 86.4% success rate, outperforming the best baseline by 63.9%.
arXiv Detail & Related papers (2025-02-19T12:51:35Z) - Refactoring for Dockerfile Quality: A Dive into Developer Practices and Automation Potential [0.0]
This paper explores the utility and practicality of automating Dockerfile refactoring using 600 Dockerfiles from 358 open-source projects. Our approach leads to an average reduction of 32% in image size and a 6% decrease in build duration, with improvements in understandability and maintainability observed in 77% and 91% of cases.
arXiv Detail & Related papers (2025-01-23T23:10:47Z) - CI at Scale: Lean, Green, and Fast [0.45046553422244356]
SubmitQueue is a system designed to speculatively execute builds and land only changes with successful outcomes. This paper introduces enhancements to SubmitQueue, focusing on optimizing resource usage and improving build prioritization. We observed a reduction in Continuous Integration (CI) resource usage by approximately 53%, CPU usage by 44%, and P95 waiting times by 37%.
arXiv Detail & Related papers (2025-01-07T00:04:29Z) - DiSK: Differentially Private Optimizer with Simplified Kalman Filter for Noise Reduction [57.83978915843095]
This paper introduces DiSK, a novel framework designed to significantly enhance the performance of differentially private optimizers.
To ensure practicality for large-scale training, we simplify the Kalman filtering process, minimizing its memory and computational demands.
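A hedged sketch of the general recipe (a simplification, not DiSK itself): privatize gradients as in DP-SGD with per-example clipping and Gaussian noise, then smooth the noisy gradient with a constant-gain filter standing in for the simplified Kalman filter.

```python
# Illustrative only: DP-SGD-style privatization followed by constant-gain
# filtering of the noisy gradient estimate. Hyperparameters are assumptions.
import numpy as np

def dp_filtered_step(params, per_example_grads, state, lr=0.1, clip=1.0,
                     noise_mult=1.0, gain=0.3):
    # 1) Clip each example's gradient to bound sensitivity.
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip / (norm + 1e-12)))
    # 2) Average and add calibrated Gaussian noise (the differentially private step).
    n = len(clipped)
    noisy = np.mean(clipped, axis=0) + \
        np.random.normal(0.0, noise_mult * clip / n, size=params.shape)
    # 3) Constant-gain filtering: blend the noisy observation into a running
    #    gradient estimate, trading a little bias for much lower variance.
    state = (1.0 - gain) * state + gain * noisy
    return params - lr * state, state

# Toy usage on the quadratic loss 0.5 * ||w||^2 with three "examples".
w = np.array([2.0, -1.5])
est = np.zeros_like(w)
for _ in range(100):
    grads = [w + 0.05 * np.random.randn(2) for _ in range(3)]
    w, est = dp_filtered_step(w, grads, est)
print(w)  # moves toward the origin despite the injected noise
```

The filtering step is what reduces the effective noise in each update; DiSK's actual filter design and privacy accounting are more careful than this toy version.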
arXiv Detail & Related papers (2024-10-04T19:30:39Z) - Dockerfile Flakiness: Characterization and Repair [6.518508607788089]
We present the first comprehensive study of Dockerfile flakiness, featuring a nine-month analysis of 8,132 Dockerized projects. We propose a taxonomy categorizing common flakiness causes, including dependency errors and server connectivity issues. We introduce FLAKIDOCK, a novel repair framework combining static and dynamic analysis, similarity retrieval, and an iterative feedback loop powered by Large Language Models.
arXiv Detail & Related papers (2024-08-09T23:17:56Z) - Learning to Optimize for Reinforcement Learning [58.01132862590378]
Reinforcement learning (RL) is essentially different from supervised learning, and in practice, these learned optimizers do not work well even in simple RL tasks.
The agent-gradient distribution is non-independent and identically distributed, leading to inefficient meta-training.
We show that, although only trained on toy tasks, our learned optimizer can generalize to unseen complex tasks in Brax.
arXiv Detail & Related papers (2023-02-03T00:11:02Z) - DRIVE: Dockerfile Rule Mining and Violation Detection [6.510749313511299]
A Dockerfile defines a set of instructions to build Docker images, which can then be instantiated to support containerized applications.
Recent studies have revealed a considerable number of quality issues with Dockerfiles.
We propose a novel approach to mine implicit rules and detect potential violations of such rules in Dockerfiles.
arXiv Detail & Related papers (2022-12-12T01:15:30Z) - VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates.
We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive benchmark suite with baselines at velo-code.io.
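To make "a small neural network that ingests gradients and outputs parameter updates" concrete, here is a toy per-parameter skeleton of a learned optimizer; the feature set, architecture, and weights (random stand-ins, nothing meta-trained) are assumptions, not VeLO's design.

```python
# Illustrative sketch of a learned-optimizer update rule: a tiny shared MLP maps
# per-parameter features (gradient, momentum) to a per-parameter update.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = 0.1 * rng.standard_normal((2, 16)), np.zeros(16)
W2, b2 = 0.1 * rng.standard_normal((16, 1)), np.zeros(1)

def learned_update(grad, momentum):
    feats = np.stack([grad, momentum], axis=-1)   # (..., 2) features per parameter
    h = np.tanh(feats @ W1 + b1)                  # small MLP shared across parameters
    return (h @ W2 + b2)[..., 0]                  # one scalar update per parameter

# Applying it to a toy quadratic; with meta-trained weights the produced update
# direction would be learned rather than hand-designed like SGD or Adam.
w, m = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(10):
    g = w                                         # gradient of 0.5 * ||w||^2
    m = 0.9 * m + g
    w = w - 0.01 * learned_update(g, m)
print(w)
```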
arXiv Detail & Related papers (2022-11-17T18:39:07Z) - Gradient Coding with Dynamic Clustering for Straggler-Tolerant
Distributed Learning [55.052517095437]
Distributed gradient descent (GD) is widely employed to parallelize the learning task by distributing the dataset across multiple workers.
A significant performance bottleneck for the per-iteration completion time in distributed synchronous GD is straggling workers.
Coded distributed techniques have been introduced recently to mitigate stragglers and to speed up GD iterations by assigning redundant computations to workers.
We propose a novel dynamic GC scheme, which assigns redundant data to workers to acquire the flexibility to choose from among a set of possible codes depending on the past straggling behavior.
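As a minimal illustration of the redundancy idea only (plain replication rather than true coding, and a static rather than dynamic assignment), each data partition can be placed on several workers so the full gradient is recoverable without waiting for stragglers:

```python
# Hedged sketch: cyclic replication of gradient partitions across workers.
# Real gradient coding sends *coded* linear combinations, and the paper's
# dynamic scheme additionally adapts the code to past straggling behavior.
import numpy as np

def assign_partitions(n_workers: int, n_parts: int, r: int):
    """Cyclic assignment: partition p goes to workers p, p+1, ..., p+r-1 (mod n)."""
    return {w: [p for p in range(n_parts)
                if w in {(p + i) % n_workers for i in range(r)}]
            for w in range(n_workers)}

def recover_gradient(worker_results, n_parts, dim):
    """worker_results: {worker: {partition: partial_gradient}} from non-stragglers."""
    total = np.zeros(dim)
    for p in range(n_parts):
        done = [res[p] for res in worker_results.values() if p in res]
        if not done:
            raise RuntimeError(f"partition {p} lost: all of its workers straggled")
        total += done[0]          # any single replica of the partition suffices
    return total

# Toy run: 4 workers, 4 partitions, replication factor 2, worker 3 straggles.
parts = [np.random.randn(5) for _ in range(4)]        # per-partition gradients
assignment = assign_partitions(4, 4, r=2)
results = {w: {p: parts[p] for p in assignment[w]} for w in range(4) if w != 3}
print(np.allclose(recover_gradient(results, 4, 5), sum(parts)))  # True
```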
arXiv Detail & Related papers (2021-03-01T18:51:29Z) - Parameter-Efficient Transfer Learning with Diff Pruning [108.03864629388404]
Diff pruning is a simple approach to enable parameter-efficient transfer learning within the pretrain-finetune framework.
We find that models finetuned with diff pruning can match the performance of fully finetuned baselines on the GLUE benchmark.
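A hedged PyTorch sketch of the structure follows: the pretrained weights stay frozen and only a task-specific, sparse diff is learned. The paper learns sparsity with a relaxed L0 penalty; the L1 penalty and magnitude pruning below are simplifying assumptions.

```python
# Illustrative diff-pruning skeleton: frozen pretrained layer + trainable sparse diff.
import torch
import torch.nn as nn

class DiffPrunedLinear(nn.Module):
    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        # Frozen pretrained parameters.
        self.register_buffer("w0", pretrained.weight.detach().clone())
        self.register_buffer("b0", pretrained.bias.detach().clone())
        # Trainable task-specific differences, initialized at zero.
        self.dw = nn.Parameter(torch.zeros_like(self.w0))
        self.db = nn.Parameter(torch.zeros_like(self.b0))

    def forward(self, x):
        return nn.functional.linear(x, self.w0 + self.dw, self.b0 + self.db)

    def sparsity_penalty(self):
        return self.dw.abs().sum() + self.db.abs().sum()

# Toy fine-tuning loop: task loss plus a sparsity penalty on the diff.
base = nn.Linear(16, 2)
model = DiffPrunedLinear(base)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
x, y = torch.randn(32, 16), torch.randint(0, 2, (32,))
for _ in range(50):
    loss = nn.functional.cross_entropy(model(x), y) + 1e-3 * model.sparsity_penalty()
    opt.zero_grad(); loss.backward(); opt.step()

# Keep only the largest-magnitude diffs; everything else reverts to pretrained.
with torch.no_grad():
    k = int(0.05 * model.dw.numel())                  # store roughly 5% of the weights
    thresh = model.dw.abs().flatten().kthvalue(model.dw.numel() - k).values
    model.dw.mul_((model.dw.abs() > thresh).float())
```

Only the nonzero entries of the diff need to be stored per task, which is the source of the parameter efficiency.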
arXiv Detail & Related papers (2020-12-14T12:34:01Z) - Tasks, stability, architecture, and compute: Training more effective
learned optimizers, and using them to train themselves [53.37905268850274]
We introduce a new, neural network parameterized, hierarchical optimizer with access to additional features such as validation loss to enable automatic regularization.
Most learned optimizers have been trained on only a single task, or a small number of tasks.
We train our optimizer on thousands of tasks, making use of orders of magnitude more compute, resulting in optimizers that generalize better to unseen tasks.
arXiv Detail & Related papers (2020-09-23T16:35:09Z)