Charliecloud's layer-free, Git-based container build cache
- URL: http://arxiv.org/abs/2309.00166v1
- Date: Thu, 31 Aug 2023 23:05:16 GMT
- Title: Charliecloud's layer-free, Git-based container build cache
- Authors: Reid Priedhorsky (1), Jordan Ogas (1), Claude H. (Rusty) Davis IV (1),
Z. Noah Hounshel (1 and 2), Ashlyn Lee (1 and 3), Benjamin Stormer (1 and 4),
R. Shane Goff (1) ((1) Los Alamos National Laboratory, (2) University of
North Carolina Wilmington, (3) Colorado State University, (4) University of
Texas at Austin)
- Abstract summary: The container image is built by interpreting instructions in a machine-readable recipe, which is faster with a build cache that stores instruction results for re-use.
The standard approach is a many-layered union filesystem, encoding differences between layers as tar archives.
Our experiments show that our layer-free, Git-based cache performs similarly to layered caches on both build time and disk usage, with a considerable advantage for many-instruction recipes.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A popular approach to deploying scientific applications in high performance
computing (HPC) is Linux containers, which package an application and all its
dependencies as a single unit. This image is built by interpreting instructions
in a machine-readable recipe, which is faster with a build cache that stores
instruction results for re-use. The standard approach (used e.g. by Docker and
Podman) is a many-layered union filesystem, encoding differences between layers
as tar archives.
Our approach is instead layer-free, using Git as the build cache. Our
experiments show this performs similarly to layered caches on both build time
and disk usage, with a considerable advantage for many-instruction recipes.
Our approach also has structural advantages: better diff format, lower cache
overhead, and better file de-duplication. These results show that a Git-based
cache for layer-free container implementations is not only possible but may
outperform the layered approach on important dimensions.
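As a rough illustration of the cached-build idea described in the abstract, the sketch below keys each recipe instruction on a hash of the instruction text plus its parent state, stores the resulting file tree as a Git commit, and restores that commit on a repeat build instead of re-running the instruction. The helper names, tag layout, and hashing scheme are assumptions made for this example only, not Charliecloud's actual implementation.

```python
# Illustrative sketch of a per-instruction build cache backed by Git.
# Assumes image_root is an initialized Git work tree containing the unpacked
# base image; run_instruction() stands in for executing RUN/COPY/etc.
import hashlib
import subprocess
from pathlib import Path

def git(image_root: Path, *args: str) -> str:
    """Run a git command against the image directory and return stdout."""
    result = subprocess.run(["git", "-C", str(image_root), *args],
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()

def state_id(parent_id: str, instruction: str) -> str:
    """Cache key: hash of the parent state ID plus the instruction text."""
    return hashlib.sha256(f"{parent_id}\n{instruction}".encode()).hexdigest()

def run_instruction(image_root: Path, instruction: str) -> None:
    """Placeholder for actually executing one recipe instruction."""
    with (image_root / "build.log").open("a") as log:
        log.write(instruction + "\n")

def build(image_root: Path, recipe: list[str]) -> str:
    """Execute a recipe, re-using cached results whenever the key matches."""
    parent = "ROOT"
    for instruction in recipe:
        sid = state_id(parent, instruction)
        tag = f"cache-{sid[:16]}"
        if git(image_root, "tag", "--list", tag):    # hit: restore cached tree
            git(image_root, "checkout", "--force", tag)
        else:                                        # miss: run, commit, tag
            run_instruction(image_root, instruction)
            git(image_root, "add", "--all")
            git(image_root, "commit", "--allow-empty", "-m", instruction)
            git(image_root, "tag", tag)              # tag keeps it reachable
        parent = sid
    return parent
```

In such a layer-free scheme the difference between two cached states is an ordinary Git commit rather than a tar archive, which loosely mirrors the structural advantages the abstract lists (diff format, cache overhead, and file de-duplication).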
Related papers
- Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference [2.0449242727404235]
Unstructured sparsity significantly improves KV cache compression for LLMs. We find per-token magnitude-based pruning to be highly effective for both Key and Value caches under unstructured sparsity.
arXiv Detail & Related papers (2025-05-28T22:32:15Z)
- Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation [14.842469293627271]
Cache-Craft is a system for managing and reusing precomputed KVs corresponding to text chunks.
We show how to identify chunk-caches that are reusable, how to efficiently perform a small fraction of recomputation to fix the cache, and how to efficiently store and evict chunk-caches in the hardware.
arXiv Detail & Related papers (2025-02-05T14:12:33Z)
- BitStack: Any-Size Compression of Large Language Models in Variable Memory Environments [53.71158537264695]
Large language models (LLMs) have revolutionized numerous applications, yet their deployment remains challenged by memory constraints on local devices. We introduce BitStack, a novel, training-free weight compression approach that enables megabyte-level trade-offs between memory usage and model performance.
arXiv Detail & Related papers (2024-10-31T13:26:11Z)
- KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing [58.29726147780976]
We propose a plug-and-play method called KVSharer, which shares the KV cache between layers to achieve layer-wise compression.
Experiments show that KVSharer can reduce KV cache computation by 30%, thereby lowering memory consumption.
We verify that KVSharer is compatible with existing intra-layer KV cache compression methods, and combining both can further save memory.
arXiv Detail & Related papers (2024-10-24T08:06:41Z)
- Compute Or Load KV Cache? Why Not Both? [6.982874528357836]
Cake is a novel KV cache loader, which employs a bidirectional parallelized KV cache generation strategy.
It simultaneously and dynamically loads saved KV cache from prefix cache locations and computes KV cache on local GPU.
It offers up to 68.1% Time To First Token (TTFT) reduction compared with the compute-only method and 94.6% TTFT reduction compared with the I/O-only method.
arXiv Detail & Related papers (2024-10-04T01:11:09Z)
- Efficient Inference of Vision Instruction-Following Models with Elastic Cache [76.44955111634545]
We introduce Elastic Cache, a novel strategy for efficient deployment of instruction-following large vision-language models.
We propose an importance-driven cache merging strategy to prune redundant caches.
For instruction encoding, we use frequency to evaluate the importance of caches.
Results on a range of LVLMs demonstrate that Elastic Cache not only boosts efficiency but also notably outperforms existing pruning methods in language generation.
arXiv Detail & Related papers (2024-07-25T15:29:05Z)
- Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference [78.65321721142624]
We focus on a memory bottleneck imposed by the key-value (KV) cache.
Existing KV cache methods approach this problem by pruning or evicting large swaths of relatively less important KV pairs.
We propose LESS, a simple integration of a constant sized cache with eviction-based cache methods.
arXiv Detail & Related papers (2024-02-14T18:54:56Z)
- NASiam: Efficient Representation Learning using Neural Architecture Search for Siamese Networks [76.8112416450677]
Siamese networks are one of the most trending methods to achieve self-supervised visual representation learning (SSL).
NASiam is a novel approach that, for the first time, uses differentiable NAS to improve the multilayer perceptron projector and predictor (encoder/predictor pair).
NASiam reaches competitive performance in both small-scale (i.e., CIFAR-10/CIFAR-100) and large-scale (i.e., ImageNet) image classification datasets while costing only a few GPU hours.
arXiv Detail & Related papers (2023-01-31T19:48:37Z)
- Training Personalized Recommendation Systems from (GPU) Scratch: Look Forward not Backwards [1.7733623930581417]
Personalized recommendation models (RecSys) are one of the most popular machine learning workloads served by hyperscalers.
A critical challenge of training RecSys is its high memory capacity requirements, reaching hundreds of GBs to TBs of model size.
In RecSys, the so-called embedding layers account for the majority of memory usage, so current systems employ a hybrid CPU-GPU design in which the large CPU memory stores the memory-hungry embedding layers.
arXiv Detail & Related papers (2022-05-10T07:05:20Z)
- Generalizing Few-Shot NAS with Gradient Matching [165.5690495295074]
One-Shot methods train one supernet to approximate the performance of every architecture in the search space via weight-sharing.
Few-Shot NAS reduces the level of weight-sharing by splitting the One-Shot supernet into multiple separated sub-supernets.
It significantly outperforms its Few-Shot counterparts while surpassing previous comparable methods in terms of the accuracy of derived architectures.
arXiv Detail & Related papers (2022-03-29T03:06:16Z)
- CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms [0.0]
CleanRL is an open-source library that provides high-quality single-file implementations of Deep Reinforcement Learning algorithms.
It provides a simpler yet scalable developing experience by having a straightforward codebase and integrating production tools.
arXiv Detail & Related papers (2021-11-16T22:44:56Z)
- Prioritized Architecture Sampling with Monto-Carlo Tree Search [54.72096546595955]
One-shot neural architecture search (NAS) methods significantly reduce the search cost by considering the whole search space as one network.
In this paper, we introduce a sampling strategy based on Monte Carlo tree search (MCTS), with the search space modeled as a Monte Carlo tree (MCT).
For a fair comparison, we construct an open-source NAS benchmark of a macro search space evaluated on CIFAR-10, namely NAS-Bench-Macro.
arXiv Detail & Related papers (2021-03-22T15:09:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.