A Bring-Your-Own-Model Approach for ML-Driven Storage Placement in Warehouse-Scale Computers
- URL: http://arxiv.org/abs/2501.05651v2
- Date: Sat, 19 Apr 2025 05:31:22 GMT
- Title: A Bring-Your-Own-Model Approach for ML-Driven Storage Placement in Warehouse-Scale Computers
- Authors: Chenxi Yang, Yan Li, Martin Maas, Mustafa Uysal, Ubaid Ullah Hafeez, Arif Merchant, Richard McDougall
- Abstract summary: Storage systems account for a major portion of the total cost of ownership (TCO) of warehouse-scale computers. Machine learning (ML)-based methods for solving key problems in storage system efficiency, such as data placement, have shown significant promise. We study this problem in the context of real-world hyperscale data centers at Google.
- Score: 4.849222239746218
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Storage systems account for a major portion of the total cost of ownership (TCO) of warehouse-scale computers, and thus have a major impact on the overall system's efficiency. Machine learning (ML)-based methods for solving key problems in storage system efficiency, such as data placement, have shown significant promise. However, there are few known practical deployments of such methods. Studying this problem in the context of real-world hyperscale data centers at Google, we identify a number of challenges that we believe cause this lack of practical adoption. Specifically, prior work assumes a monolithic model that resides entirely within the storage layer, an unrealistic assumption in real-world deployments with frequently changing workloads. To address this problem, we introduce a cross-layer approach where workloads instead "bring their own model". This strategy moves ML out of the storage system and instead allows each workload to train its own lightweight model at the application layer, capturing the workload's specific characteristics. These small, interpretable models generate predictions that guide a co-designed scheduling heuristic at the storage layer, enabling adaptation to diverse online environments. We build a proof-of-concept of this approach in a production distributed computation framework at Google. Evaluations in a test deployment and large-scale simulation studies using production traces show improvements of as much as 3.47× in TCO savings compared to state-of-the-art baselines.
Related papers
- MLKV: Efficiently Scaling up Large Embedding Model Training with Disk-based Key-Value Storage [22.848456481878568]
This paper presents MLKV, an efficient, reusable data storage framework designed to address the scalability challenges in embedding model training.
In experiments on open-source workloads, MLKV outperforms offloading strategies built on top of industrial-strength key-value stores by 1.6-12.6x.
arXiv Detail & Related papers (2025-04-02T08:57:01Z)
- Cost-Efficient Continual Learning with Sufficient Exemplar Memory [55.77835198580209]
Continual learning (CL) research typically assumes highly constrained exemplar memory resources.
In this work, we investigate CL in a novel setting where exemplar memory is ample.
Our method achieves state-of-the-art performance while reducing the computational cost to a quarter or third of existing methods.
arXiv Detail & Related papers (2025-02-11T05:40:52Z)
- Dynamic Adaptation in Data Storage: Real-Time Machine Learning for Enhanced Prefetching [40.13303683102544]
This study explores the application of streaming machine learning to revolutionize data prefetching within multi-tiered storage systems.
Unlike traditional batch-trained models, streaming machine learning offers adaptability, real-time insights, and computational efficiency.
arXiv Detail & Related papers (2024-12-29T17:39:37Z)
- A Survey on Large Language Model Acceleration based on KV Cache Management [21.4802409745396]
Large Language Models (LLMs) have revolutionized a wide range of domains such as natural language processing, computer vision, and multi-modal tasks. The computational and memory demands of LLMs pose significant challenges when scaling them to real-world, long-context, and real-time applications. This survey provides a comprehensive overview of KV cache management strategies for LLM acceleration, categorizing them into token-level, model-level, and system-level optimizations.
arXiv Detail & Related papers (2024-12-27T04:17:57Z)
- LSAQ: Layer-Specific Adaptive Quantization for Large Language Model Deployment [13.235417359529965]
We propose LSAQ (Layer-Specific Adaptive Quantization), a system for adaptive quantization and dynamic deployment of large language models (LLMs) based on layer importance.
The system adaptively adjusts quantization strategies in real time according to the resource availability of edge devices, assigning different precision levels to layers of varying importance.
arXiv Detail & Related papers (2024-12-24T03:43:15Z)
- Bullion: A Column Store for Machine Learning [4.096087402737292]
This paper presents Bullion, a columnar storage system tailored for machine learning workloads.
Bullion addresses the complexities of data compliance, optimizes the encoding of long sequence sparse features, efficiently manages wide-table projections, introduces feature quantization in storage, and provides a comprehensive cascading encoding framework.
Preliminary experimental results and theoretical analysis demonstrate Bullion's improved ability to deliver strong performance in the face of the unique demands of machine learning workloads.
arXiv Detail & Related papers (2024-04-13T05:01:54Z)
- Control and Automation for Industrial Production Storage Zone: Generation of Optimal Route Using Image Processing [49.1574468325115]
This article focuses on developing an industrial automation method for a zone of a production line model using the DIP.
The neo-cascade methodology employed allowed for defining each of the stages in an adequate way, ensuring the inclusion of the relevant methods for its development.
The system was based on the OpenCV library, a tool focused on artificial vision, and was implemented on an object-oriented programming (OOP) platform based on the Java language.
arXiv Detail & Related papers (2024-03-15T06:50:19Z)
- SQLNet: Scale-Modulated Query and Localization Network for Few-Shot Class-Agnostic Counting [71.38754976584009]
The class-agnostic counting (CAC) task has recently been proposed to solve the problem of counting all objects of an arbitrary class with several exemplars given in the input image.
We propose a novel localization-based CAC approach, termed Scale-modulated Query and Localization Network (SQLNet).
It fully explores the scales of exemplars in both the query and localization stages and achieves effective counting by accurately locating each object and predicting its approximate size.
arXiv Detail & Related papers (2023-11-16T16:50:56Z)
- ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models [70.45441031021291]
Large Vision-Language Models (LVLMs) can understand the world comprehensively by integrating rich information from different modalities.
LVLMs are often problematic due to their massive computational/energy costs and carbon consumption.
We propose Efficient Coarse-to-Fine Layer-Wise Pruning (ECoFLaP), a two-stage coarse-to-fine weight pruning approach for LVLMs.
arXiv Detail & Related papers (2023-10-04T17:34:00Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and gated sub-network from scratch in a SSL setting.
The co-evolution during pre-training of both dense and gated encoder offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
- Towards an Efficient ML System: Unveiling a Trade-off between Task Accuracy and Engineering Efficiency in a Large-scale Car Sharing Platform [0.0]
We propose an efficiency-centric ML system that illustrates numerous datasets, classifiers, out-of-distribution detectors, and prediction tables existing in the practitioners' domain into a single ML.
Under various image recognition tasks on a real-world car-sharing platform, our study describes how we established the proposed system and the lessons learned from this journey.
arXiv Detail & Related papers (2022-10-10T15:40:50Z)
- Asynchronous Parallel Incremental Block-Coordinate Descent for Decentralized Machine Learning [55.198301429316125]
Machine learning (ML) is a key technique for big-data-driven modelling and analysis of massive Internet of Things (IoT) based intelligent and ubiquitous computing.
For fast-increasing applications and data amounts, distributed learning is a promising emerging paradigm since it is often impractical or inefficient to share/aggregate data.
This paper studies the problem of training an ML model over decentralized systems, where data are distributed over many user devices.
arXiv Detail & Related papers (2022-02-07T15:04:15Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to learn patterns from big data with comparable performance efficiently.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.