Machine Learning Fleet Efficiency: Analyzing and Optimizing Large-Scale Google TPU Systems with ML Productivity Goodput
- URL: http://arxiv.org/abs/2502.06982v2
- Date: Sun, 20 Apr 2025 08:02:36 GMT
- Title: Machine Learning Fleet Efficiency: Analyzing and Optimizing Large-Scale Google TPU Systems with ML Productivity Goodput
- Authors: Arissa Wongpanich, Tayo Oguntebi, Jose Baiocchi Paredes, Yu Emma Wang, Phitchaya Mangpo Phothilimthana, Ritwika Mitra, Zongwei Zhou, Naveen Kumar, Vijay Janapa Reddi
- Abstract summary: This paper presents a large-scale analysis of an ML fleet based on Google's TPUs. We show how to leverage the "ML Productivity Goodput" metric to measure ML fleet efficiency. We also present methods to identify and optimize performance bottlenecks using MPG.
- Score: 9.994725016006015
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent years have seen the emergence of machine learning (ML) workloads deployed in warehouse-scale computing (WSC) settings, also known as ML fleets. As the computational demands placed on ML fleets have increased due to the rise of large models and growing demand for ML applications, it has become increasingly critical to measure and improve the efficiency of such systems. However, there is not yet an established methodology to characterize ML fleet performance and identify potential performance optimizations accordingly. This paper presents a large-scale analysis of an ML fleet based on Google's TPUs, introducing a framework to capture fleet-wide efficiency, systematically evaluate performance characteristics, and identify optimization strategies for the fleet. We begin by defining an ML fleet, outlining its components, and analyzing an example Google ML fleet in production comprising thousands of accelerators running diverse workloads. Our study reveals several critical insights: first, ML fleets extend beyond the hardware layer, with model, data, framework, compiler, and scheduling layers significantly impacting performance; second, the heterogeneous nature of ML fleets poses challenges in characterizing individual workload performance; and third, traditional utilization-based metrics prove insufficient for ML fleet characterization. To address these challenges, we present the "ML Productivity Goodput" (MPG) metric to measure ML fleet efficiency. We show how to leverage this metric to characterize the fleet across the ML system stack. We also present methods to identify and optimize performance bottlenecks using MPG, providing strategies for managing warehouse-scale ML systems in general. Lastly, we demonstrate quantitative evaluations from applying these methods to a real ML fleet for internal-facing Google TPU workloads, where we observed tangible improvements.
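The abstract argues that utilization-based metrics are insufficient and proposes "ML Productivity Goodput" (MPG) instead. As a rough illustration only, the sketch below computes a multiplicative goodput-style efficiency score from a per-workload trace; the specific decomposition (scheduling × runtime × program efficiency) and all field names are assumptions for illustration, not the paper's exact formula.

```python
from dataclasses import dataclass


@dataclass
class WorkloadTrace:
    """Hypothetical per-workload accounting over one measurement window."""
    window_s: float        # total wall-clock window (seconds)
    scheduled_s: float     # time the job actually held its accelerators
    productive_s: float    # time spent making useful training progress
    achieved_flops: float  # useful FLOPs completed during productive time
    peak_flop_rate: float  # hardware peak FLOP/s of the assigned accelerators


def goodput(trace: WorkloadTrace) -> float:
    """Illustrative goodput-style score: each factor captures one layer of loss.

    scheduling: did the workload get machines at all?
    runtime:    of the time it had machines, how much made forward progress?
    program:    of the productive time, how close to hardware peak did it run?
    """
    scheduling = trace.scheduled_s / trace.window_s
    runtime = trace.productive_s / trace.scheduled_s
    program = trace.achieved_flops / (trace.productive_s * trace.peak_flop_rate)
    return scheduling * runtime * program


# Example: 90% scheduled, ~93% of scheduled time productive, 50% of peak FLOP/s.
t = WorkloadTrace(window_s=3600, scheduled_s=3240, productive_s=3000,
                  achieved_flops=3000 * 0.5e15, peak_flop_rate=1e15)
print(round(goodput(t), 3))  # → 0.417
```

A multiplicative form like this makes each layer's loss visible separately, which matches the paper's point that fleets lose efficiency across the model, data, framework, compiler, and scheduling layers, not just at the hardware.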
Related papers
- EDCA - An Evolutionary Data-Centric AutoML Framework for Efficient Pipelines [0.276240219662896]
This work presents EDCA, an Evolutionary Data Centric AutoML framework.
Data quality is usually an overlooked part of AutoML and continues to be a manual and time-consuming task.
EDCA was compared to FLAML and TPOT, two frameworks at the top of the AutoML benchmarks.
arXiv Detail & Related papers (2025-03-06T11:46:07Z) - IMPROVE: Iterative Model Pipeline Refinement and Optimization Leveraging LLM Agents [17.301758094000125]
Large language model (LLM) agents have emerged as a promising solution to automate the development of computer vision models.
We introduce Iterative Refinement, a novel strategy for LLM-driven ML pipeline design.
Iterative Refinement improves stability, interpretability, and overall model performance.
arXiv Detail & Related papers (2025-02-25T01:52:37Z) - Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities. However, LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands. We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z) - Large Language Models for Constructing and Optimizing Machine Learning Workflows: A Survey [4.917456871628609]
Building effective machine learning (ML) to address complex tasks is a primary focus of the Automatic ML (AutoML) community. Recently, the integration of Large Language Models (LLMs) into ML has shown great potential for automating and enhancing various stages of the ML pipeline.
arXiv Detail & Related papers (2024-11-11T21:54:26Z) - CubicML: Automated ML for Large ML Systems Co-design with ML Prediction of Performance [7.425372356516303]
Scaling up deep learning models has been proven effective to improve intelligence of machine learning (ML) models.
In this paper, we propose CubicML which uses ML to automatically optimize training performance of large distributed ML systems.
We prove that CubicML can effectively optimize training speed of in-house recommendation models with 73 billion parameters and large language models up to 405 billion parameters at Meta ads.
arXiv Detail & Related papers (2024-09-06T19:55:21Z) - Position: A Call to Action for a Human-Centered AutoML Paradigm [83.78883610871867]
Automated machine learning (AutoML) was formed around the fundamental objectives of automatically and efficiently configuring machine learning (ML)
We argue that a key to unlocking AutoML's full potential lies in addressing the currently underexplored aspect of user interaction with AutoML systems.
arXiv Detail & Related papers (2024-06-05T15:05:24Z) - Efficient Multimodal Large Language Models: A Survey [60.7614299984182]
Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding and reasoning.
The extensive model size and high training and inference costs have hindered the widespread application of MLLMs in academia and industry.
This survey provides a comprehensive and systematic review of the current state of efficient MLLMs.
arXiv Detail & Related papers (2024-05-17T12:37:10Z) - Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have presented impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z) - In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z) - Benchmarking Automated Machine Learning Methods for Price Forecasting Applications [58.720142291102135]
We show the possibility of substituting manually created ML pipelines with automated machine learning (AutoML) solutions.
Based on the CRISP-DM process, we split the manual ML pipeline into a machine learning and non-machine learning part.
We show in a case study for the industrial use case of price forecasting, that domain knowledge combined with AutoML can weaken the dependence on ML experts.
arXiv Detail & Related papers (2023-04-28T10:27:38Z) - ezDPS: An Efficient and Zero-Knowledge Machine Learning Inference Pipeline [2.0813318162800707]
We propose ezDPS, a new efficient and zero-knowledge Machine Learning inference scheme.
ezDPS is a zkML pipeline in which the data is processed in multiple stages for high accuracy.
We show that ezDPS is one to three orders of magnitude more efficient than the generic circuit-based approach across all metrics.
arXiv Detail & Related papers (2022-12-11T06:47:28Z) - Towards an Efficient ML System: Unveiling a Trade-off between Task Accuracy and Engineering Efficiency in a Large-scale Car Sharing Platform [0.0]
We propose an efficiency-centric ML system that consolidates the numerous datasets, classifiers, out-of-distribution detectors, and prediction tables in the practitioners' domain into a single ML pipeline.
Across various image recognition tasks on a real-world car-sharing platform, we study how we established the proposed system and the lessons learned from this journey.
arXiv Detail & Related papers (2022-10-10T15:40:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.