Plumber: Diagnosing and Removing Performance Bottlenecks in Machine
Learning Data Pipelines
- URL: http://arxiv.org/abs/2111.04131v1
- Date: Sun, 7 Nov 2021 17:15:57 GMT
- Title: Plumber: Diagnosing and Removing Performance Bottlenecks in Machine
Learning Data Pipelines
- Authors: Michael Kuchnik and Ana Klimovic and Jiri Simsa and George Amvrosiadis
and Virginia Smith
- Abstract summary: We propose Plumber, a tool for finding bottlenecks in Machine Learning (ML) input pipelines.
Across five representative ML pipelines, Plumber obtains speedups of up to 46x for misconfigured pipelines.
By automating caching, Plumber obtains end-to-end speedups of over 40% compared to state-of-the-art tuners.
- Score: 7.022239953701528
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Input pipelines, which ingest and transform input data, are an essential part
of training Machine Learning (ML) models. However, it is challenging to
implement efficient input pipelines, as it requires reasoning about
parallelism, asynchrony, and variability in fine-grained profiling information.
Our analysis of over 2 million ML jobs in Google datacenters reveals that a
significant fraction of model training jobs could benefit from faster input
data pipelines. At the same time, our analysis reveals that most jobs do not
saturate host hardware, pointing in the direction of software-based
bottlenecks. Motivated by these findings, we propose Plumber, a tool for
finding bottlenecks in ML input pipelines. Plumber uses an extensible and
interpretable operational analysis analytical model to automatically tune
parallelism, prefetching, and caching under host resource constraints. Across
five representative ML pipelines, Plumber obtains speedups of up to 46x for
misconfigured pipelines. By automating caching, Plumber obtains end-to-end
speedups of over 40% compared to state-of-the-art tuners.
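The idea of locating a bottleneck and then tuning parallelism under a host resource limit can be illustrated with a small sketch. This is not Plumber's actual model or API: the `Stage` class, the per-core rates, and the greedy core assignment are hypothetical, and the sketch assumes each stage's throughput scales linearly with the number of cores it is given.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    rate_per_core: float  # elements/second on one core (hypothetical measurement)
    parallelism: int      # cores currently assigned to this stage


def stage_throughput(stage):
    # Assumption: throughput scales linearly with assigned cores.
    return stage.rate_per_core * stage.parallelism


def find_bottleneck(stages):
    # The pipeline can go no faster than its slowest stage.
    return min(stages, key=stage_throughput)


def tune_parallelism(stages, total_cores):
    # Greedy sketch: start every stage at one core, then hand each
    # spare core to whichever stage is the bottleneck at that moment.
    if total_cores < len(stages):
        raise ValueError("need at least one core per stage")
    tuned = [Stage(s.name, s.rate_per_core, 1) for s in stages]
    for _ in range(total_cores - len(tuned)):
        find_bottleneck(tuned).parallelism += 1
    return tuned


# Hypothetical three-stage input pipeline, initially one core per stage.
pipeline = [
    Stage("read", 400.0, 1),
    Stage("decode", 50.0, 1),
    Stage("augment", 100.0, 1),
]

before = find_bottleneck(pipeline)
tuned = tune_parallelism(pipeline, total_cores=8)
after = find_bottleneck(tuned)

print(f"before: bottleneck={before.name}, rate={stage_throughput(before)}")
for s in tuned:
    print(f"{s.name}: cores={s.parallelism}, rate={stage_throughput(s)}")
print(f"after: bottleneck={after.name}, rate={stage_throughput(after)}")
```

With these made-up numbers, the single-core configuration is limited by the decode stage; after the eight cores are distributed, the limit moves to a different stage at a higher rate. Prefetching and caching decisions, which the abstract also mentions, are outside the scope of this sketch.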
Related papers
- Instrumentation and Analysis of Native ML Pipelines via Logical Query Plans [3.2362171533623054]
We envision highly-automated software platforms to assist data scientists with developing, validating, monitoring, and analysing their Machine Learning pipelines.
We extract "logical query plans" from ML pipeline code relying on popular libraries.
Based on these plans, we automatically infer pipeline semantics and instrument and rewrite the ML pipelines to enable diverse use cases without requiring data scientists to manually annotate or rewrite their code.
arXiv Detail & Related papers (2024-07-10T11:35:02Z)
- PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks [68.96484488899901]
We present PARTIME, a library designed to speed up neural networks whenever data is continuously streamed over time.
PARTIME starts processing each data sample at the time in which it becomes available from the stream.
Experiments are performed in order to empirically compare PARTIME with classic non-parallel neural computations in online learning.
arXiv Detail & Related papers (2022-10-17T14:49:14Z)
- Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference [74.80730361332711]
Few-shot learning is an important and topical problem in computer vision.
We show that a simple transformer-based pipeline yields surprisingly good performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-15T02:55:58Z)
- Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines [77.45213180689952]
Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy.
We introduce a new perspective on efficiently preparing datasets for end-to-end deep learning pipelines.
We obtain an increased throughput of 3x to 13x compared to an untuned system.
arXiv Detail & Related papers (2022-02-17T14:31:58Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- FENXI: Deep-learning Traffic Analytics at the Edge [69.34903175081284]
We present FENXI, a system to run complex analytics by leveraging TPU.
FENXI decouples operations and traffic analytics, which operate at different granularities.
Our analysis shows that FENXI can sustain forwarding line rate traffic processing requiring only limited resources.
arXiv Detail & Related papers (2021-05-25T08:02:44Z)
- Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities [5.510431861706128]
We analyze provenance graphs of 3000 production ML pipelines at Google, comprising over 450,000 models trained, spanning a period of over four months.
Our analysis reveals the characteristics, components, and topologies of typical industry-strength ML pipelines at various granularities.
We identify several rich opportunities for optimization, leveraging traditional data management ideas.
arXiv Detail & Related papers (2021-03-30T00:46:29Z)
- PipeTransformer: Automated Elastic Pipelining for Distributed Training of Transformers [47.194426122333205]
PipeTransformer is a distributed training algorithm for Transformer models.
It automatically adjusts the pipelining and data parallelism by identifying and freezing some layers during the training.
We evaluate PipeTransformer using Vision Transformer (ViT) on ImageNet and BERT on GLUE and SQuAD datasets.
arXiv Detail & Related papers (2021-02-05T13:39:31Z)
- tf.data: A Machine Learning Data Processing Framework [0.4588028371034406]
Training machine learning models requires feeding input data for models to ingest.
We present tf.data, a framework for building and executing efficient input pipelines for machine learning jobs.
We demonstrate that input pipeline performance is critical to the end-to-end training time of state-of-the-art machine learning models.
arXiv Detail & Related papers (2021-01-28T17:16:46Z)
- AutoWeka4MCPS-AVATAR: Accelerating Automated Machine Learning Pipeline Composition and Optimisation [13.116806430326513]
We propose a novel method to evaluate the validity of ML pipelines, without their execution, using a surrogate model (AVATAR).
The AVATAR generates a knowledge base by automatically learning the capabilities and effects of ML algorithms on datasets' characteristics.
Instead of executing the original ML pipeline to evaluate its validity, the AVATAR evaluates its surrogate model constructed by capabilities and effects of the ML pipeline components.
arXiv Detail & Related papers (2020-11-21T14:05:49Z)