Pathways: Asynchronous Distributed Dataflow for ML
- URL: http://arxiv.org/abs/2203.12533v1
- Date: Wed, 23 Mar 2022 16:50:53 GMT
- Title: Pathways: Asynchronous Distributed Dataflow for ML
- Authors: Paul Barham and Aakanksha Chowdhery and Jeff Dean and Sanjay Ghemawat
and Steven Hand and Dan Hurt and Michael Isard and Hyeontaek Lim and Ruoming
Pang and Sudip Roy and Brennan Saeta and Parker Schuh and Ryan Sepassi and
Laurent El Shafey and Chandramohan A. Thekkath and Yonghui Wu
- Abstract summary: We present the design of a new large scale orchestration layer for accelerators.
Our system, Pathways, is explicitly designed to enable exploration of new systems and ML research ideas.
- Score: 24.940220376358457
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the design of a new large scale orchestration layer for
accelerators. Our system, Pathways, is explicitly designed to enable
exploration of new systems and ML research ideas, while retaining state of the
art performance for current models. Pathways uses a sharded dataflow graph of
asynchronous operators that consume and produce futures, and efficiently
gang-schedules heterogeneous parallel computations on thousands of accelerators
while coordinating data transfers over their dedicated interconnects. Pathways
makes use of a novel asynchronous distributed dataflow design that lets the
control plane execute in parallel despite dependencies in the data plane. This
design, with careful engineering, allows Pathways to adopt a single-controller
model that makes it easier to express complex new parallelism patterns. We
demonstrate that Pathways can achieve performance parity (~100% accelerator
utilization) with state-of-the-art systems when running SPMD computations over
2048 TPUs, while also delivering throughput comparable to the SPMD case for
Transformer models that are pipelined across 16 stages, or sharded across two
islands of accelerators connected over a data center network.
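A minimal sketch of the single-controller, futures-based pattern the abstract describes, using Python thread pools as stand-ins for accelerators; the names (`enqueue`, `run_op`) are illustrative, not Pathways' actual API.

```python
# Sketch (not the Pathways API): a single controller enqueues asynchronous
# ops on per-device executors and immediately gets futures back, so dispatch
# (control plane) runs ahead of execution (data plane).
from concurrent.futures import ThreadPoolExecutor, Future
import time

# Four single-threaded executors stand in for four accelerators.
devices = [ThreadPoolExecutor(max_workers=1) for _ in range(4)]

def run_op(name, *input_futures):
    # Data-plane dependency: wait on producer futures inside the worker,
    # never on the controller thread, so dispatch keeps running ahead.
    inputs = [f.result() for f in input_futures]
    time.sleep(0.01)  # pretend to compute
    return f"{name}({', '.join(inputs)})"

def enqueue(device: int, name: str, *deps: Future) -> Future:
    return devices[device].submit(run_op, name, *deps)

# The controller issues the whole graph without blocking:
a = enqueue(0, "shard_a")
b = enqueue(1, "shard_b")
c = enqueue(2, "combine", a, b)   # consumes futures produced on other devices
d = enqueue(3, "loss", c)
print(d.result())                 # only this final read blocks the controller
```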
Related papers
- Pipeline Gradient-based Model Training on Analog In-memory Accelerators [27.7426132507863]
Analog in-memory computing (AIMC) accelerators can accelerate the training of large deep neural network (DNN) models in an energy-efficient way.
We propose synchronous and asynchronous pipeline parallelism for AIMC accelerators, inspired by pipeline parallelism in the digital domain.
This paper provides a theoretical convergence guarantee for both synchronous and asynchronous pipelines in terms of both sampling and clock cycle complexity.
arXiv Detail & Related papers (2024-10-19T16:58:34Z)
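To make the synchronous/asynchronous distinction concrete, here is a toy sketch of delayed-gradient updates, the mechanism that lets an asynchronous pipeline keep all stages busy; it is a generic illustration, not the paper's AIMC-specific scheme.

```python
# Toy illustration of synchronous vs. asynchronous (delayed-gradient) SGD,
# the core trade-off in pipeline-parallel training; generic, not AIMC-specific.
import collections

def grad(w):            # gradient of f(w) = 0.5 * (w - 3)^2
    return w - 3.0

def train(delay, steps=200, lr=0.1):
    w = 0.0
    queue = collections.deque()          # in-flight gradients
    for _ in range(steps):
        queue.append(grad(w))            # gradient computed from current weights
        if len(queue) > delay:           # asynchronous: apply a stale gradient
            w -= lr * queue.popleft()
    return w

print("synchronous  (delay=0):", train(delay=0))  # converges to 3.0
print("asynchronous (delay=4):", train(delay=4))  # still converges, with staleness
```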
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve a 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
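The summary does not spell out the compression scheme, so as a hedged illustration here is top-k gradient sparsification with error feedback, a common way decentralized systems cut traffic over slow geo-distributed links; FusionLLM's adaptive method may differ.

```python
# A common gradient-compression pattern (top-k sparsification with error
# feedback); a generic sketch, not FusionLLM's exact algorithm.
import numpy as np

def compress_topk(grad, residual, k):
    g = grad + residual                      # add back what was dropped before
    idx = np.argsort(np.abs(g))[-k:]         # keep the k largest-magnitude entries
    sparse = np.zeros_like(g)
    sparse[idx] = g[idx]
    return sparse, g - sparse                # payload to send, new residual

rng = np.random.default_rng(0)
residual = np.zeros(1000)
for step in range(3):
    grad = rng.normal(size=1000)
    payload, residual = compress_topk(grad, residual, k=50)  # ~95% fewer values
    print(step, "nonzeros sent:", np.count_nonzero(payload))
```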
- ALERT-Transformer: Bridging Asynchronous and Synchronous Machine Learning for Real-Time Event-based Spatio-Temporal Data [8.660721666999718]
We propose a hybrid pipeline composed of asynchronous sensing and synchronous processing.
We achieve state-of-the-art performance with lower latency than competitors.
arXiv Detail & Related papers (2024-02-02T13:17:19Z)
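A minimal sketch of the asynchronous-sensing/synchronous-processing split: events arrive at arbitrary times and are binned into fixed-rate dense frames that a conventional network can consume. The event-count encoding and the 10 ms period below are assumptions, not necessarily the paper's.

```python
# Generic sketch of async sensing feeding sync processing: bin asynchronously
# timestamped events into a dense per-tick frame for a standard network.
import numpy as np

H, W, DT = 32, 32, 0.010          # sensor size and 10 ms processing period

def events_to_frame(events, t0):
    """Accumulate events with t in [t0, t0 + DT) into a dense count frame."""
    frame = np.zeros((H, W), dtype=np.float32)
    for x, y, t in events:
        if t0 <= t < t0 + DT:
            frame[y, x] += 1.0    # per-pixel event count
    return frame

events = [(3, 5, 0.001), (3, 5, 0.004), (10, 2, 0.012)]  # (x, y, time)
frame0 = events_to_frame(events, t0=0.0)      # first synchronous tick
print(frame0[5, 3], frame0.sum())             # -> 2.0 2.0 (third event is later)
```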
- FPTN: Fast Pure Transformer Network for Traffic Flow Forecasting [6.485778915696199]
Traffic flow forecasting is challenging due to the complex correlations in traffic flow data.
Existing Transformer-based methods treat traffic flow forecasting as multivariate time series (MTS) forecasting.
We propose a Fast Pure Transformer Network (FPTN) in this paper.
arXiv Detail & Related papers (2023-03-14T07:55:50Z)
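For readers unfamiliar with the MTS framing, the sketch below builds supervised windows from a (sensors x time) flow matrix; it illustrates the problem setup only, not FPTN's architecture, and the sizes are invented.

```python
# Framing traffic flow as multivariate time-series forecasting: slice a
# (sensors x time) matrix into look-back windows and horizon targets.
import numpy as np

flow = np.random.rand(207, 2016)   # e.g. 207 sensors, 2016 five-minute steps
L, H = 12, 3                       # look back 12 steps, forecast 3 steps ahead

def make_windows(series, lookback, horizon):
    xs, ys = [], []
    for t in range(series.shape[1] - lookback - horizon + 1):
        xs.append(series[:, t:t + lookback])                  # (sensors, lookback)
        ys.append(series[:, t + lookback:t + lookback + horizon])
    return np.stack(xs), np.stack(ys)

X, Y = make_windows(flow, L, H)
print(X.shape, Y.shape)            # (2002, 207, 12) (2002, 207, 3)
```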
- STLGRU: Spatio-Temporal Lightweight Graph GRU for Traffic Flow Prediction [0.40964539027092917]
We propose STLGRU, a novel traffic forecasting model for predicting traffic flow accurately.
Our proposed STLGRU can effectively capture dynamic local and global spatial-temporal relations of traffic networks.
Our method can not only achieve state-of-the-art performance but also exhibit competitive computational efficiency.
arXiv Detail & Related papers (2022-12-08T20:24:59Z)
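The summary does not give the cell's equations, so the following is a generic graph-convolutional GRU step (gates driven by neighborhood-aggregated inputs), the family of models STLGRU belongs to, written in plain NumPy.

```python
# Generic graph-GRU sketch: spatial aggregation (A @ x) feeds standard GRU
# gates, giving per-node temporal recurrence. Not the paper's exact cell.
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

N, F, Hd = 5, 4, 8                        # nodes, input features, hidden size
rng = np.random.default_rng(0)
A = rng.random((N, N)); A /= A.sum(1, keepdims=True)   # row-normalized adjacency
Wz, Wr, Wh = (rng.normal(0, 0.1, (F + Hd, Hd)) for _ in range(3))

def graph_gru_step(x, h):
    xg = A @ x                            # spatial aggregation over neighbors
    xh = np.concatenate([xg, h], axis=1)  # (N, F + Hd)
    z = sigmoid(xh @ Wz)                  # update gate
    r = sigmoid(xh @ Wr)                  # reset gate
    h_tilde = np.tanh(np.concatenate([xg, r * h], axis=1) @ Wh)
    return (1 - z) * h + z * h_tilde      # temporal recurrence per node

h = np.zeros((N, Hd))
for t in range(12):                       # unroll over 12 time steps
    h = graph_gru_step(rng.random((N, F)), h)
print(h.shape)                            # (5, 8)
```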
- PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks [68.96484488899901]
We present PARTIME, a library designed to speed up neural networks whenever data is continuously streamed over time.
PARTIME starts processing each data sample as soon as it becomes available from the stream.
Experiments are performed in order to empirically compare PARTIME with classic non-parallel neural computations in online learning.
arXiv Detail & Related papers (2022-10-17T14:49:14Z)
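A toy sketch of pipelined processing over a stream: each layer is pinned to its own worker, so layer i can handle sample t while layer i+1 handles sample t-1. PARTIME itself targets GPU pipelines; the thread pools and 50 ms layer cost here are invented for illustration.

```python
# Pipelining a stream across per-layer workers: samples enter as they arrive,
# and stages overlap, so total latency approaches one stage per sample.
from concurrent.futures import ThreadPoolExecutor
import time

def make_layer(name):
    def layer(x):
        time.sleep(0.05)              # pretend this layer costs 50 ms
        return f"{name}({x})"
    return layer

layers = [make_layer(f"f{i}") for i in range(3)]
workers = [ThreadPoolExecutor(max_workers=1) for _ in layers]

def feed(sample):
    fut = workers[0].submit(layers[0], sample)
    for w, layer in zip(workers[1:], layers[1:]):
        fut = w.submit(lambda f=fut, l=layer: l(f.result()))  # chain stages
    return fut

start = time.time()
outs = [feed(f"x{t}") for t in range(5)]      # samples arrive back to back
print([o.result() for o in outs])
print(f"{time.time() - start:.2f}s pipelined vs {5 * 3 * 0.05:.2f}s sequential")
```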
- Adaptive Machine Learning for Time-Varying Systems: Low Dimensional Latent Space Tuning [91.3755431537592]
We present a recently developed method of adaptive machine learning for time-varying systems.
Our approach is to map very high dimensional (N > 100k) inputs into a low dimensional (N ~ 2) latent space at the output of the encoder section of an encoder-decoder CNN.
This method allows us to learn correlations within the data and to track their evolution in real time based on feedback, without interruption.
arXiv Detail & Related papers (2021-07-13T16:05:28Z)
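A hedged sketch of the adaptive idea: with the decoder frozen, feedback measurements nudge only the 2-D latent vector, letting the model track a drifting system without retraining. The linear "decoder" and drift model below are placeholders for the paper's CNN and accelerator data.

```python
# Adapt only a low-dimensional latent from feedback: gradient steps on
# ||decode(z) - y||^2 with a frozen decoder track a slowly drifting system.
import numpy as np

rng = np.random.default_rng(1)
D = rng.normal(size=(100, 2))            # frozen linear "decoder": R^2 -> R^100

def decode(z):
    return D @ z

z = np.zeros(2)                          # current latent estimate
true_z = np.array([0.5, -1.0])
for step in range(300):
    true_z += 0.002                      # the system slowly drifts
    y = decode(true_z)                   # measured feedback from the machine
    err = decode(z) - y
    z -= 0.005 * (D.T @ err)             # gradient step on the latent only
print(np.round(z, 2), np.round(true_z, 2))   # latent tracks the drift
```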
- TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking [74.82415271960315]
We propose a solution named TransMOT to efficiently model the spatial and temporal interactions among objects in a video.
TransMOT is not only more computationally efficient than the traditional Transformer, but it also achieves better tracking accuracy.
The proposed method is evaluated on multiple benchmark datasets including MOT15, MOT16, MOT17, and MOT20.
arXiv Detail & Related papers (2021-04-01T01:49:05Z)
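As a hedged illustration of modeling spatial interactions, the sketch below builds a sparse graph over detections with IoU-thresholded edges, the kind of structure a spatial graph transformer attends over; TransMOT's exact construction is not given in the summary.

```python
# Encode spatial interactions among detections as a sparse adjacency matrix
# (edges between overlapping boxes). Generic sketch, not TransMOT's method.
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (40, 40, 50, 50)]
n = len(boxes)
adj = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        adj[i, j] = iou(boxes[i], boxes[j]) if i != j else 1.0
adj *= adj > 0.1                    # sparsify: keep only strong overlaps
print(adj.round(2))                 # box 2 is isolated from boxes 0 and 1
```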
- Prediction of Traffic Flow via Connected Vehicles [77.11902188162458]
We propose a Short-term Traffic flow Prediction framework so that transportation authorities can take early action to control flow and prevent congestion.
We anticipate flow at future time frames on a target road segment based on historical flow data and innovative features such as real-time feeds and trajectory data provided by Connected Vehicles (CV) technology.
We show how this novel approach allows advanced modelling by integrating into the flow forecast the impact of various events that CVs realistically encounter on segments along their trajectories.
arXiv Detail & Related papers (2020-07-10T16:00:44Z)
- Taurus: A Data Plane Architecture for Per-Packet ML [59.1343317736213]
We present the design and implementation of Taurus, a data plane for line-rate inference.
Our evaluation of a Taurus switch ASIC shows that Taurus operates orders of magnitude faster than a server-based control plane.
arXiv Detail & Related papers (2020-02-12T09:18:36Z)