Pathways: Asynchronous Distributed Dataflow for ML
- URL: http://arxiv.org/abs/2203.12533v1
- Date: Wed, 23 Mar 2022 16:50:53 GMT
- Title: Pathways: Asynchronous Distributed Dataflow for ML
- Authors: Paul Barham and Aakanksha Chowdhery and Jeff Dean and Sanjay Ghemawat
and Steven Hand and Dan Hurt and Michael Isard and Hyeontaek Lim and Ruoming
Pang and Sudip Roy and Brennan Saeta and Parker Schuh and Ryan Sepassi and
Laurent El Shafey and Chandramohan A. Thekkath and Yonghui Wu
- Abstract summary: We present the design of a new large scale orchestration layer for accelerators.
Our system, Pathways, is explicitly designed to enable exploration of new systems and ML research ideas.
- Score: 24.940220376358457
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the design of a new large scale orchestration layer for
accelerators. Our system, Pathways, is explicitly designed to enable
exploration of new systems and ML research ideas, while retaining state of the
art performance for current models. Pathways uses a sharded dataflow graph of
asynchronous operators that consume and produce futures, and efficiently
gang-schedules heterogeneous parallel computations on thousands of accelerators
while coordinating data transfers over their dedicated interconnects. Pathways
makes use of a novel asynchronous distributed dataflow design that lets the
control plane execute in parallel despite dependencies in the data plane. This
design, with careful engineering, allows Pathways to adopt a single-controller
model that makes it easier to express complex new parallelism patterns. We
demonstrate that Pathways can achieve performance parity (~100% accelerator
utilization) with state-of-the-art systems when running SPMD computations over
2048 TPUs, while also delivering throughput comparable to the SPMD case for
Transformer models that are pipelined across 16 stages, or sharded across two
islands of accelerators connected over a data center network.
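A minimal sketch of the single-controller, futures-based pattern the abstract describes, using Python thread pools as stand-ins for accelerators; the names (`enqueue`, `run_op`) are illustrative, not Pathways' actual API.

```python
# Sketch (not the Pathways API): a single controller enqueues asynchronous
# ops on per-device executors and immediately gets futures back, so dispatch
# (control plane) runs ahead of execution (data plane).
from concurrent.futures import ThreadPoolExecutor, Future
import time

# Four single-threaded executors stand in for four accelerators.
devices = [ThreadPoolExecutor(max_workers=1) for _ in range(4)]

def run_op(name, *input_futures):
    # Data-plane dependency: wait on producer futures inside the worker,
    # never on the controller thread, so dispatch keeps running ahead.
    inputs = [f.result() for f in input_futures]
    time.sleep(0.01)  # pretend to compute
    return f"{name}({', '.join(inputs)})"

def enqueue(device: int, name: str, *deps: Future) -> Future:
    return devices[device].submit(run_op, name, *deps)

# The controller issues the whole graph without blocking:
a = enqueue(0, "shard_a")
b = enqueue(1, "shard_b")
c = enqueue(2, "combine", a, b)   # consumes futures produced on other devices
d = enqueue(3, "loss", c)
print(d.result())                 # only this final read blocks the controller
```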
Related papers
- Pipeline Gradient-based Model Training on Analog In-memory Accelerators [27.7426132507863]
Analog in-memory computing (AIMC) accelerators can accelerate the training of large deep neural network (DNN) models in an energy-efficient way.
We propose synchronous and asynchronous pipeline parallelism for AIMC accelerators, inspired by pipeline parallelism in the digital domain.
This paper provides a theoretical convergence guarantee for both synchronous and asynchronous pipelines in terms of both sampling and clock cycle complexity.
arXiv Detail & Related papers (2024-10-19T16:58:34Z)
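To make the synchronous/asynchronous distinction concrete, here is a toy sketch of delayed-gradient updates, the mechanism that lets an asynchronous pipeline keep all stages busy; it is a generic illustration, not the paper's AIMC-specific scheme.

```python
# Toy illustration of synchronous vs. asynchronous (delayed-gradient) SGD,
# the core trade-off in pipeline-parallel training; generic, not AIMC-specific.
import collections

def grad(w):            # gradient of f(w) = 0.5 * (w - 3)^2
    return w - 3.0

def train(delay, steps=200, lr=0.1):
    w = 0.0
    queue = collections.deque()          # in-flight gradients
    for _ in range(steps):
        queue.append(grad(w))            # gradient computed from current weights
        if len(queue) > delay:           # asynchronous: apply a stale gradient
            w -= lr * queue.popleft()
    return w

print("synchronous  (delay=0):", train(delay=0))  # converges to 3.0
print("asynchronous (delay=4):", train(delay=4))  # still converges, with staleness
```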
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve a 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
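The summary does not spell out the compression scheme, so as a hedged illustration here is top-k gradient sparsification with error feedback, a common way decentralized systems cut traffic over slow geo-distributed links; FusionLLM's adaptive method may differ.

```python
# A common gradient-compression pattern (top-k sparsification with error
# feedback); a generic sketch, not FusionLLM's exact algorithm.
import numpy as np

def compress_topk(grad, residual, k):
    g = grad + residual                      # add back what was dropped before
    idx = np.argsort(np.abs(g))[-k:]         # keep the k largest-magnitude entries
    sparse = np.zeros_like(g)
    sparse[idx] = g[idx]
    return sparse, g - sparse                # payload to send, new residual

rng = np.random.default_rng(0)
residual = np.zeros(1000)
for step in range(3):
    grad = rng.normal(size=1000)
    payload, residual = compress_topk(grad, residual, k=50)  # ~95% fewer values
    print(step, "nonzeros sent:", np.count_nonzero(payload))
```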
- ALERT-Transformer: Bridging Asynchronous and Synchronous Machine Learning for Real-Time Event-based Spatio-Temporal Data [8.660721666999718]
We propose a hybrid pipeline composed of asynchronous sensing and synchronous processing.
We achieve state-of-the-art performance with lower latency than competitors.
arXiv Detail & Related papers (2024-02-02T13:17:19Z)
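A minimal sketch of the asynchronous-sensing/synchronous-processing split: events arrive at arbitrary times and are binned into fixed-rate dense frames that a conventional network can consume. The event-count encoding and the 10 ms period below are assumptions, not necessarily the paper's.

```python
# Generic sketch of async sensing feeding sync processing: bin asynchronously
# timestamped events into a dense per-tick frame for a standard network.
import numpy as np

H, W, DT = 32, 32, 0.010          # sensor size and 10 ms processing period

def events_to_frame(events, t0):
    """Accumulate events with t in [t0, t0 + DT) into a dense count frame."""
    frame = np.zeros((H, W), dtype=np.float32)
    for x, y, t in events:
        if t0 <= t < t0 + DT:
            frame[y, x] += 1.0    # per-pixel event count
    return frame

events = [(3, 5, 0.001), (3, 5, 0.004), (10, 2, 0.012)]  # (x, y, time)
frame0 = events_to_frame(events, t0=0.0)      # first synchronous tick
print(frame0[5, 3], frame0.sum())             # -> 2.0 2.0 (third event is later)
```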
- FPTN: Fast Pure Transformer Network for Traffic Flow Forecasting [6.485778915696199]
Traffic flow forecasting is challenging due to the complex correlations in traffic flow data.
Existing Transformer-based methods treat traffic flow forecasting as multivariate time series (MTS) forecasting.
We propose a Fast Pure Transformer Network (FPTN) in this paper.
arXiv Detail & Related papers (2023-03-14T07:55:50Z)
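For readers unfamiliar with the MTS framing, the sketch below builds supervised windows from a (sensors x time) flow matrix; it illustrates the problem setup only, not FPTN's architecture, and the sizes are invented.

```python
# Framing traffic flow as multivariate time-series forecasting: slice a
# (sensors x time) matrix into look-back windows and horizon targets.
import numpy as np

flow = np.random.rand(207, 2016)   # e.g. 207 sensors, 2016 five-minute steps
L, H = 12, 3                       # look back 12 steps, forecast 3 steps ahead

def make_windows(series, lookback, horizon):
    xs, ys = [], []
    for t in range(series.shape[1] - lookback - horizon + 1):
        xs.append(series[:, t:t + lookback])                  # (sensors, lookback)
        ys.append(series[:, t + lookback:t + lookback + horizon])
    return np.stack(xs), np.stack(ys)

X, Y = make_windows(flow, L, H)
print(X.shape, Y.shape)            # (2002, 207, 12) (2002, 207, 3)
```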
- STLGRU: Spatio-Temporal Lightweight Graph GRU for Traffic Flow Prediction [0.40964539027092917]
We propose STLGRU, a novel traffic forecasting model for predicting traffic flow accurately.
Our proposed STLGRU can effectively capture dynamic local and global spatial-temporal relations of traffic networks.
Our method can not only achieve state-of-the-art performance but also exhibit competitive computational efficiency.
arXiv Detail & Related papers (2022-12-08T20:24:59Z)
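The summary does not give the cell's equations, so the following is a generic graph-convolutional GRU step (gates driven by neighborhood-aggregated inputs), the family of models STLGRU belongs to, written in plain NumPy.

```python
# Generic graph-GRU sketch: spatial aggregation (A @ x) feeds standard GRU
# gates, giving per-node temporal recurrence. Not the paper's exact cell.
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

N, F, Hd = 5, 4, 8                        # nodes, input features, hidden size
rng = np.random.default_rng(0)
A = rng.random((N, N)); A /= A.sum(1, keepdims=True)   # row-normalized adjacency
Wz, Wr, Wh = (rng.normal(0, 0.1, (F + Hd, Hd)) for _ in range(3))

def graph_gru_step(x, h):
    xg = A @ x                            # spatial aggregation over neighbors
    xh = np.concatenate([xg, h], axis=1)  # (N, F + Hd)
    z = sigmoid(xh @ Wz)                  # update gate
    r = sigmoid(xh @ Wr)                  # reset gate
    h_tilde = np.tanh(np.concatenate([xg, r * h], axis=1) @ Wh)
    return (1 - z) * h + z * h_tilde      # temporal recurrence per node

h = np.zeros((N, Hd))
for t in range(12):                       # unroll over 12 time steps
    h = graph_gru_step(rng.random((N, F)), h)
print(h.shape)                            # (5, 8)
```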
- PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks [68.96484488899901]
We present PARTIME, a library designed to speed up neural networks whenever data is continuously streamed over time.
PARTIME starts processing each data sample as soon as it becomes available from the stream.
Experiments are performed in order to empirically compare PARTIME with classic non-parallel neural computations in online learning.
arXiv Detail & Related papers (2022-10-17T14:49:14Z)
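A toy sketch of pipelined processing over a stream: each layer is pinned to its own worker, so layer i can handle sample t while layer i+1 handles sample t-1. PARTIME itself targets GPU pipelines; the thread pools and 50 ms layer cost here are invented for illustration.

```python
# Pipelining a stream across per-layer workers: samples enter as they arrive,
# and stages overlap, so total latency approaches one stage per sample.
from concurrent.futures import ThreadPoolExecutor
import time

def make_layer(name):
    def layer(x):
        time.sleep(0.05)              # pretend this layer costs 50 ms
        return f"{name}({x})"
    return layer

layers = [make_layer(f"f{i}") for i in range(3)]
workers = [ThreadPoolExecutor(max_workers=1) for _ in layers]

def feed(sample):
    fut = workers[0].submit(layers[0], sample)
    for w, layer in zip(workers[1:], layers[1:]):
        fut = w.submit(lambda f=fut, l=layer: l(f.result()))  # chain stages
    return fut

start = time.time()
outs = [feed(f"x{t}") for t in range(5)]      # samples arrive back to back
print([o.result() for o in outs])
print(f"{time.time() - start:.2f}s pipelined vs {5 * 3 * 0.05:.2f}s sequential")
```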
- Adaptive Machine Learning for Time-Varying Systems: Low Dimensional Latent Space Tuning [91.3755431537592]
We present a recently developed method of adaptive machine learning for time-varying systems.
Our approach is to map very high dimensional (N > 100k) inputs into a low dimensional (N ~ 2) latent space at the output of the encoder section of an encoder-decoder CNN.
This method allows us to learn correlations within the data and to track their evolution in real time based on feedback, without interruption.
arXiv Detail & Related papers (2021-07-13T16:05:28Z)
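A hedged sketch of the adaptive idea: with the decoder frozen, feedback measurements nudge only the 2-D latent vector, letting the model track a drifting system without retraining. The linear "decoder" and drift model below are placeholders for the paper's CNN and accelerator data.

```python
# Adapt only a low-dimensional latent from feedback: gradient steps on
# ||decode(z) - y||^2 with a frozen decoder track a slowly drifting system.
import numpy as np

rng = np.random.default_rng(1)
D = rng.normal(size=(100, 2))            # frozen linear "decoder": R^2 -> R^100

def decode(z):
    return D @ z

z = np.zeros(2)                          # current latent estimate
true_z = np.array([0.5, -1.0])
for step in range(300):
    true_z += 0.002                      # the system slowly drifts
    y = decode(true_z)                   # measured feedback from the machine
    err = decode(z) - y
    z -= 0.005 * (D.T @ err)             # gradient step on the latent only
print(np.round(z, 2), np.round(true_z, 2))   # latent tracks the drift
```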
- TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking [74.82415271960315]
We propose a solution named TransMOT to efficiently model the spatial and temporal interactions among objects in a video.
TransMOT is not only more computationally efficient than the traditional Transformer, but it also achieves better tracking accuracy.
The proposed method is evaluated on multiple benchmark datasets including MOT15, MOT16, MOT17, and MOT20.
arXiv Detail & Related papers (2021-04-01T01:49:05Z)
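As a hedged illustration of modeling spatial interactions, the sketch below builds a sparse graph over detections with IoU-thresholded edges, the kind of structure a spatial graph transformer attends over; TransMOT's exact construction is not given in the summary.

```python
# Encode spatial interactions among detections as a sparse adjacency matrix
# (edges between overlapping boxes). Generic sketch, not TransMOT's method.
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (40, 40, 50, 50)]
n = len(boxes)
adj = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        adj[i, j] = iou(boxes[i], boxes[j]) if i != j else 1.0
adj *= adj > 0.1                    # sparsify: keep only strong overlaps
print(adj.round(2))                 # box 2 is isolated from boxes 0 and 1
```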
- Prediction of Traffic Flow via Connected Vehicles [77.11902188162458]
We propose a Short-term Traffic flow Prediction framework so that transportation authorities can take early action to control flow and prevent congestion.
We anticipate flow at future time frames on a target road segment based on historical flow data and innovative features such as real-time feeds and trajectory data provided by Connected Vehicles (CV) technology.
We show how this novel approach allows advanced modelling by integrating into the flow forecast the impact of various events that CVs realistically encounter on segments along their trajectories.
arXiv Detail & Related papers (2020-07-10T16:00:44Z)
- Taurus: A Data Plane Architecture for Per-Packet ML [59.1343317736213]
We present the design and implementation of Taurus, a data plane for line-rate inference.
Our evaluation of a Taurus switch ASIC shows that Taurus operates orders of magnitude faster than a server-based control plane.
arXiv Detail & Related papers (2020-02-12T09:18:36Z)