PartIR: Composing SPMD Partitioning Strategies for Machine Learning
- URL: http://arxiv.org/abs/2401.11202v3
- Date: Mon, 4 Mar 2024 04:23:32 GMT
- Title: PartIR: Composing SPMD Partitioning Strategies for Machine Learning
- Authors: Sami Alabed, Daniel Belov, Bart Chrzaszcz, Juliana Franco, Dominik
Grewe, Dougal Maclaurin, James Molloy, Tom Natan, Tamara Norman, Xiaoyue Pan,
Adam Paszke, Norman A. Rink, Michael Schaarschmidt, Timur Sitdikov, Agnieszka
Swietlik, Dimitrios Vytiniotis, Joel Wee
- Abstract summary: We present PartIR, our design for a NN partitioning system.
PartIR is focused on an incremental approach to rewriting and is hardware- and runtime-agnostic.
We evaluate PartIR on several different models to demonstrate its predictability, expressibility, and ability to reach peak performance.
- Score: 1.145010277058103
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training of modern large neural networks (NN) requires a combination of
parallelization strategies encompassing data, model, or optimizer sharding.
When strategies increase in complexity, it becomes necessary for partitioning
tools to be 1) expressive, allowing the composition of simpler strategies, and
2) predictable to estimate performance analytically. We present PartIR, our
design for a NN partitioning system. PartIR is focused on an incremental
approach to rewriting and is hardware- and runtime-agnostic. We present a simple
but powerful API for composing sharding strategies and a simulator to validate
them. The process is driven by high-level programmer-issued partitioning
tactics, which can be both manual and automatic. Importantly, the tactics are
specified separately from the model code, making them easy to change. We
evaluate PartIR on several different models to demonstrate its predictability,
expressibility, and ability to reach peak performance.
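To make the abstract's central idea concrete, here is a minimal, hypothetical sketch of keeping partitioning tactics separate from model code and applying them as incremental rewrites of sharding metadata. Every name below (Tensor, shard, the mesh axes, the tactic comments) is invented for illustration and is not PartIR's actual API.

```python
# Hypothetical sketch only: PartIR's real API is not shown in this summary,
# so every name below (Tensor, shard, the mesh) is invented.
from dataclasses import dataclass, field

@dataclass
class Tensor:
    name: str
    shape: tuple                                  # logical (unpartitioned) shape
    sharding: dict = field(default_factory=dict)  # dim index -> mesh axis

def shard(t: Tensor, dim: int, axis: str, mesh: dict) -> Tensor:
    """One incremental rewrite: partition dimension `dim` over mesh `axis`."""
    assert t.shape[dim] % mesh[axis] == 0, "dim must divide the axis size"
    assert axis not in t.sharding.values(), "mesh axis already in use"
    return Tensor(t.name, t.shape, {**t.sharding, dim: axis})

# Model code: declared once, with no partitioning decisions baked in.
mesh = {"data": 8, "model": 4}                    # 32 devices as an 8x4 mesh
acts = Tensor("activations", (1024, 512))
w    = Tensor("weights", (512, 2048))

# Tactics: issued separately from the model, so the strategy can change
# (or be produced by an automatic search) without touching model code.
acts = shard(acts, 0, "data", mesh)               # data parallelism
w    = shard(w, 1, "model", mesh)                 # parameter sharding
print(acts, w, sep="\n")
```

The property the sketch tries to surface is composability: the two shard calls can be applied in either order and stack into one combined strategy, which is what lets data parallelism and parameter sharding be expressed as independently issued tactics.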
Related papers
- The N-Grammys: Accelerating Autoregressive Inference with Learning-Free Batched Speculation [48.52206677611072]
Speculative decoding aims to speed up autoregressive generation of a language model by verifying in parallel the tokens generated by a smaller draft model.
We show that combinations of simple strategies can achieve significant inference speedups over different tasks.
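To illustrate the draft-and-verify loop described above, here is a small self-contained toy, not the paper's method: the learning-free draft proposes the continuation of the last matching token from the context, and the target model accepts the longest verified prefix. Both "models" below operate on integer token IDs and are placeholders.

```python
# Toy draft-and-verify loop; both "models" are placeholders, and the real
# method verifies the whole proposal in one batched forward pass.
def draft_ngram(context, k):
    """Learning-free draft: find the last earlier occurrence of the current
    token and propose the k tokens that followed it."""
    last = context[-1]
    for i in range(len(context) - 2, -1, -1):
        if context[i] == last:
            return context[i + 1:i + 1 + k]
    return []

def target_model(context):
    """Stand-in for the large model: a deterministic next-token rule."""
    return (context[-1] * 31 + 7) % 100

def speculative_step(context, k=4):
    proposal = draft_ngram(context, k)
    accepted = []
    for tok in proposal:            # sequential here; batched in practice
        if target_model(context + accepted) == tok:
            accepted.append(tok)
        else:
            break                   # first mismatch invalidates the rest
    accepted.append(target_model(context + accepted))  # correction token
    return accepted                 # >= 1 token per target-model "pass"

print(speculative_step([5, 12, 5]))
```

The speedup comes from the guarantee visible in the last line of the step: every verification pass emits at least one token, and any accepted draft token is exactly the one the target model would have produced anyway.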
arXiv Detail & Related papers (2024-11-06T09:23:50Z)
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
- Automatic Discovery of Composite SPMD Partitioning Strategies in PartIR [1.2507285499419876]
We present an automatic partitioner that identifies efficient combinations for many model architectures and accelerator systems.
Our key finding is that a Monte Carlo Tree Search-based partitioner, which incorporates partition-specific compiler analyses directly into the search and is guided by explicit objectives, matches expert-level strategies for various models.
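As a flavour of that search pattern, the invented toy below runs MCTS over per-layer sharding choices scored by a made-up cost function; it only illustrates the select/expand/rollout/backpropagate loop, not the paper's compiler analyses or objectives.

```python
# Invented toy: MCTS over per-layer sharding plans with a placeholder cost.
import math, random

AXES = ["data", "model", "none"]
NUM_LAYERS = 4

def cost(plan):
    """Placeholder cost favouring a mix of data and model sharding."""
    return (10.0 - 2.0 * plan.count("data") ** 0.5
            - 1.5 * plan.count("model") ** 0.5
            + 0.8 * plan.count("none"))

class Node:
    def __init__(self, plan):
        self.plan, self.children = plan, {}
        self.visits, self.value = 0, 0.0

def uct(parent, child, c=1.4):
    return (child.value / child.visits
            + c * math.sqrt(math.log(parent.visits) / child.visits))

def mcts(iterations=2000):
    root = Node([])
    for _ in range(iterations):
        node, path = root, [root]
        # 1) select: descend through fully expanded nodes
        while len(node.plan) < NUM_LAYERS and len(node.children) == len(AXES):
            node = max(node.children.values(), key=lambda ch: uct(node, ch))
            path.append(node)
        # 2) expand one untried child, if not at a leaf
        if len(node.plan) < NUM_LAYERS:
            a = random.choice([a for a in AXES if a not in node.children])
            node.children[a] = node = Node(node.plan + [a])
            path.append(node)
        # 3) rollout: complete the plan randomly, reward = -cost
        plan = node.plan + random.choices(AXES, k=NUM_LAYERS - len(node.plan))
        reward = -cost(plan)
        # 4) backpropagate the reward along the visited path
        for n in path:
            n.visits += 1
            n.value += reward
    node = root                      # read out the most-visited plan
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
    return node.plan

print(mcts())
```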
arXiv Detail & Related papers (2022-10-07T17:46:46Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
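For readers unfamiliar with the setting, the sketch below is plain Q-learning with linear function approximation on a toy chain MDP, with epsilon-greedy exploration; it illustrates the general protocol only, not the paper's exploration variant or its error analysis.

```python
# Generic linear-function-approximation Q-learning on a toy chain MDP.
import numpy as np

N_STATES, ACTIONS = 5, [0, 1]             # move left / right on a chain

def features(s, a):
    """One-hot state-action features, so a linear Q is exact here."""
    phi = np.zeros(N_STATES * len(ACTIONS))
    phi[s * len(ACTIONS) + a] = 1.0
    return phi

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s2 == N_STATES - 1 else 0.0   # goal at the right end
    return s2, reward

w = np.zeros(N_STATES * len(ACTIONS))
alpha, gamma, eps = 0.1, 0.9, 0.2
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    for _ in range(20):
        if rng.random() < eps:                    # epsilon-greedy exploration
            a = int(rng.choice(ACTIONS))
        else:
            a = max(ACTIONS, key=lambda a: w @ features(s, a))
        s2, r = step(s, a)
        target = r + gamma * max(w @ features(s2, b) for b in ACTIONS)
        w += alpha * (target - w @ features(s, a)) * features(s, a)
        s = s2

print("greedy actions:", [max(ACTIONS, key=lambda a: w @ features(s, a))
                          for s in range(N_STATES)])
```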
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Automap: Towards Ergonomic Automated Parallelism for ML Models [2.469997094590327]
We present the prototype of an automated partitioner that seamlessly integrates into existing compilers and existing user workflows.
Our partitioner enables SPMD-style parallelism that encompasses data parallelism and parameter/activation sharding.
Through a combination of inductive tactics and search in a platform-independent partitioning IR, automap can recover expert partitioning strategies such as Megatron sharding for transformer layers.
arXiv Detail & Related papers (2021-12-06T12:09:38Z)
- DistIR: An Intermediate Representation and Simulator for Efficient Neural Network Distribution [15.086401550425125]
DistIR is a representation for distributed computation that is tailored for efficient analyses.
We show how DistIR and its simulator enable fast grid searches over complex distribution spaces spanning more than 1,000 configurations.
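To show what a simulator-driven grid search over distribution configurations looks like in miniature, here is a self-contained sketch; the cost model and the three configuration axes are invented placeholders, not DistIR's actual analyses.

```python
# Invented placeholder cost model; DistIR's simulator is far richer.
import itertools

def simulated_step_time(dp, mp, microbatches):
    """Toy analytic model: compute shrinks with parallelism, communication
    grows with the model-parallel degree, and the pipeline bubble shrinks
    with more microbatches."""
    compute = 100.0 / (dp * mp)
    comm = 0.5 * (mp - 1) + 0.1 * (dp - 1)
    bubble = 4.0 / microbatches
    return compute + comm + bubble

# 4 x 4 x 5 = 80 configurations here; real searches span 1000+ of these.
grid = itertools.product([1, 2, 4, 8], [1, 2, 4, 8], [1, 2, 4, 8, 16])
best = min(grid, key=lambda cfg: simulated_step_time(*cfg))
print("best (dp, mp, microbatches):", best,
      "->", round(simulated_step_time(*best), 2))
```

Because the cost model is analytic, the whole search is a pure function of the configuration tuple, which is what makes exhaustive grids of this size cheap to evaluate.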
arXiv Detail & Related papers (2021-11-09T21:32:51Z)
- Partner-Assisted Learning for Few-Shot Image Classification [54.66864961784989]
Few-shot learning has been studied to mimic human visual capabilities and learn effective models without the need for exhaustive human annotation.
In this paper, we focus on the design of a training strategy to obtain an elemental representation such that the prototype of each novel class can be estimated from a few labeled samples.
We propose a two-stage training scheme, which first trains a partner encoder to model pair-wise similarities and extract features serving as soft-anchors, and then trains a main encoder by aligning its outputs with soft-anchors while attempting to maximize classification performance.
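Stage two of that scheme, aligning a main encoder's outputs with a frozen partner's soft-anchors, can be sketched with two linear encoders and a manual gradient step; the classification term and the stage-one pair-wise training are omitted for brevity, and all data below is synthetic.

```python
# Schematic of the alignment stage: a frozen "partner" encoder provides
# soft-anchor features, and the main encoder is trained to match them.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 16))             # synthetic inputs
W_partner = rng.normal(size=(16, 8))       # stage-one result: frozen partner

def partner_encode(x):                     # soft-anchors come from here
    return np.tanh(x @ W_partner)

W_main = rng.normal(size=(16, 8)) * 0.1    # stage two: train the main encoder
lr = 0.05
for step in range(200):
    anchors = partner_encode(X)
    z = np.tanh(X @ W_main)                # main encoder (linear for brevity)
    diff = z - anchors                     # alignment residual
    grad = X.T @ (diff * (1 - z ** 2)) / len(X)   # gradient of the MSE loss
    W_main -= lr * grad
    if step % 50 == 0:
        print(f"step {step:3d}  alignment MSE = {np.mean(diff**2):.4f}")
```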
arXiv Detail & Related papers (2021-09-15T22:46:19Z)
- Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification.
Our strategy enables important aspects of the base learner objective to be learned during meta-training.
We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z)
- Auto-MAP: A DQN Framework for Exploring Distributed Execution Plans for DNN Workloads [11.646744408920764]
Auto-MAP is a framework for exploring distributed execution plans for DNN workloads.
It automatically discovers fast parallelization strategies through reinforcement learning at the IR level of deep learning models.
Our evaluation shows that Auto-MAP can find the optimal solution in two hours, while achieving better throughput on several NLP and convolution models.
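The paper itself uses a DQN over a learned IR representation; as a much-simplified stand-in, the toy below does per-layer tabular value estimates with a Monte Carlo update against a made-up throughput reward, just to show the explore-then-reinforce shape of the search. The state and action spaces are invented placeholders.

```python
# Tabular Monte Carlo value estimation as a stand-in for Auto-MAP's DQN.
import random

CHOICES = ["replicate", "shard-data", "shard-model"]
NUM_LAYERS = 3

def throughput(plan):
    """Placeholder reward: sharding helps, but all-model-sharding pays comm."""
    score = sum({"replicate": 0.0, "shard-data": 1.0, "shard-model": 1.2}[c]
                for c in plan)
    if plan.count("shard-model") == NUM_LAYERS:
        score -= 1.5                      # communication penalty
    return score

Q = {}                                    # (layer, choice) -> value estimate
alpha, eps = 0.2, 0.3
for episode in range(3000):
    plan = []
    for layer in range(NUM_LAYERS):       # epsilon-greedy per-layer choice
        if random.random() < eps:
            c = random.choice(CHOICES)
        else:
            c = max(CHOICES, key=lambda c: Q.get((layer, c), 0.0))
        plan.append(c)
    r = throughput(plan)                  # terminal reward for the whole plan
    for layer, c in enumerate(plan):      # Monte Carlo style update
        q = Q.get((layer, c), 0.0)
        Q[(layer, c)] = q + alpha * (r - q)

best = [max(CHOICES, key=lambda c: Q.get((layer, c), 0.0))
        for layer in range(NUM_LAYERS)]
print("learned plan:", best)
```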
arXiv Detail & Related papers (2020-07-08T12:38:03Z)
- Provable Representation Learning for Imitation Learning via Bi-level Optimization [60.059520774789654]
A common strategy in modern learning systems is to learn a representation that is useful for many tasks.
We study this strategy in the imitation learning setting for Markov decision processes (MDPs) where multiple experts' trajectories are available.
We instantiate this framework for the imitation learning settings of behavior cloning and observation-alone imitation.
arXiv Detail & Related papers (2020-02-24T21:03:52Z)