Pre-train and Search: Efficient Embedding Table Sharding with
Pre-trained Neural Cost Models
- URL: http://arxiv.org/abs/2305.01868v1
- Date: Wed, 3 May 2023 02:52:03 GMT
- Title: Pre-train and Search: Efficient Embedding Table Sharding with
Pre-trained Neural Cost Models
- Authors: Daochen Zha, Louis Feng, Liang Luo, Bhargav Bhushanam, Zirui Liu,
Yusuo Hu, Jade Nie, Yuzhen Huang, Yuandong Tian, Arun Kejariwal, Xia Hu
- Abstract summary: We propose a "pre-train, and search" paradigm for efficient sharding.
NeuroShard pre-trains neural cost models on augmented tables to cover various sharding scenarios.
NeuroShard significantly and consistently outperforms the state-of-the-art on the benchmark sharding dataset.
- Score: 56.65200574282804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sharding a large machine learning model across multiple devices to balance
the costs is important in distributed training. This is challenging because
partitioning is NP-hard, and estimating the costs accurately and efficiently is
difficult. In this work, we explore a "pre-train, and search" paradigm for
efficient sharding. The idea is to pre-train a universal and once-for-all
neural network to predict the costs of all the possible shards, which serves as
an efficient sharding simulator. Built upon this pre-trained cost model, we
then perform an online search to identify the best sharding plans given any
specific sharding task. We instantiate this idea in deep learning
recommendation models (DLRMs) and propose NeuroShard for embedding table
sharding. NeuroShard pre-trains neural cost models on augmented tables to cover
various sharding scenarios. Then it identifies the best column-wise and
table-wise sharding plans with beam search and greedy grid search,
respectively. Experiments show that NeuroShard significantly and consistently
outperforms the state-of-the-art on the benchmark sharding dataset, achieving
up to 23.8% improvement. When deployed in an ultra-large production DLRM with
multi-terabyte embedding tables, NeuroShard achieves 11.6% improvement in
embedding costs over the state-of-the-art, which translates to 6.6% end-to-end
training throughput improvement. To facilitate future research of the
"pre-train, and search" paradigm in ML for Systems, we open-source our code at
https://github.com/daochenzha/neuroshard
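
Illustration (not from the paper): the abstract describes pre-training a neural cost model and then searching over sharding plans with it. The sketch below is a simplified, hypothetical version of the table-wise placement step only: a greedy loop that assigns each table to the device that minimizes the predicted maximum cost. The `predict_device_cost` callable stands in for the pre-trained neural cost model, and the greedy rule is a simplification of the paper's actual beam search (column-wise) and greedy grid search (table-wise).

```python
# Minimal sketch of cost-model-guided table-wise sharding.
# Not the NeuroShard implementation; `predict_device_cost` is a stand-in
# for a pre-trained neural cost model that scores the tables on one device.
from typing import Callable, List, Sequence

TableFeatures = Sequence[float]  # e.g. [hash size, embedding dim, pooling factor, ...]

def greedy_table_sharding(
    tables: List[TableFeatures],
    num_devices: int,
    predict_device_cost: Callable[[List[TableFeatures]], float],
) -> List[List[int]]:
    """Assign each table to the device that minimizes the resulting max predicted cost."""
    placement: List[List[int]] = [[] for _ in range(num_devices)]
    # Place large tables first so early balance decisions matter most.
    order = sorted(range(len(tables)), key=lambda i: tables[i][0], reverse=True)
    for t in order:
        best_device, best_max_cost = 0, float("inf")
        for d in range(num_devices):
            trial = [tables[i] for i in placement[d]] + [tables[t]]
            trial_cost = predict_device_cost(trial)
            other_costs = [
                predict_device_cost([tables[i] for i in placement[o]])
                for o in range(num_devices)
                if o != d and placement[o]
            ]
            max_cost = max([trial_cost] + other_costs)
            if max_cost < best_max_cost:
                best_device, best_max_cost = d, max_cost
        placement[best_device].append(t)
    return placement

if __name__ == "__main__":
    # Toy usage with a naive proxy (sum of hash sizes) in place of the neural model.
    toy_tables = [[10_000, 64], [2_000, 32], [50_000, 128], [8_000, 64]]
    plan = greedy_table_sharding(
        toy_tables, num_devices=2,
        predict_device_cost=lambda ts: float(sum(t[0] for t in ts)),
    )
    print(plan)  # e.g. [[2], [0, 3, 1]] -- table indices per device
```

The point of the sketch is only to show how a pre-trained cost model can replace expensive benchmarking inside the search loop; the paper's pipeline additionally chooses column-wise splits with beam search before table-wise placement.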
Related papers
- Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework [10.317740844867913]
We build a simulator based on 472 language model pre-training runs with varying data compositions from the SlimPajama dataset.
We observe that even simple acquisition functions can enable principled training decisions across models from 20M to 1B parameters.
arXiv Detail & Related papers (2025-03-26T22:19:47Z)
- LMEraser: Large Model Unlearning through Adaptive Prompt Tuning [21.141664917477257]
LMEraser takes a divide-and-conquer strategy with a prompt tuning architecture to isolate data influence.
Experiments demonstrate that LMEraser achieves a 100-fold reduction in unlearning costs without compromising accuracy.
arXiv Detail & Related papers (2024-04-17T04:08:38Z)
- Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping [37.59733248822887]
We train an encoder-decoder Transformer model to predict the search dynamics of the A* search algorithm.
We fine-tune this model to obtain a Searchformer, a Transformer model that optimally solves Sokoban puzzles 93.7% of the time.
arXiv Detail & Related papers (2024-02-21T19:17:28Z) - Machine Learning Force Fields with Data Cost Aware Training [94.78998399180519]
Machine learning force fields (MLFF) have been proposed to accelerate molecular dynamics (MD) simulation.
Even for the most data-efficient MLFFs, reaching chemical accuracy can require hundreds of frames of force and energy labels.
We propose a multi-stage computational framework -- ASTEROID, which lowers the data cost of MLFFs by leveraging a combination of cheap inaccurate data and expensive accurate data.
arXiv Detail & Related papers (2023-06-05T04:34:54Z)
- DreamShard: Generalizable Embedding Table Placement for Recommender Systems [62.444159500899566]
We present a reinforcement learning (RL) approach for embedding table placement.
DreamShard reasons about operation fusion and generalizes to unseen placement tasks.
Experiments show that DreamShard substantially outperforms the existing human expert and RNN-based strategies.
arXiv Detail & Related papers (2022-10-05T05:12:02Z)
- AutoShard: Automated Embedding Table Sharding for Recommender Systems [54.82606459574231]
We introduce our practice at Meta, namely AutoShard, which uses a neural cost model to directly predict multi-table costs (a minimal sketch of such a cost model appears after this list).
AutoShard can efficiently shard hundreds of tables in seconds.
Our algorithms have been deployed in Meta's production environment.
arXiv Detail & Related papers (2022-08-12T17:48:01Z)
- Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics [85.31710759801705]
Current practice incurs expensive computational costs in model training for performance prediction.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z)
- Modeling Token-level Uncertainty to Learn Unknown Concepts in SLU via Calibrated Dirichlet Prior RNN [98.4713940310056]
One major task of spoken language understanding (SLU) in modern personal assistants is to extract semantic concepts from an utterance.
Recent research has collected question-and-answer annotated data to learn what is unknown and should be asked.
We incorporate softmax-based slot-filling neural architectures to model sequence uncertainty without question supervision.
arXiv Detail & Related papers (2020-10-16T02:12:30Z)
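
As referenced in the AutoShard entry above, the following is a minimal, assumed sketch of the kind of learned multi-table cost model that AutoShard and NeuroShard describe: a small permutation-invariant MLP that maps the features of the tables placed on one device to a predicted cost. The feature set, architecture, and training objective here are illustrative assumptions, not the published models.

```python
# Minimal sketch of a learned shard-cost model: a small MLP over pooled
# per-table features. Feature choice, architecture, and training target are
# illustrative assumptions, not the published AutoShard/NeuroShard design.
import torch
import torch.nn as nn

class ShardCostModel(nn.Module):
    """Predicts the cost of running a set of embedding tables on one device."""
    def __init__(self, num_table_features: int = 4, hidden: int = 64):
        super().__init__()
        self.table_encoder = nn.Sequential(
            nn.Linear(num_table_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, table_feats: torch.Tensor) -> torch.Tensor:
        # table_feats: (num_tables_on_device, num_table_features)
        encoded = self.table_encoder(table_feats)  # per-table encodings
        pooled = encoded.sum(dim=0)                # permutation-invariant pooling
        return self.head(pooled).squeeze(-1)       # scalar predicted cost

# Pre-training would regress predictions against measured latencies collected
# from (possibly augmented/synthetic) table configurations, e.g. with nn.MSELoss().
model = ShardCostModel()
cost = model(torch.rand(5, 4))  # predicted cost for 5 tables with 4 features each
```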
This list is automatically generated from the titles and abstracts of the papers in this site.