Fugu-MT paper translation (abstract): Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models

Paper summary: Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models

arxiv url: http://arxiv.org/abs/2305.01868v1
Date: Wed, 3 May 2023 02:52:03 GMT
Status: translation complete
Last updated in system: 2023-05-04 16:11:04.575242
Title: Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models
Title (reference translation): Pre-train and search: efficient embedding table sharding using pre-trained neural cost models
Authors: Daochen Zha, Louis Feng, Liang Luo, Bhargav Bhushanam, Zirui Liu, Yusuo Hu, Jade Nie, Yuzhen Huang, Yuandong Tian, Arun Kejariwal, Xia Hu
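Abstract summary: The paper proposes a "pre-train, and search" paradigm for efficient sharding. NeuroShard pre-trains neural cost models on augmented tables to cover various sharding scenarios. NeuroShard significantly and consistently outperforms the state of the art on the benchmark sharding dataset.
Reference score (independently computed attention score): 56.65200574282804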
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Sharding a large machine learning model across multiple devices to balance the costs is important in distributed training. This is challenging because partitioning is NP-hard, and estimating the costs accurately and efficiently is difficult. In this work, we explore a "pre-train, and search" paradigm for efficient sharding. The idea is to pre-train a universal and once-for-all neural network to predict the costs of all the possible shards, which serves as an efficient sharding simulator. Built upon this pre-trained cost model, we then perform an online search to identify the best sharding plans given any specific sharding task. We instantiate this idea in deep learning recommendation models (DLRMs) and propose NeuroShard for embedding table sharding. NeuroShard pre-trains neural cost models on augmented tables to cover various sharding scenarios. Then it identifies the best column-wise and table-wise sharding plans with beam search and greedy grid search, respectively. Experiments show that NeuroShard significantly and consistently outperforms the state-of-the-art on the benchmark sharding dataset, achieving up to 23.8% improvement. When deployed in an ultra-large production DLRM with multi-terabyte embedding tables, NeuroShard achieves 11.6% improvement in embedding costs over the state-of-the-art, which translates to 6.6% end-to-end training throughput improvement. To facilitate future research of the "pre-train, and search" paradigm in ML for Systems, we open-source our code at https://github.com/daochenzha/neuroshard
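Abstract (reference translation): Sharding a large machine learning model across multiple devices so that costs are balanced is important in distributed training. Partitioning is NP-hard, and estimating costs accurately and efficiently is difficult. This work explores a "pre-train, and search" paradigm for efficient sharding. The idea is to pre-train a universal, once-for-all neural network that predicts the cost of every possible shard and thereby serves as an efficient sharding simulator. An online search built on this pre-trained cost model then identifies the best sharding plan for any specific sharding task. The authors instantiate the idea in deep learning recommendation models (DLRMs) and propose NeuroShard for embedding table sharding. NeuroShard pre-trains neural cost models on augmented tables to cover various sharding scenarios. It then identifies the best column-wise sharding plan with beam search and the best table-wise sharding plan with greedy grid search. Experiments show that NeuroShard significantly and consistently outperforms the state of the art on the benchmark sharding dataset, with up to 23.8% improvement. Deployed in an ultra-large production DLRM with multi-terabyte embedding tables, NeuroShard improves embedding costs by 11.6% over the state of the art, which translates to a 6.6% improvement in end-to-end training throughput. To facilitate future research on the "pre-train, and search" paradigm in ML for Systems, the code is open-sourced at https://github.com/daochenzha/neuroshard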
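
The two-stage idea in the abstract (a cost model that scores a candidate shard, then a search that queries it) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: `toy_cost_model` is a hand-written stand-in for the pre-trained neural cost model, the `(hash_size, dim)` table description is hypothetical, and the search is a plain greedy placement, which is simpler than the beam search and greedy grid search that NeuroShard uses. It is not the code from the NeuroShard repository.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical table description: (hash_size, embedding_dim).
Table = Tuple[int, int]
CostModel = Callable[[Sequence[Table]], float]


def toy_cost_model(shard: Sequence[Table]) -> float:
    """Stand-in for a pre-trained cost model (assumption, not NeuroShard's).

    Predicts the cost of placing all tables in `shard` on one device:
    the summed embedding dimension plus a small per-table overhead.
    """
    return sum(dim for _, dim in shard) + 0.5 * len(shard)


def greedy_table_wise_sharding(
    tables: Sequence[Table], num_devices: int, cost_model: CostModel
) -> List[List[int]]:
    """Assign each table to one device, querying the cost model at each step.

    Returns a plan: plan[d] is the list of table indices placed on device d.
    """
    plan: List[List[int]] = [[] for _ in range(num_devices)]

    def shard_cost(indices: List[int]) -> float:
        return cost_model([tables[i] for i in indices])

    # Place the most expensive tables first so small ones can fill the gaps.
    order = sorted(
        range(len(tables)), key=lambda i: cost_model([tables[i]]), reverse=True
    )
    for i in order:
        def max_cost_if_placed(d: int) -> float:
            # Bottleneck (maximum per-device) cost if table i goes to device d.
            return max(
                shard_cost(plan[k] + [i]) if k == d else shard_cost(plan[k])
                for k in range(num_devices)
            )

        best_device = min(range(num_devices), key=max_cost_if_placed)
        plan[best_device].append(i)
    return plan


tables = [(1000, 64), (1000, 32), (1000, 32), (1000, 16), (1000, 16)]
plan = greedy_table_wise_sharding(tables, num_devices=2, cost_model=toy_cost_model)
bottleneck = max(toy_cost_model([tables[i] for i in shard]) for shard in plan)
print(plan, bottleneck)
```

The search only ever touches the cost model through `cost_model(...)`, so a learned predictor could be swapped in without changing the search loop. That separation is the point of the sketch.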

Related papers