DreamShard: Generalizable Embedding Table Placement for Recommender
Systems
- URL: http://arxiv.org/abs/2210.02023v1
- Date: Wed, 5 Oct 2022 05:12:02 GMT
- Title: DreamShard: Generalizable Embedding Table Placement for Recommender
Systems
- Authors: Daochen Zha, Louis Feng, Qiaoyu Tan, Zirui Liu, Kwei-Herng Lai,
Bhargav Bhushanam, Yuandong Tian, Arun Kejariwal, Xia Hu
- Abstract summary: We present a reinforcement learning (RL) approach for embedding table placement.
DreamShard reasons about operation fusion and generalizes to unseen placement tasks.
Experiments show that DreamShard substantially outperforms the existing human expert and RNN-based strategies.
- Score: 62.444159500899566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study embedding table placement for distributed recommender systems, which
aims to partition and place the tables on multiple hardware devices (e.g.,
GPUs) to balance the computation and communication costs. Although prior work
has explored learning-based approaches for the device placement of
computational graphs, embedding table placement remains a challenging
problem because of 1) the operation fusion of embedding tables, and 2) the
generalizability requirement on unseen placement tasks with different numbers
of tables and/or devices. To this end, we present DreamShard, a reinforcement
learning (RL) approach for embedding table placement. DreamShard achieves the
reasoning of operation fusion and generalizability with 1) a cost network to
directly predict the costs of the fused operation, and 2) a policy network that
is efficiently trained on an estimated Markov decision process (MDP) without
real GPU execution, where the states and the rewards are estimated with the
cost network. Equipped with sum and max representation reductions, the two
networks can directly generalize to any unseen tasks with different numbers of
tables and/or devices without fine-tuning. Extensive experiments show that
DreamShard substantially outperforms the existing human expert and RNN-based
strategies with up to 19% speedup over the strongest baseline on large-scale
synthetic tables and our production tables. The code is available at
https://github.com/daochenzha/dreamshard
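The abstract describes two learned components: a cost network that predicts the cost of fused embedding operations, and a policy network trained on an estimated MDP whose states and rewards come from the cost network, with sum and max reductions making both networks size-invariant. The sketch below illustrates only the *shape* of that idea, not DreamShard's actual implementation: `device_state`, `estimated_cost`, and the greedy `place_tables` are hypothetical stand-ins (the real system learns both networks and trains an RL policy; see the repository above).

```python
def device_state(table_feats):
    """Permutation-invariant device state via sum and max reductions.

    Pooling over per-table feature vectors yields a fixed-length
    representation regardless of how many tables a device holds,
    which is what lets the networks generalize to unseen numbers
    of tables and devices.
    """
    dim = len(table_feats[0]) if table_feats else 4
    if not table_feats:
        return [0.0] * (2 * dim)
    sums = [sum(f[i] for f in table_feats) for i in range(dim)]
    maxs = [max(f[i] for f in table_feats) for i in range(dim)]
    return sums + maxs


def estimated_cost(table_feats):
    """Stand-in for the learned cost network.

    Here: total of a hypothetical compute-cost feature (index 0),
    plus the max of a communication-cost feature (index 1), as a
    crude proxy for the cost of the fused operation on one device.
    """
    compute = sum(f[0] for f in table_feats)
    comm = max(f[1] for f in table_feats)
    return compute + comm


def place_tables(tables, num_devices):
    """Greedy placement on the estimated MDP: no GPU execution,
    every decision is scored with the cost model alone."""
    placement = [[] for _ in range(num_devices)]
    for t in tables:
        # Put the table on the device whose estimated cost after
        # the assignment stays lowest, balancing the devices.
        best = min(range(num_devices),
                   key=lambda d: estimated_cost(placement[d] + [t]))
        placement[best].append(t)
    return placement


# Four tables with made-up [compute, comm, size, lookup-rate] features.
tables = [[3.0, 1.0, 0.5, 0.2], [2.0, 2.0, 0.1, 0.3],
          [1.0, 0.5, 0.7, 0.1], [4.0, 1.5, 0.2, 0.4]]
placement = place_tables(tables, 2)
```

Because every placement decision is scored by the cost model rather than by running on real GPUs, the expensive part of training (hardware execution) is taken off the critical path, which is the point of the estimated MDP.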
Related papers
- GTR: Graph-Table-RAG for Cross-Table Question Answering [53.11230952572134]
We propose the first Graph-Table-RAG framework, namely GTR, which reorganizes table corpora into a heterogeneous graph.
GTR exhibits superior cross-table question-answering performance while maintaining high deployment efficiency, demonstrating its real-world practical applicability.
arXiv Detail & Related papers (2025-04-02T04:24:41Z)
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve a 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- TableRAG: Million-Token Table Understanding with Language Models [53.039560091592215]
TableRAG is a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding.
TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs.
Our results demonstrate that TableRAG achieves the highest retrieval quality, leading to the new state-of-the-art performance on large-scale table understanding.
arXiv Detail & Related papers (2024-10-07T04:15:02Z)
- On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability [59.72892401927283]
We evaluate the planning capabilities of OpenAI's o1 models across a variety of benchmark tasks.
Our results reveal that o1-preview outperforms GPT-4 in adhering to task constraints.
arXiv Detail & Related papers (2024-09-30T03:58:43Z)
- TablePuppet: A Generic Framework for Relational Federated Learning [27.274856376963356]
Current federated learning (FL) approaches view decentralized training data as a single table, divided among participants either horizontally (by rows) or vertically (by columns).
This scenario requires intricate operations like joins and unions to obtain the training data, which are either costly or restricted by privacy concerns.
We propose TablePuppet, a generic framework for relational federated learning (RFL) that decomposes the learning process into two steps: (1) learning over join (LoJ) followed by (2) learning over union (LoU).
arXiv Detail & Related papers (2024-03-23T13:28:37Z)
- Stochastic Configuration Machines: FPGA Implementation [4.57421617811378]
Stochastic configuration networks (SCNs) are a prime choice in industrial applications due to their merits and feasibility for data modelling.
This paper aims to implement SCM models on a field programmable gate array (FPGA) and introduce binary-coded inputs to improve learning performance.
arXiv Detail & Related papers (2023-10-30T02:04:20Z)
- Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models [56.65200574282804]
We propose a "pre-train, and search" paradigm for efficient sharding.
NeuroShard pre-trains neural cost models on augmented tables to cover various sharding scenarios.
NeuroShard significantly and consistently outperforms the state-of-the-art on the benchmark sharding dataset.
arXiv Detail & Related papers (2023-05-03T02:52:03Z)
- AutoShard: Automated Embedding Table Sharding for Recommender Systems [54.82606459574231]
We introduce our novel practice in Meta, namely AutoShard, which uses a neural cost model to directly predict the multi-table costs.
AutoShard can efficiently shard hundreds of tables in seconds.
Our algorithms have been deployed in Meta production environment.
arXiv Detail & Related papers (2022-08-12T17:48:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.