G-Meta: Distributed Meta Learning in GPU Clusters for Large-Scale
Recommender Systems
- URL: http://arxiv.org/abs/2401.04338v1
- Date: Tue, 9 Jan 2024 03:35:43 GMT
- Title: G-Meta: Distributed Meta Learning in GPU Clusters for Large-Scale
Recommender Systems
- Authors: Youshao Xiao, Shangchun Zhao, Zhenglei Zhou, Zhaoxin Huan, Lin Ju,
Xiaolu Zhang, Lin Wang, Jun Zhou
- Abstract summary: This paper provides a framework for large-scale training for optimization-based Meta DLRM models over the GPU cluster.
Various experimental results show that G-Meta achieves notable training speed without loss of statistical performance.
- Score: 16.343248795178685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, a new paradigm, meta learning, has been widely applied to Deep
Learning Recommendation Models (DLRM) and significantly improves statistical
performance, especially in cold-start scenarios. However, the existing systems
are not tailored for meta learning based DLRM models and have critical problems
regarding efficiency in distributed training in the GPU cluster. This is because
the conventional deep learning pipeline is not optimized for the two
task-specific datasets and the two update loops used in meta learning. This paper provides a
high-performance framework for large-scale training for Optimization-based Meta
DLRM models over the GPU cluster, namely G-Meta. Firstly,
G-Meta utilizes both data parallelism and model parallelism with careful
orchestration regarding computation and communication efficiency, to enable
high-speed distributed training. Secondly, it proposes a Meta-IO pipeline for
efficient data ingestion to alleviate the I/O bottleneck. Various experimental
results show that G-Meta achieves notable training speed without loss of
statistical performance. Since early 2022, G-Meta has been deployed in Alipay's
core advertising and recommender system, shortening the continuous delivery
cycle of models by a factor of four. It also obtains a 6.48% improvement in
Conversion Rate (CVR) and a 1.06% increase in CPM (Cost Per Mille) in Alipay's
homepage display advertising, benefiting from more training samples and tasks.
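For readers unfamiliar with why meta learning breaks the conventional training pipeline, the sketch below shows the structure G-Meta has to support: each task carries a support/query split, an inner loop adapts a per-task copy of the model on the support set, and an outer loop updates the shared parameters from query-set gradients. This is a generic first-order MAML-style toy, not G-Meta's code; the model, synthetic tasks, and hyperparameters are placeholders, and the data/model parallelism across the GPU cluster is omitted.

```python
# First-order MAML-style toy illustrating the "two datasets and two update
# loops" mentioned in the abstract. Everything here (model, synthetic tasks,
# hyperparameters) is a placeholder; G-Meta additionally shards this loop
# with data and model parallelism across the GPU cluster.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)   # outer-loop optimizer
loss_fn = nn.BCEWithLogitsLoss()
inner_lr = 1e-2

def fake_task(n=64):
    """Stand-in for one task's support/query split (e.g., one user or domain)."""
    x = torch.randn(2 * n, 16)
    y = (x.sum(dim=1, keepdim=True) > 0).float()
    return (x[:n], y[:n]), (x[n:], y[n:])

for outer_step in range(10):                 # outer loop over batches of tasks
    meta_opt.zero_grad()
    for _ in range(4):                       # tasks in this meta-batch
        (xs, ys), (xq, yq) = fake_task()
        # Inner loop: adapt a per-task copy on the task's SUPPORT set.
        fast = copy.deepcopy(model)
        support_loss = loss_fn(fast(xs), ys)
        grads = torch.autograd.grad(support_loss, list(fast.parameters()))
        with torch.no_grad():
            for p, g in zip(fast.parameters(), grads):
                p -= inner_lr * g
        # Outer objective: the adapted copy's loss on the QUERY set; its
        # gradients are accumulated into the shared model (first-order approx.).
        query_loss = loss_fn(fast(xq), yq)
        grads = torch.autograd.grad(query_loss, list(fast.parameters()))
        with torch.no_grad():
            for p, g in zip(model.parameters(), grads):
                p.grad = g.clone() if p.grad is None else p.grad + g
    meta_opt.step()
```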
Related papers
- FREE: Faster and Better Data-Free Meta-Learning [77.90126669914324]
Data-Free Meta-Learning (DFML) aims to extract knowledge from a collection of pre-trained models without requiring the original data.
We introduce the Faster and Better Data-Free Meta-Learning framework, which contains: (i) a meta-generator for rapidly recovering training tasks from pre-trained models; and (ii) a meta-learner for generalizing to new unseen tasks.
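As a rough illustration of the data-free idea (not FREE's meta-generator, which is itself trained to recover tasks quickly), the sketch below synthesizes inputs that a frozen pre-trained classifier labels confidently and packages the (input, prediction) pairs as a pseudo-task; the model, shapes, and optimization settings are all assumptions.

```python
# Toy "recover a task from a pre-trained model without data" sketch. A frozen
# classifier stands in for one of the collection's pre-trained models; we
# optimize synthetic inputs toward desired labels and hand the resulting
# (input, prediction) pairs downstream as a pseudo-task.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
pretrained = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).eval()
for p in pretrained.parameters():
    p.requires_grad_(False)

targets = torch.arange(4).repeat(8)              # desired class per synthetic sample
x = torch.randn(32, 16, requires_grad=True)      # synthetic inputs to optimize
opt = torch.optim.Adam([x], lr=0.05)
for _ in range(200):                             # "invert" the frozen model
    opt.zero_grad()
    F.cross_entropy(pretrained(x), targets).backward()
    opt.step()

support_x = x.detach()
support_y = pretrained(support_x).argmax(dim=1)  # pseudo-labels from the model
print("pseudo-task label agreement:", float((support_y == targets).float().mean()))
```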
arXiv Detail & Related papers (2024-05-02T03:43:19Z)
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
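The DPO update on the collected step-level pairs reduces to a simple loss; the snippet below computes it on random placeholder log-probabilities (the real values would come from scoring each reasoning step under the policy and a frozen reference model), with the temperature β as an assumed hyperparameter.

```python
# Direct Preference Optimization loss on a batch of (chosen, rejected) pairs.
# Log-probs here are random stand-ins for policy/reference scores of the
# step-level preferences collected via MCTS.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
beta, n = 0.1, 8                                  # DPO temperature (assumed), batch size
policy_chosen, policy_rejected = torch.randn(n), torch.randn(n)
ref_chosen, ref_rejected = torch.randn(n), torch.randn(n)

# Reward margin implied by the policy relative to the frozen reference model.
margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
dpo_loss = -F.logsigmoid(beta * margin).mean()    # maximize the preference margin
print(float(dpo_loss))
```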
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
- Towards Effective and General Graph Unlearning via Mutual Evolution [44.11777886421429]
We propose MEGU, a new mutual evolution paradigm that simultaneously evolves the predictive and unlearning capacities of graph unlearning.
In experiments on 9 graph benchmark datasets, MEGU achieves average performance improvements of 2.7%, 2.5%, and 3.2%.
MEGU exhibits satisfactory training efficiency, reducing time and space overhead by an average of 159.8x and 9.6x, respectively, compared to retraining the GNN from scratch.
arXiv Detail & Related papers (2024-01-22T08:45:29Z)
- When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
- Making Scalable Meta Learning Practical [40.24886572503001]
Meta learning has long been recognized to suffer from poor scalability due to its tremendous compute/memory costs, training instability, and a lack of efficient distributed training support.
In this work, we focus on making scalable meta learning practical by introducing SAMA, which combines advances in both implicit differentiation algorithms and systems.
We show that SAMA-based data optimization leads to consistent improvements in text classification accuracy with BERT and RoBERTa large language models, and achieves state-of-the-art results in both small- and large-scale data pruning on image classification tasks.
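As a concrete (and much simplified) picture of the data-optimization use case, the sketch below learns per-example training weights by backpropagating a clean validation loss through a one-step unrolled model update. SAMA itself relies on implicit differentiation plus distributed-systems support rather than this naive unrolling; all data, shapes, and learning rates are synthetic assumptions.

```python
# Toy bilevel data reweighting: inner problem = weighted training loss,
# outer problem = validation loss of the one-step-updated model, whose
# gradient trains the per-example weights. Generic bilevel pattern only,
# not SAMA's implicit-differentiation algorithm.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(16, 2)
w = torch.zeros(256, requires_grad=True)              # per-example weight logits
opt = torch.optim.SGD(model.parameters(), lr=0.1)
w_opt = torch.optim.SGD([w], lr=0.1)

x_tr, y_tr = torch.randn(256, 16), torch.randint(0, 2, (256,))
x_val, y_val = torch.randn(64, 16), torch.randint(0, 2, (64,))

for step in range(20):
    # Inner: differentiable one-step update under the current example weights.
    losses = F.cross_entropy(model(x_tr), y_tr, reduction="none")
    inner_loss = (torch.sigmoid(w) * losses).mean()
    grads = torch.autograd.grad(inner_loss, list(model.parameters()), create_graph=True)
    fast_weight = model.weight - 0.1 * grads[0]
    fast_bias = model.bias - 0.1 * grads[1]
    # Outer: validation loss of the updated model drives the weights w.
    val_loss = F.cross_entropy(F.linear(x_val, fast_weight, fast_bias), y_val)
    w_opt.zero_grad()
    val_loss.backward()
    w_opt.step()
    # Finally take an ordinary weighted training step on the model itself.
    opt.zero_grad()
    losses = F.cross_entropy(model(x_tr), y_tr, reduction="none")
    (torch.sigmoid(w.detach()) * losses).mean().backward()
    opt.step()
print("learned weight range:", float(torch.sigmoid(w).min()), float(torch.sigmoid(w).max()))
```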
arXiv Detail & Related papers (2023-10-09T12:45:13Z)
- p-Meta: Towards On-device Deep Model Adaptation [30.27192953408665]
p-Meta is a new meta learning method that enforces structure-wise partial parameter updates while ensuring fast generalization to unseen tasks.
We show that p-Meta substantially reduces the peak dynamic memory by a factor of 2.5 on average compared to state-of-the-art few-shot adaptation methods.
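A rough picture of structure-wise partial updates, with a hand-picked mask (p-Meta meta-learns which structures are worth updating): only a subset of modules keeps requires_grad, so the on-device adaptation loop stores gradients and optimizer state for a fraction of the parameters. Layer sizes and the chosen subset are assumptions for illustration.

```python
# Partial-parameter few-shot adaptation: only the LayerNorm modules and the
# final Linear layer are left trainable (an arbitrary choice for this sketch;
# p-Meta learns the selection), shrinking gradient and optimizer-state memory
# during on-device adaptation.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(
    nn.Linear(32, 64), nn.LayerNorm(64), nn.ReLU(),
    nn.Linear(64, 64), nn.LayerNorm(64), nn.ReLU(),
    nn.Linear(64, 5),
)
adaptable = []
for name, module in model.named_modules():
    keep = isinstance(module, nn.LayerNorm) or name == "6"   # "6" = last Linear
    for p in module.parameters(recurse=False):
        p.requires_grad_(keep)
        if keep:
            adaptable.append(p)

opt = torch.optim.SGD(adaptable, lr=1e-2)        # adaptation touches only these
x, y = torch.randn(20, 32), torch.randint(0, 5, (20,))
for _ in range(5):                               # few-shot inner adaptation steps
    opt.zero_grad()
    F.cross_entropy(model(x), y).backward()
    opt.step()

total = sum(p.numel() for p in model.parameters())
print(f"adapted {sum(p.numel() for p in adaptable)} of {total} parameters")
```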
arXiv Detail & Related papers (2022-06-25T18:36:59Z)
- DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
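One way to read the sparsity prior on the weight updates is a frozen pre-trained matrix plus a trainable delta split into a low-rank product and a very sparse correction. The sketch below shows that parameterization with made-up shapes, rank, and sparsity level, as a simplified stand-in rather than DSEE's actual method.

```python
# Frozen base weight + (low-rank + sparse) trainable delta, a simplified
# stand-in for a sparsity-embedded update. Rank, sparsity level, and shapes
# are illustrative assumptions.
import torch
import torch.nn as nn

class DeltaLinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, sparsity: float = 0.01):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                      # pre-trained weights stay frozen
        out_f, in_f = base.weight.shape
        self.U = nn.Parameter(torch.zeros(out_f, rank))  # low-rank factors
        self.V = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        idx = torch.nonzero(torch.rand(out_f, in_f) < sparsity)  # fixed sparse support
        self.register_buffer("rows", idx[:, 0])
        self.register_buffer("cols", idx[:, 1])
        self.S = nn.Parameter(torch.zeros(idx.shape[0])) # sparse correction values

    def forward(self, x):
        sparse = torch.zeros_like(self.base.weight)
        sparse[self.rows, self.cols] = self.S            # scatter the sparse correction
        delta = self.U @ self.V + sparse
        return self.base(x) + x @ delta.t()

torch.manual_seed(0)
layer = DeltaLinear(nn.Linear(768, 768))
out = layer(torch.randn(4, 768))                         # sanity forward pass
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"{trainable} trainable vs {768 * 768 + 768} parameters in the dense layer")
```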
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
- MetaTune: Meta-Learning Based Cost Model for Fast and Efficient Auto-tuning Frameworks [0.0]
This paper proposes MetaTune, a meta-learning based cost model that more quickly and accurately predicts the performance of optimized codes with pre-trained model parameters.
The framework provides 8 to 13% better inference time on average for four CNN models with comparable or lower optimization time while outperforming transfer learning by 10% in cross-platform cases.
arXiv Detail & Related papers (2021-02-08T13:59:08Z)
- MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation [153.56211546576978]
In this work, we propose that better soft targets with higher compatibility can be generated by using a label generator.
We can employ the meta-learning technique to optimize this label generator.
The experiments are conducted on two standard classification benchmarks, namely CIFAR-100 and ILSVRC2012.
arXiv Detail & Related papers (2020-08-27T13:04:27Z)
- Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA [58.040931661693925]
We propose a strategy that combines redundant recomputing and out-of-core methods.
We achieve an average of 1.52x speedup in six different models over the state-of-the-art out-of-core methods.
Our data-parallel out-of-core solution can outperform complex hybrid model parallelism in training large models, e.g., Megatron-LM and Turing-NLG.
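The two ingredients named in the summary map onto stock PyTorch facilities: gradient checkpointing for redundant recomputation and a saved-tensor hook that parks activations in host memory. The snippet below combines them as a minimal illustration under assumed layer sizes; it is not KARMA's capacity-based scheduling.

```python
# Redundant recomputation (gradient checkpointing) plus out-of-core saved
# activations (offloaded to CPU RAM), shown with built-in PyTorch tools as a
# minimal stand-in for KARMA's strategy.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

device = "cuda" if torch.cuda.is_available() else "cpu"
blocks = nn.ModuleList(
    [nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()) for _ in range(8)]
).to(device)
x = torch.randn(64, 1024, device=device, requires_grad=True)

# Tensors saved for backward are moved to host memory; each block's inner
# activations are dropped in forward and recomputed during backward.
with torch.autograd.graph.save_on_cpu(pin_memory=(device == "cuda")):
    h = x
    for block in blocks:
        h = checkpoint(block, h, use_reentrant=False)
    loss = h.pow(2).mean()
loss.backward()
print("input grad norm:", float(x.grad.norm()))
```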
arXiv Detail & Related papers (2020-08-26T07:24:34Z) - Generalized Reinforcement Meta Learning for Few-Shot Optimization [3.7675996866306845]
We present a generic and flexible Reinforcement Learning (RL) based meta-learning framework for the problem of few-shot learning.
Our framework could be easily extended to do network architecture search.
arXiv Detail & Related papers (2020-05-04T03:21:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.