CubicML: Automated ML for Large ML Systems Co-design with ML Prediction of Performance
- URL: http://arxiv.org/abs/2409.04585v2
- Date: Sat, 21 Sep 2024 05:55:30 GMT
- Title: CubicML: Automated ML for Large ML Systems Co-design with ML Prediction of Performance
- Authors: Wei Wen, Quanyu Zhu, Weiwei Chu, Wen-Yen Chen, Jiyan Yang,
- Abstract summary: Scaling up deep learning models has proven effective at improving the intelligence of machine learning (ML) models.
In this paper, we propose CubicML which uses ML to automatically optimize training performance of large distributed ML systems.
We show that CubicML can effectively optimize the training speed of in-house ads recommendation models with 73 billion parameters and large language models with up to 405 billion parameters at Meta.
- Score: 7.425372356516303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scaling up deep learning models has proven effective at improving the intelligence of machine learning (ML) models, especially for industry recommendation models and large language models. The co-design of large distributed ML systems and algorithms (to maximize training performance) plays a pivotal role in this success. As these systems scale, the number of co-design hyper-parameters grows rapidly, which makes it challenging to feasibly find the optimal setup for maximizing system performance. In this paper, we propose CubicML, which uses ML to automatically optimize the training performance of large distributed ML systems. In CubicML, an ML model serves as a proxy that predicts training performance, providing search efficiency and flexibility in performance modeling. We show that CubicML can effectively optimize the training speed of in-house ads recommendation models with 73 billion parameters and large language models with up to 405 billion parameters at Meta.
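The abstract does not spell out an implementation, but the proxy-model idea it describes can be illustrated with a short sketch: benchmark a small sample of co-design configurations, fit a cheap regressor on (configuration, measured throughput) pairs, and use its predictions to rank the rest of the search space so that only the most promising candidates need a real run. Everything below is an illustrative assumption rather than the authors' setup: the knobs in CANDIDATE_SPACE, the synthetic measure_qps benchmark, and the choice of GradientBoostingRegressor as the proxy.

```python
# Minimal sketch of a surrogate-model search over co-design hyper-parameters.
# Not the CubicML implementation; all names and the search space are hypothetical.
import itertools
import random

from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical co-design search space: a few system/algorithm knobs.
CANDIDATE_SPACE = {
    "tensor_parallel": [1, 2, 4, 8],
    "pipeline_stages": [1, 2, 4],
    "micro_batch_size": [1, 2, 4, 8],
    "activation_checkpointing": [0, 1],
}

def all_configs(space):
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

def featurize(cfg):
    # Encode a config as a flat numeric vector for the proxy model.
    return [cfg[k] for k in sorted(cfg)]

def measure_qps(cfg):
    # Stand-in for an expensive benchmark run of the real training job;
    # a synthetic function here so the sketch runs end to end.
    random.seed(hash(tuple(sorted(cfg.items()))) % (2 ** 32))
    return (
        100.0 * cfg["tensor_parallel"] ** 0.5
        + 20.0 * cfg["micro_batch_size"]
        - 15.0 * cfg["pipeline_stages"]
        - 10.0 * cfg["activation_checkpointing"]
        + random.gauss(0, 5)
    )

configs = list(all_configs(CANDIDATE_SPACE))
random.seed(0)

# 1) Benchmark a small random subset of configs (the expensive step).
measured = random.sample(configs, 24)
X = [featurize(c) for c in measured]
y = [measure_qps(c) for c in measured]

# 2) Fit the proxy model that predicts training performance from the config.
proxy = GradientBoostingRegressor().fit(X, y)

# 3) Rank every remaining config with the proxy, without running it.
unmeasured = [c for c in configs if c not in measured]
ranked = sorted(unmeasured, key=lambda c: proxy.predict([featurize(c)])[0], reverse=True)

# 4) Only the top-ranked candidates would then be benchmarked for real.
for cfg in ranked[:3]:
    print(cfg, "predicted QPS:", round(proxy.predict([featurize(cfg)])[0], 1))
```

The benefit of this pattern is that the costly step (a real distributed training benchmark) is run only for the sampled configurations and the few top-ranked candidates, while the proxy model scores the rest of the combinatorially growing search space almost for free.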
Related papers
- Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization [65.64108848398696]
We introduce a preference optimization process to enhance the multimodal reasoning capabilities of MLLMs.
We develop a simple yet effective method, termed Mixed Preference Optimization (MPO), which boosts multimodal CoT performance.
Our model, InternVL2-8B-MPO, achieves an accuracy of 67.0 on MathVista, outperforming InternVL2-8B by 8.7 points and achieving performance comparable to the 10x larger InternVL2-76B.
arXiv Detail & Related papers (2024-11-15T18:59:27Z) - Large Language Models for Constructing and Optimizing Machine Learning Workflows: A Survey [3.340984908213717]
Building effective machine learning (ML) workflows to address complex tasks is a primary focus of the Automated ML (AutoML) community.
Recently, the integration of Large Language Models (LLMs) into ML workflows has shown great potential for automating and enhancing various stages of the ML pipeline.
arXiv Detail & Related papers (2024-11-11T21:54:26Z) - Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance [78.48606021719206]
Mini-InternVL is a series of MLLMs with parameters ranging from 1B to 4B, which achieves 90% of the performance with only 5% of the parameters.
We develop a unified adaptation framework for Mini-InternVL, which enables our models to transfer and outperform specialized models in downstream tasks.
arXiv Detail & Related papers (2024-10-21T17:58:20Z) - LLaVA-KD: A Framework of Distilling Multimodal Large Language Models [70.19607283302712]
We propose a novel framework to transfer knowledge from a large-scale MLLM (l-MLLM) to a small-scale MLLM (s-MLLM).
Specifically, we introduce Multimodal Distillation (MDist) to minimize the divergence between the visual-textual output distributions of l-MLLM and s-MLLM.
We also propose a three-stage training scheme to fully exploit the potential of s-MLLM.
arXiv Detail & Related papers (2024-10-21T17:41:28Z) - Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models [90.14693869269519]
MoE LLMs can achieve higher performance with fewer activated parameters, but they are still hard to deploy due to their immense total parameter sizes.
This paper mainly aims to enhance the deployment efficiency of MoE LLMs by introducing plug-and-play expert-level sparsification techniques.
arXiv Detail & Related papers (2024-02-22T18:56:07Z) - CoLLiE: Collaborative Training of Large Language Models in an Efficient Way [59.09824823710863]
CoLLiE is an efficient library that facilitates collaborative training of large language models.
With its modular design and comprehensive functionality, CoLLiE offers a balanced blend of efficiency, ease of use, and customization.
arXiv Detail & Related papers (2023-12-01T08:02:16Z) - GEVO-ML: Optimizing Machine Learning Code with Evolutionary Computation [6.525197444717069]
GEVO-ML is a tool for discovering optimization opportunities and tuning the performance of Machine Learning kernels.
We demonstrate GEVO-ML on two different ML workloads for both model training and prediction.
GEVO-ML finds significant improvements for these models, achieving 90.43% performance improvement when model accuracy is relaxed by 2%.
arXiv Detail & Related papers (2023-10-16T09:24:20Z) - MLGOPerf: An ML Guided Inliner to Optimize Performance [7.314201117946244]
This paper presents the first end-to-end framework capable of optimizing performance using LLVM's ML-Inliner.
It employs a secondary ML model to generate rewards for training a retargeted reinforcement learning agent.
It does so by predicting the post-inlining speedup of the function under analysis, which enables a fast training framework for the primary model.
arXiv Detail & Related papers (2022-07-18T05:47:29Z) - Scalable and Efficient MoE Training for Multitask Multilingual Models [55.987536562357086]
We develop a system capable of scaling MoE models efficiently to trillions of parameters.
We also present new training methods to improve MoE sample efficiency and leverage an expert pruning strategy to improve time efficiency.
A model trained with 10 billion parameters on 50 languages can achieve state-of-the-art performance in Machine Translation (MT) and multilingual natural language generation tasks.
arXiv Detail & Related papers (2021-09-22T00:57:46Z) - Evaluating Deep Learning in SystemML using Layer-wise Adaptive Rate Scaling (LARS) Optimizer [0.3857494091717916]
We apply LARS to a deep learning model implemented using SystemML.
We perform experiments with various batch sizes and compare the performance of LARS with a distributed machine learning framework.
arXiv Detail & Related papers (2021-02-05T06:23:56Z)