vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training
- URL: http://arxiv.org/abs/2312.12391v2
- Date: Tue, 10 Sep 2024 04:16:28 GMT
- Title: vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training
- Authors: Jehyeon Bang, Yujeong Choi, Myeongwoo Kim, Yongdeok Kim, Minsoo Rhu
- Abstract summary: This paper presents our profiling-driven simulator called vTrain to determine an efficient and cost-effective training system configuration.
We demonstrate vTrain's practicality through several case studies, e.g., effectively evaluating optimal training parallelization strategies.
- Score: 3.0051215935332505
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: As large language models (LLMs) become widespread in various application domains, a critical challenge the AI community faces is how to train these large AI models in a cost-effective manner. Existing LLM training plans typically employ a heuristic-based parallel training strategy derived from empirical observations rather than from a thorough examination of the LLM parallelization search space. This limitation causes existing systems to leave significant performance on the table, wasting millions of dollars' worth of training cost. This paper presents our profiling-driven simulator called vTrain, which provides AI practitioners a fast yet accurate software framework to determine an efficient and cost-effective LLM training system configuration. We demonstrate vTrain's practicality through several case studies, e.g., effectively evaluating optimal training parallelization strategies that balance training time and its associated training cost, efficient multi-tenant GPU cluster schedulers targeting multiple LLM training jobs, and determining a compute-optimal LLM model architecture given a fixed compute budget.
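To make the parallelization search space concrete, the sketch below is a minimal, hypothetical illustration (not vTrain's actual profiling-driven simulator) of the kind of exploration the abstract describes: it enumerates (data, tensor, pipeline)-parallel degrees for a fixed GPU count and ranks them with a toy analytical time-and-cost model. The constants and the `estimate_step_time` formula are made-up assumptions.

```python
from itertools import product

# Toy analytical model: NOT vTrain's profiling-driven simulator.
# All constants and formulas below are illustrative assumptions.
GPUS = 64                      # total GPUs in the cluster
GPU_HOURLY_PRICE = 2.0         # assumed $/GPU-hour
MODEL_FLOPS_PER_STEP = 3e15    # assumed FLOPs for one training step
GPU_PEAK_FLOPS = 312e12        # e.g., A100 BF16 peak

def estimate_step_time(dp, tp, pp):
    """Hypothetical cost model: ideal compute time plus crude parallelization overheads."""
    compute = MODEL_FLOPS_PER_STEP / (GPUS * GPU_PEAK_FLOPS * 0.4)  # assume 40% utilization
    tp_overhead = 0.15 * compute * (tp - 1) / tp   # tensor-parallel all-reduce cost
    pp_overhead = compute * (pp - 1) / (pp * 8)    # pipeline bubble with 8 microbatches
    return compute + tp_overhead + pp_overhead     # seconds per training step

configs = []
for dp, tp, pp in product([1, 2, 4, 8, 16, 32, 64], repeat=3):
    if dp * tp * pp != GPUS:
        continue  # keep only configurations that use every GPU
    step_time = estimate_step_time(dp, tp, pp)
    cost_per_step = step_time / 3600 * GPUS * GPU_HOURLY_PRICE
    configs.append(((dp, tp, pp), step_time, cost_per_step))

# Rank candidate (data, tensor, pipeline) degrees by estimated step time.
for (dp, tp, pp), t, c in sorted(configs, key=lambda x: x[1])[:5]:
    print(f"DP={dp:2d} TP={tp:2d} PP={pp:2d}  {t*1e3:7.1f} ms/step  ${c:.4f}/step")
```

A profiling-driven simulator such as vTrain replaces the toy formula with measured per-kernel execution times and communication models, but the overall structure, enumerating valid configurations and trading off time per step against dollar cost, is the search that the case studies refer to.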
Related papers
- Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents [6.318292471845427]
We develop the queuing fundamentals for large language model (LLM) inference.
We prove that a large class of 'work-conserving' scheduling algorithms can achieve maximum throughput.
arXiv Detail & Related papers (2025-04-10T00:12:12Z)
- SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks [110.20297293596005]
Large language model (LLM) agents need to perform multi-turn interactions in real-world tasks.
Existing multi-turn RL algorithms for optimizing LLM agents fail to perform effective credit assignment over multiple turns while leveraging the generalization capabilities of LLMs.
We propose a novel RL algorithm, SWEET-RL, that uses a carefully designed optimization objective to train a critic model with access to additional training-time information.
Our experiments demonstrate that SWEET-RL achieves a 6% absolute improvement in success and win rates on ColBench compared to other state-of-the-art multi-turn RL algorithms.
arXiv Detail & Related papers (2025-03-19T17:55:08Z)
- Cost-Optimal Grouped-Query Attention for Long-Context LLMs [64.90662568387683]
Building effective Transformer-based large language models (LLMs) has recently become a research focus.
We compare models with different parameter sizes, context lengths, and attention head configurations in terms of model performance, computational cost, and memory cost.
Our studies show that, when processing sufficiently long sequences, a larger model with fewer attention heads can achieve a lower loss while incurring lower computational and memory costs (a rough sketch of this head-count trade-off appears after this list).
arXiv Detail & Related papers (2025-03-12T17:50:42Z)
- eFedLLM: Efficient LLM Inference Based on Federated Learning [1.6179784294541053]
Large Language Models (LLMs) herald a transformative era in artificial intelligence (AI).
This paper introduces an effective approach that enhances the operational efficiency and affordability of LLM inference.
arXiv Detail & Related papers (2024-11-24T22:50:02Z)
- A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs [74.35290684163718]
A primary challenge in large language model (LLM) development is their onerous pre-training cost.
This paper explores a promising paradigm to improve LLM pre-training efficiency and quality by leveraging a small language model (SLM).
arXiv Detail & Related papers (2024-10-24T14:31:52Z)
- Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate [118.37653302885607]
We present the Modality Integration Rate (MIR), an effective, robust, and generalized metric that indicates the multi-modal pre-training quality of Large Vision Language Models (LVLMs).
MIR is informative for training data selection, training strategy scheduling, and model architecture design, leading to better pre-training results.
arXiv Detail & Related papers (2024-10-09T17:59:04Z)
- Exploring Scaling Laws for Local SGD in Large Language Model Training [4.125418728284004]
We show that local SGD achieves competitive results compared to conventional methods, given equivalent model parameters, datasets, and computational resources.
This demonstrates its viability as an alternative to single large-cluster training.
arXiv Detail & Related papers (2024-09-20T04:02:48Z)
- Understanding the Performance and Estimating the Cost of LLM Fine-Tuning [9.751868268608675]
Fine-tuning adapts Large Language Models (LLMs) to specific tasks in a cost-effective manner.
In this paper, we characterize sparse Mixture of Experts (MoE) based LLM fine-tuning to understand their accuracy and runtime performance.
We also develop and validate an analytical model to estimate the cost of LLM fine-tuning on the cloud (a generic back-of-envelope version of such an estimate is sketched after this list).
arXiv Detail & Related papers (2024-08-08T16:26:07Z)
- Large Language Models for Base Station Siting: Intelligent Deployment based on Prompt or Agent [62.16747639440893]
Large language models (LLMs) and their associated technologies continue to advance, particularly in the realms of prompt engineering and agent engineering.
This approach entails the strategic use of well-crafted prompts to infuse human experience and knowledge into these sophisticated LLMs.
This integration represents the future paradigm of artificial intelligence (AI) as a service and makes AI easier to use.
arXiv Detail & Related papers (2024-08-07T08:43:32Z)
- SoupLM: Model Integration in Large Language and Multi-Modal Models [51.12227693121004]
Training large language models (LLMs) requires significant computing resources.
Existing publicly available LLMs are typically pre-trained on diverse, privately curated datasets spanning various tasks.
arXiv Detail & Related papers (2024-07-11T05:38:15Z)
- ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation [12.321332446941378]
Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique for empowering large language model (LLM) applications.
We introduce ReaL, a pioneering system for efficient RLHF training.
We evaluate ReaL on the LLaMA models with up to 70 billion parameters and 128 GPUs.
arXiv Detail & Related papers (2024-06-20T08:04:07Z)
- One QuantLLM for ALL: Fine-tuning Quantized LLMs Once for Efficient Deployments [43.107261545706415]
Large Language Models (LLMs) have advanced rapidly but face significant memory demands.
Current methods typically require lengthy training to alleviate the performance degradation from quantization loss.
We make an initial attempt to extend the once-for-all framework to large language models.
arXiv Detail & Related papers (2024-05-30T16:05:15Z)
- Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration [70.09561665520043]
We propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans.
We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems.
Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents.
arXiv Detail & Related papers (2024-05-23T08:33:19Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
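For the Cost-Optimal Grouped-Query Attention entry above, the head-count trade-off can be pictured with standard back-of-envelope accounting. The sketch below is a generic illustration under assumed shapes and the usual two-FLOPs-per-multiply-add convention, not the cited paper's analysis.

```python
def attention_costs(n_q_heads, n_kv_heads, head_dim, seq_len, dtype_bytes=2):
    """Rough per-layer attention cost for one token at a given context length (generic GQA accounting)."""
    d_model = n_q_heads * head_dim
    # Projection FLOPs: Q and output use all query heads; K and V use only the KV heads.
    proj_flops = 2 * d_model * head_dim * (2 * n_q_heads + 2 * n_kv_heads)
    # Attention-score and weighted-sum FLOPs grow with context length and query heads.
    attn_flops = 4 * n_q_heads * head_dim * seq_len
    # KV cache: two tensors (K and V) per cached token, stored only for the KV heads.
    kv_cache_bytes = 2 * n_kv_heads * head_dim * dtype_bytes * seq_len
    return proj_flops + attn_flops, kv_cache_bytes

# Fewer KV heads (grouped-query attention) shrink the KV cache and projection FLOPs,
# which frees budget for a larger model at long context lengths.
for kv in (32, 8, 1):
    flops, cache = attention_costs(n_q_heads=32, n_kv_heads=kv, head_dim=128, seq_len=32768)
    print(f"kv_heads={kv:2d}  ~{flops / 1e6:7.1f} MFLOPs/token  KV cache {cache / 2**20:7.1f} MiB/layer")
```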
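Similarly, for the LLM fine-tuning cost entry above, the flavor of an analytical cloud-cost estimate can be conveyed with the common "6 times parameters times tokens" training-FLOPs rule. This is a crude, generic sketch with assumed prices and utilization, not the validated model from that paper.

```python
def estimate_finetune_cost(active_params, tokens, gpus, peak_flops=312e12,
                           mfu=0.35, price_per_gpu_hour=2.0):
    """Back-of-envelope fine-tuning cost: ~6 * params * tokens FLOPs for forward + backward.

    Generic illustration only; for a sparse MoE model, pass the parameters
    activated per token rather than the total parameter count.
    """
    total_flops = 6 * active_params * tokens
    wall_seconds = total_flops / (gpus * peak_flops * mfu)   # assumed model FLOPs utilization
    gpu_hours = wall_seconds * gpus / 3600
    return gpu_hours, gpu_hours * price_per_gpu_hour

# Example: ~13B active parameters fine-tuned on 1B tokens across 16 GPUs.
hours, dollars = estimate_finetune_cost(active_params=13e9, tokens=1e9, gpus=16)
print(f"~{hours:.0f} GPU-hours, ~${dollars:,.0f} at $2/GPU-hour")
```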