Related papers: EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models

EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models

URL: http://arxiv.org/abs/2402.00518v1
Date: Thu, 1 Feb 2024 11:39:04 GMT
Title: EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models
Authors: Xuchen Pan, Yanxi Chen, Yaliang Li, Bolin Ding, Jingren Zhou
Abstract summary: EE-Tuning is a solution to training/tuning early-exit large language models (LLMs) It augments any pre-trained (and possibly fine-tuned) standard LLM with additional early-exit layers that are tuned in a parameter-efficient manner. Our implementation achieves outstanding training efficiency via extensive performance optimizations.
Score: 75.1814102438065
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This work introduces EE-Tuning, a lightweight and economical solution to training/tuning early-exit large language models (LLMs). In contrast to the common approach of full-parameter pre-training, EE-Tuning augments any pre-trained (and possibly fine-tuned) standard LLM with additional early-exit layers that are tuned in a parameter-efficient manner, which requires significantly less computational resources and training data. Our implementation of EE-Tuning achieves outstanding training efficiency via extensive performance optimizations, as well as scalability due to its full compatibility with 3D parallelism. Results of systematic experiments validate the efficacy of EE-Tuning, confirming that effective early-exit LLM inference can be achieved with a limited training budget. In hope of making early-exit LLMs accessible to the community, we release the source code of our implementation of EE-Tuning at https://github.com/pan-x-c/EE-LLM.

Related papers

From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs [23.253571170594455]
Large Language Models (LLMs) have significantly advanced artificial intelligence. This paper introduces a three-stage cost-efficient end-to-end LLM deployment pipeline. Our approach yields a super tiny model optimized for cost and performance in online systems.
arXiv Detail & Related papers (2025-04-18T05:25:22Z)
Refining Salience-Aware Sparse Fine-Tuning Strategies for Language Models [14.68920095399595]
sparsity-based PEFT (SPEFT) introduces trainable sparse adaptations to the weight matrices in the model. We conduct the first systematic evaluation of salience metrics for SPEFT, inspired by zero-cost NAS proxies. Our work challenges the notion that complexity is necessary for effective PEFT.
arXiv Detail & Related papers (2024-12-18T04:14:35Z)
Early Exit Is a Natural Capability in Transformer-based Models: An Empirical Study on Early Exit without Joint Optimization [39.66431809316171]
The early exit (EE) aims to accelerate auto-regressive decoding. EE generates outputs from intermediate layers instead of using the whole model. Joint optimization must be employed to address challenges by improving the accuracy of locating the optimal EE layer.
arXiv Detail & Related papers (2024-12-02T12:46:34Z)
Understanding Forgetting in LLM Supervised Fine-Tuning and Preference Learning - A Convex Optimization Perspective [55.66517396157806]
The widely adopted approach in post-training popular open-source LLMs is to sequentially perform SFT and RLHF/DPO.<n>This is suboptimal in terms of SFT and RLHF/DPO trade-off.<n>We propose a practical joint post-training framework which has theoretical convergence guarantees and empirically outperforms sequential post-training framework.
arXiv Detail & Related papers (2024-10-20T19:38:41Z)
The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities [0.35998666903987897]
This report examines the fine-tuning of Large Language Models (LLMs) It outlines the historical evolution of LLMs from traditional Natural Language Processing (NLP) models to their pivotal role in AI. The report introduces a structured seven-stage pipeline for fine-tuning LLMs.
arXiv Detail & Related papers (2024-08-23T14:48:02Z)
HFT: Half Fine-Tuning for Large Language Models [42.60438623804577]
Large language models (LLMs) with one or more fine-tuning phases have become a necessary step to unlock various capabilities. In this paper, we find that by regularly resetting partial parameters, LLMs can restore some of the original knowledge. We introduce Half Fine-Tuning (HFT) for LLMs, as a substitute for full fine-tuning (FFT), to mitigate the forgetting issues.
arXiv Detail & Related papers (2024-04-29T07:07:58Z)
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models [90.14693869269519]
MoE LLMs can achieve higher performance with fewer parameters, but it is still hard to deploy them due to their immense parameter sizes. This paper mainly aims to enhance the deployment efficiency of MoE LLMs by introducing plug-and-play expert-level sparsification techniques.
arXiv Detail & Related papers (2024-02-22T18:56:07Z)
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism [70.07661254213181]
We present EE-LLM, a framework for large-scale training and inference of early-exit large language models (LLMs) Built upon Megatron-LM, EE-LLM implements a variety of algorithmic innovations and performance optimizations tailored to early exiting. Our analytical and empirical study shows that EE-LLM achieves great training efficiency with negligible computational overhead.
arXiv Detail & Related papers (2023-12-08T09:31:50Z)
CoLLiE: Collaborative Training of Large Language Models in an Efficient Way [59.09824823710863]
CoLLiE is an efficient library that facilitates collaborative training of large language models. With its modular design and comprehensive functionality, CoLLiE offers a balanced blend of efficiency, ease of use, and customization.
arXiv Detail & Related papers (2023-12-01T08:02:16Z)
Federated Learning of Large Language Models with Parameter-Efficient Prompt Tuning and Adaptive Optimization [71.87335804334616]
Federated learning (FL) is a promising paradigm to enable collaborative model training with decentralized data. The training process of Large Language Models (LLMs) generally incurs the update of significant parameters. This paper proposes an efficient partial prompt tuning approach to improve performance and efficiency simultaneously.
arXiv Detail & Related papers (2023-10-23T16:37:59Z)
FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning [70.38817963253034]
This paper first discusses these challenges of federated fine-tuning LLMs, and introduces our package FS-LLM as a main contribution. We provide comprehensive federated parameter-efficient fine-tuning algorithm implementations and versatile programming interfaces for future extension in FL scenarios. We conduct extensive experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings.
arXiv Detail & Related papers (2023-09-01T09:40:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.