EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs
- URL: http://arxiv.org/abs/2402.12419v1
- Date: Mon, 19 Feb 2024 09:55:32 GMT
- Title: EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs
- Authors: Song Guo, Fan Wu, Lei Zhang, Xiawu Zheng, Shengchuan Zhang, Fei Chao,
Yiyu Shi, Rongrong Ji
- Abstract summary: Existing methods for fine-tuning sparse LLMs often suffer from resource-intensive requirements and high retraining costs.
We propose an efficient and fast framework for fine-tuning sparse LLMs based on minimizing reconstruction error.
Our approach involves sampling a small dataset for calibration and utilizing backpropagation to iteratively optimize block-wise reconstruction error.
- Score: 68.41135269685576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing methods for fine-tuning sparse LLMs often suffer from
resource-intensive requirements and high retraining costs. Additionally, many
fine-tuning methods often rely on approximations or heuristic optimization
strategies, which may lead to suboptimal solutions. To address these issues, we
propose an efficient and fast framework for fine-tuning sparse LLMs based on
minimizing reconstruction error. Our approach involves sampling a small dataset
for calibration and utilizing backpropagation to iteratively optimize
block-wise reconstruction error, on a block-by-block basis, aiming for optimal
solutions. Extensive experiments on various benchmarks consistently demonstrate
the superiority of our method over other baselines. For instance, on the
Wikitext2 dataset with LlamaV1-7B at 70% sparsity, our proposed EBFT achieves a
perplexity of 16.88, surpassing the state-of-the-art DSnoT with a perplexity of
75.14. Moreover, with a structured sparsity ratio of 26\%, EBFT achieves a
perplexity of 16.27, outperforming LoRA (perplexity 16.44). Furthermore, the
fine-tuning process of EBFT for LlamaV1-7B only takes approximately 30 minutes,
and the entire framework can be executed on a single 16GB GPU. The source code
is available at https://github.com/sunggo/EBFT.
Related papers
- Benchmarking LLMs for Optimization Modeling and Enhancing Reasoning via Reverse Socratic Synthesis [60.23133327001978]
Large language models (LLMs) have exhibited their problem-solving ability in mathematical reasoning.
We propose E-OPT, a benchmark for end-to-end optimization problem-solving with human-readable inputs and outputs.
arXiv Detail & Related papers (2024-07-13T13:27:57Z) - Sparsity-Constraint Optimization via Splicing Iteration [1.3622424109977902]
We develop an algorithm named Sparsity-Constraint Optimization via sPlicing itEration (SCOPE)
SCOPE converges effectively without tuning parameters.
We apply SCOPE to solve quadratic optimization, learn sparse classifiers, and recover sparse Markov networks for binary variables.
Our open-source Python package skscope based on C++ implementation is publicly available on GitHub.
arXiv Detail & Related papers (2024-06-17T18:34:51Z) - Optimization-based Structural Pruning for Large Language Models without Back-Propagation [57.9629676017527]
We propose an optimization-based structural pruning on Large-Language Models (LLMs)
Our method learns the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model.
Our method operates for 2.7 hours with around 35GB memory for the 13B models on a single A100 GPU, and our pruned models outperform the state-of-the-arts w.r.t. perplexity.
arXiv Detail & Related papers (2024-06-15T09:31:03Z) - ALPS: Improved Optimization for Highly Sparse One-Shot Pruning for Large Language Models [14.310720048047136]
ALPS is an optimization-based framework that tackles the pruning problem using the operator splitting technique and a preconditioned gradient conjugate-based post-processing step.
Our approach incorporates novel techniques to accelerate and theoretically guarantee convergence while leveraging vectorization and GPU parallelism for efficiency.
On the OPT-30B model with 70% sparsity, ALPS achieves a 13% reduction in test perplexity on the WikiText dataset and a 19% improvement in zero-shot benchmark performance compared to existing methods.
arXiv Detail & Related papers (2024-06-12T02:57:41Z) - Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
arXiv Detail & Related papers (2024-05-26T05:38:50Z) - Federated Learning of Large Language Models with Parameter-Efficient
Prompt Tuning and Adaptive Optimization [71.87335804334616]
Federated learning (FL) is a promising paradigm to enable collaborative model training with decentralized data.
The training process of Large Language Models (LLMs) generally incurs the update of significant parameters.
This paper proposes an efficient partial prompt tuning approach to improve performance and efficiency simultaneously.
arXiv Detail & Related papers (2023-10-23T16:37:59Z) - Parallel Bayesian Optimization Using Satisficing Thompson Sampling for
Time-Sensitive Black-Box Optimization [0.0]
We propose satisficing Thompson sampling-based parallel BO approaches, including synchronous and asynchronous versions.
We shift the target from an optimal solution to a satisficing solution that is easier to learn.
The effectiveness of the proposed methods is demonstrated on a fast-charging design problem of Lithium-ion batteries.
arXiv Detail & Related papers (2023-10-19T07:03:51Z) - Learning Optimal Solutions via an LSTM-Optimization Framework [0.0]
We present a deep learning-optimization framework to tackle dynamic mixed-integer programs.
We develop a bidirectional Long Short Term Memory (LSTM) framework that can process information forward and backward in time.
We demonstrate our approach in predicting the optimal decisions for the single-item capacitated lot-sizing problem.
arXiv Detail & Related papers (2022-07-06T19:38:01Z) - Provable Stochastic Optimization for Global Contrastive Learning: Small
Batch Does Not Harm Performance [53.49803579981569]
We consider a global objective for contrastive learning, which contrasts each positive pair with all negative pairs for an anchor point.
Existing methods such as SimCLR requires a large batch size in order to achieve a satisfactory result.
We propose a memory-efficient optimization algorithm for solving the Global Contrastive Learning of Representations, named SogCLR.
arXiv Detail & Related papers (2022-02-24T22:16:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.