How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark
- URL: http://arxiv.org/abs/2312.13547v1
- Date: Thu, 21 Dec 2023 03:11:30 GMT
- Title: How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark
- Authors: Eldar Kurtic, Torsten Hoefler, Dan Alistarh
- Abstract summary: We revisit the question of accurate BERT-pruning during fine-tuning on downstream datasets.
We propose a set of general guidelines for successful pruning, even on the challenging SMC benchmark.
- Score: 60.72725673114168
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pruning large language models (LLMs) from the BERT family has emerged as a
standard compression benchmark, and several pruning methods have been proposed
for this task. The recent "Sparsity May Cry" (SMC) benchmark put into
question the validity of all existing methods, exhibiting a more complex setup
where many known pruning methods appear to fail. We revisit the question of
accurate BERT-pruning during fine-tuning on downstream datasets, and propose a
set of general guidelines for successful pruning, even on the challenging SMC
benchmark. First, we perform a cost-vs-benefits analysis of pruning model
components, such as the embeddings and the classification head; second, we
provide a simple-yet-general way of scaling training, sparsification and
learning rate schedules relative to the desired target sparsity; finally, we
investigate the importance of proper parametrization for Knowledge Distillation
in the context of LLMs. Our simple insights lead to state-of-the-art results,
both on classic BERT-pruning benchmarks, as well as on the SMC benchmark,
showing that even classic gradual magnitude pruning (GMP) can yield competitive
results, with the right approach.
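The abstract names the key ingredients (a sparsification schedule scaled to the target sparsity and a properly parametrized Knowledge Distillation loss) without giving formulas. The sketch below is a minimal illustration of how these pieces typically fit together, assuming PyTorch, the widely used cubic sparsity ramp of Zhu & Gupta (2017), and the standard temperature-based KD parametrization; the function names and hyperparameter values are illustrative and are not taken from the paper.

```python
# Hypothetical sketch: gradual magnitude pruning (GMP) with a cubic sparsity
# schedule plus a temperature-parametrized knowledge-distillation (KD) loss.
# The schedule and KD form below are common choices, not the authors' exact setup.
import torch
import torch.nn.functional as F


def sparsity_at_step(step, total_steps, final_sparsity, initial_sparsity=0.0):
    """Cubic ramp from initial_sparsity to final_sparsity over total_steps."""
    progress = min(step / total_steps, 1.0)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


def apply_magnitude_pruning(weight, sparsity):
    """Zero out the smallest-magnitude entries so that `sparsity` fraction is zero."""
    num_prune = int(sparsity * weight.numel())
    if num_prune == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(num_prune).values
    mask = (weight.abs() > threshold).float()
    weight.data.mul_(mask)   # prune in place
    return mask              # keep the mask to re-apply after optimizer updates


def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, hard_weight=0.5):
    """Distillation loss: softened KL term (scaled by T^2) plus hard-label CE."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return hard_weight * hard + (1.0 - hard_weight) * soft
```

In a fine-tuning loop, one would periodically recompute the target sparsity, call apply_magnitude_pruning on each prunable weight matrix, re-apply the stored masks after every optimizer step, and use kd_loss in place of (or mixed with) the plain task loss. The paper's guideline of scaling training, sparsification, and learning-rate schedules to the target sparsity would, in this sketch, correspond to lengthening total_steps and the learning-rate schedule as final_sparsity grows.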
Related papers
- Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models [0.29687381456164]
We propose a more flexible benchmarking approach for Large Language Models (LLMs).
Our method, Varco Arena, provides reference-free benchmarking of LLMs in tournament style.
Our empirical results, supported by simulation experiments, demonstrate that the Varco Arena tournament approach aligns better with the current Elo model for benchmarking LLMs.
arXiv Detail & Related papers (2024-11-02T15:23:28Z)
- Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling [3.873482175367558]
In this paper, we treat the Generation of each token by a Large Language Model (LLM) as a Classification (GaC) for ensembling.
In experiments, we ensemble state-of-the-art LLMs on several benchmarks, including exams, mathematics and reasoning, and observe that our method breaks the existing community performance ceiling.
arXiv Detail & Related papers (2024-06-18T13:17:26Z)
- MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures [57.886592207948844]
We propose MixEval, a new paradigm for establishing efficient, gold-standard evaluation by strategically mixing off-the-shelf benchmarks.
It bridges (1) comprehensive and well-distributed real-world user queries and (2) efficient and fairly-graded ground-truth-based benchmarks, by matching queries mined from the web with similar queries from existing benchmarks.
arXiv Detail & Related papers (2024-06-03T05:47:05Z)
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) without the requirement of prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Learning Efficient Coding of Natural Images with Maximum Manifold Capacity Representations [4.666056064419346]
The efficient coding hypothesis proposes that the response properties of sensory systems are adapted to the statistics of their inputs.
While elegant, information-theoretic properties are notoriously difficult to measure in practical settings or to employ as objective functions in optimization.
Here we outline the assumptions that allow manifold capacity to be optimized directly, yielding Maximum Manifold Capacity Representations (MMCR).
arXiv Detail & Related papers (2023-03-06T17:26:30Z)
- CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but further adaptation of CLIP to downstream tasks undesirably degrades OOD performance.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z)
- GMP*: Well-Tuned Global Magnitude Pruning Can Outperform Most BERT-Pruning Methods [27.761221746022365]
We revisit the performance of the classic gradual magnitude pruning (GMP) baseline for large language models.
We show that a simple and general variant, which we call GMP*, can match and sometimes outperform more complex state-of-the-art methods.
arXiv Detail & Related papers (2022-10-12T16:35:47Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Mutual-Information Based Few-Shot Classification [34.95314059362982]
We introduce Transductive Information Maximization (TIM) for few-shot learning.
Our method maximizes the mutual information between the query features and their label predictions for a given few-shot task.
We propose a new alternating-direction solver, which speeds up transductive inference over gradient-based optimization.
arXiv Detail & Related papers (2021-06-23T09:17:23Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.