GMP*: Well-Tuned Global Magnitude Pruning Can Outperform Most BERT-Pruning Methods
- URL: http://arxiv.org/abs/2210.06384v2
- Date: Thu, 13 Oct 2022 06:50:05 GMT
- Title: GMP*: Well-Tuned Global Magnitude Pruning Can Outperform Most BERT-Pruning Methods
- Authors: Eldar Kurtic and Dan Alistarh
- Abstract summary: We revisit the performance of the classic gradual magnitude pruning (GMP) baseline for large language models.
We show that a simple and general variant, which we call GMP*, can match and sometimes outperform more complex state-of-the-art methods.
- Score: 27.761221746022365
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We revisit the performance of the classic gradual magnitude pruning (GMP)
baseline for large language models, focusing on the classic BERT benchmark on
various popular tasks. Despite existing evidence in the literature that GMP
performs poorly, we show that a simple and general variant, which we call GMP*,
can match and sometimes outperform more complex state-of-the-art methods. Our
results provide a simple yet strong baseline for future work, highlight the
importance of parameter tuning for baselines, and even improve the performance
of the state-of-the-art second-order pruning method in this setting.
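For readers who want a concrete picture of the baseline being tuned, the following is a minimal sketch of global gradual magnitude pruning in PyTorch: weights from all prunable layers are ranked together by absolute magnitude, the smallest fraction is zeroed out, and the target sparsity is ramped up with a cubic schedule. The function and parameter names are illustrative assumptions; this is not the authors' exact GMP* recipe, which also depends on carefully tuned learning-rate and sparsity schedules.

```python
import torch


def sparsity_at(step: int, total_steps: int, final_sparsity: float) -> float:
    # Cubic schedule: sparsity ramps from 0 toward final_sparsity over total_steps.
    frac = min(step / total_steps, 1.0)
    return final_sparsity * (1.0 - (1.0 - frac) ** 3)


@torch.no_grad()
def global_magnitude_prune(model: torch.nn.Module, sparsity: float) -> None:
    # Rank the weights of all Linear layers globally and zero out the smallest ones.
    weights = [m.weight for m in model.modules() if isinstance(m, torch.nn.Linear)]
    scores = torch.cat([w.abs().flatten() for w in weights])
    k = int(sparsity * scores.numel())
    if k == 0:
        return
    threshold = torch.kthvalue(scores, k).values
    for w in weights:
        w.mul_((w.abs() > threshold).to(w.dtype))  # keep only above-threshold weights
```

In a gradual setup, global_magnitude_prune(model, sparsity_at(step, total_steps, target)) would be invoked every few hundred fine-tuning steps; a full implementation would also re-apply the resulting binary masks after each optimizer update so that pruned weights stay at zero.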
Related papers
- Enhancing Graph Self-Supervised Learning with Graph Interplay [8.775644935074407]
Graph Interplay (GIP) is an innovative and versatile approach that significantly enhances the performance of various existing GSSL methods.
GIP advocates introducing direct graph-level communications by random inter-graph edges within standard batches.
Our empirical study demonstrates that GIP surpasses the performance of prevailing GSSL methods by significant margins.
arXiv Detail & Related papers (2024-10-05T07:05:21Z) - Effective Tuning Strategies for Generalist Robot Manipulation Policies [45.36380662552082]
Generalist robot manipulation policies (GMPs) have the potential to generalize across a wide range of tasks, devices, and environments.
While fine-tuning offers a practical way to quickly adapt a GMP to novel domains and tasks with limited samples, we observe that the performance of the resulting GMPs differs significantly with respect to the design choices of the fine-tuning strategy.
arXiv Detail & Related papers (2024-10-02T04:00:25Z) - GLBench: A Comprehensive Benchmark for Graph with Large Language Models [41.89444363336435]
We introduce GLBench, the first comprehensive benchmark for evaluating GraphLLM methods in both supervised and zero-shot scenarios.
GLBench provides a fair and thorough evaluation of different categories of GraphLLM methods, along with traditional baselines such as graph neural networks.
arXiv Detail & Related papers (2024-07-10T08:20:47Z) - RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs [60.38044044203333]
Large language models (LLMs) typically utilize the top-k contexts from a retriever in retrieval-augmented generation (RAG).
We propose a novel instruction fine-tuning framework RankRAG, which instruction-tunes a single LLM for the dual purpose of context ranking and answer generation in RAG.
For generation, we compare our model with many strong baselines, including GPT-4-0613, GPT-4-turbo-2024-0409, and ChatQA-1.5, an open-source model with state-of-the-art performance on RAG benchmarks.
arXiv Detail & Related papers (2024-07-02T17:59:17Z) - Can GPT Redefine Medical Understanding? Evaluating GPT on Biomedical Machine Reading Comprehension [2.3231783764387566]
Large language models (LLMs) have shown remarkable performance on many tasks in different domains.
In this work, we evaluate GPT on four closed-book biomedical machine reading comprehension benchmarks.
We propose a prompting strategy named Implicit Retrieval Augmented Generation (RAG) that alleviates the need for using vector databases.
arXiv Detail & Related papers (2024-05-29T01:12:53Z) - Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models [90.14693869269519]
MoE LLMs can achieve higher performance with fewer parameters, but it is still hard to deploy them due to their immense parameter sizes.
This paper aims to enhance the deployment efficiency of MoE LLMs by introducing plug-and-play, expert-level sparsification techniques; an illustrative expert-pruning sketch appears after this list.
arXiv Detail & Related papers (2024-02-22T18:56:07Z) - How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark [60.72725673114168]
We revisit the question of accurate BERT-pruning during fine-tuning on downstream datasets.
We propose a set of general guidelines for successful pruning, even on the challenging SMC benchmark.
arXiv Detail & Related papers (2023-12-21T03:11:30Z) - Tuning Pre-trained Model via Moment Probing [62.445281364055795]
We propose a novel Moment Probing (MP) method to explore the potential of linear probing (LP).
MP learns a linear classification head on the mean of the final features; a minimal linear-probe sketch appears after this list.
Our MP significantly outperforms LP and is competitive with its counterparts at a lower training cost.
arXiv Detail & Related papers (2023-07-21T04:15:02Z) - Incremental Ensemble Gaussian Processes [53.3291389385672]
We propose an incremental ensemble (IE-) GP framework, where an EGP meta-learner employs an ensemble of GP learners, each having a unique kernel belonging to a prescribed kernel dictionary.
With each GP expert leveraging the random feature-based approximation to perform online prediction and model update with scalability, the EGP meta-learner capitalizes on data-adaptive weights to synthesize the per-expert predictions.
The novel IE-GP is generalized to accommodate time-varying functions by modeling structured dynamics at the EGP meta-learner and within each GP learner.
arXiv Detail & Related papers (2021-10-13T15:11:25Z) - Prior Guided Feature Enrichment Network for Few-Shot Segmentation [64.91560451900125]
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results.
Few-shot segmentation is proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples.
These frameworks still face the challenge of reduced generalization ability on unseen classes due to inappropriate use of high-level semantic information.
arXiv Detail & Related papers (2020-08-04T10:41:32Z)
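As flagged in the Mixture-of-Experts entry above, here is an illustrative sketch of expert-level pruning for an MoE layer: it drops the experts that the router selects least often on calibration data. The selection-frequency criterion and all names are assumptions for illustration, not necessarily the criteria used in that paper.

```python
import torch
import torch.nn as nn


def least_used_experts(router_logits: torch.Tensor, top_k: int, num_drop: int) -> list[int]:
    # router_logits: (num_tokens, num_experts), collected by running calibration data.
    chosen = router_logits.topk(top_k, dim=-1).indices          # experts picked per token
    counts = torch.bincount(chosen.flatten(), minlength=router_logits.size(-1))
    return counts.argsort()[:num_drop].tolist()                 # least-selected experts


def prune_experts(experts: nn.ModuleList, drop: list[int]) -> nn.ModuleList:
    # Return a smaller expert list; the router's output dimension must be remapped accordingly.
    return nn.ModuleList([e for i, e in enumerate(experts) if i not in drop])
```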
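For the Moment Probing entry above, the sketch below shows the simplest piece of the idea as summarized there: a linear classification head trained on the mean of the final token features from a frozen backbone. The full MP method goes beyond this, and the class and dimension names are illustrative assumptions.

```python
import torch
import torch.nn as nn


class MeanPooledLinearProbe(nn.Module):
    # Linear head over mean-pooled final features; the backbone itself stays frozen.
    def __init__(self, feature_dim: int, num_classes: int):
        super().__init__()
        self.head = nn.Linear(feature_dim, num_classes)

    def forward(self, token_features: torch.Tensor) -> torch.Tensor:
        # token_features: (batch, num_tokens, feature_dim) from a frozen backbone.
        pooled = token_features.mean(dim=1)
        return self.head(pooled)
```

Only the head's parameters are updated during training, which is what keeps probing-style tuning cheap.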
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.