Related papers: GMP*: Well-Tuned Global Magnitude Pruning Can Outperform Most BERT-Pruning Methods

Related papers

Towards Transformer-Based Aligned Generation with Self-Coherence Guidance [51.42269790543461]
We introduce a training-free approach for enhancing alignment in Transformer-based Text-Guided Diffusion Models (TGDMs). Existing TGDMs often struggle to generate semantically aligned images, particularly when dealing with complex text prompts or multi-concept attribute binding challenges. Our method addresses these challenges by directly optimizing cross-attention maps during the generation process.
arXiv Detail & Related papers (2025-03-22T07:03:57Z)
PGB: One-Shot Pruning for BERT via Weight Grouping and Permutation [5.888489927450056]
This paper proposes a novel semi-structured one-shot pruning method for BERT, called $\textit{Permutation and Grouping for BERT}$ (PGB). PGB identifies important groups of individual weights by permutation and prunes all other weights as a structure in both multi-head attention and feed-forward layers. Our experimental results on BERT$_{\text{BASE}}$ demonstrate that PGB outperforms the state-of-the-art structured pruning methods in terms of computational cost and accuracy preservation.
arXiv Detail & Related papers (2025-02-06T11:34:41Z)
ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts. Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
Enhancing Graph Self-Supervised Learning with Graph Interplay [8.775644935074407]
Graph Interplay (GIP) is an innovative and versatile approach that significantly enhances the performance of various existing GSSL methods. GIP advocates introducing direct graph-level communications via random inter-graph edges within standard batches. Our empirical study demonstrates that GIP surpasses the performance of prevailing GSSL methods by significant margins.
arXiv Detail & Related papers (2024-10-05T07:05:21Z)
Effective Tuning Strategies for Generalist Robot Manipulation Policies [45.36380662552082]
Generalist robot manipulation policies (GMPs) have the potential to generalize across a wide range of tasks, devices, and environments. While fine-tuning offers a practical way to quickly adapt GMPs to novel domains and tasks with limited samples, we observe that the performance of the resulting GMPs differs significantly with respect to the design choices of fine-tuning strategies.
arXiv Detail & Related papers (2024-10-02T04:00:25Z)
GLBench: A Comprehensive Benchmark for Graph with Large Language Models [41.89444363336435]
We introduce GLBench, the first comprehensive benchmark for evaluating GraphLLM methods in both supervised and zero-shot scenarios. GLBench provides a fair and thorough evaluation of different categories of GraphLLM methods, along with traditional baselines such as graph neural networks.
arXiv Detail & Related papers (2024-07-10T08:20:47Z)
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs [60.38044044203333]
Large language models (LLMs) typically utilize the top-k contexts from a retriever in retrieval-augmented generation (RAG). We propose a novel instruction fine-tuning framework, RankRAG, which instruction-tunes a single LLM for the dual purpose of context ranking and answer generation in RAG. For generation, we compare our model with many strong baselines, including GPT-4-0613, GPT-4-turbo-2024-0409, and ChatQA-1.5, an open-source model with state-of-the-art performance on RAG benchmarks.
arXiv Detail & Related papers (2024-07-02T17:59:17Z)
Can GPT Redefine Medical Understanding? Evaluating GPT on Biomedical Machine Reading Comprehension [2.3231783764387566]
Large language models (LLMs) have shown remarkable performance on many tasks in different domains. In this work, we evaluate GPT on four closed-book biomedical machine reading comprehension benchmarks. We propose a prompting strategy named Implicit Retrieval Augmented Generation (RAG) that alleviates the need for using vector databases.
arXiv Detail & Related papers (2024-05-29T01:12:53Z)
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models [90.14693869269519]
MoE LLMs can achieve higher performance with fewer parameters, but it is still hard to deploy them due to their immense parameter sizes. This paper mainly aims to enhance the deployment efficiency of MoE LLMs by introducing plug-and-play expert-level sparsification techniques.
arXiv Detail & Related papers (2024-02-22T18:56:07Z)
How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark [60.72725673114168]
We revisit the question of accurate BERT-pruning during fine-tuning on downstream datasets. We propose a set of general guidelines for successful pruning, even on the challenging SMC benchmark.
arXiv Detail & Related papers (2023-12-21T03:11:30Z)
Tuning Pre-trained Model via Moment Probing [62.445281364055795]
We propose a novel Moment Probing (MP) method to explore the potential of linear probing (LP). MP performs a linear classification head based on the mean of final features. Our MP significantly outperforms LP and is competitive with counterparts at less training cost.
arXiv Detail & Related papers (2023-07-21T04:15:02Z)
Incremental Ensemble Gaussian Processes [53.3291389385672]
We propose an incremental ensemble (IE-) GP framework, where an EGP meta-learner employs an ensemble of GP learners, each having a unique kernel belonging to a prescribed kernel dictionary. With each GP expert leveraging the random feature-based approximation to perform online prediction and model update with scalability, the EGP meta-learner capitalizes on data-adaptive weights to synthesize the per-expert predictions. The novel IE-GP is generalized to accommodate time-varying functions by modeling structured dynamics at the EGP meta-learner and within each GP learner.
arXiv Detail & Related papers (2021-10-13T15:11:25Z)
Prior Guided Feature Enrichment Network for Few-Shot Segmentation [64.91560451900125]
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results. Few-shot segmentation is proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples. These frameworks still face the challenge of generalization ability reduction on unseen classes due to inappropriate use of high-level semantic information.
arXiv Detail & Related papers (2020-08-04T10:41:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.