Related papers: Gradient-Free Structured Pruning with Unlabeled Data

URL: http://arxiv.org/abs/2303.04185v2
Date: Sat, 15 Jul 2023 20:19:22 GMT
Title: Gradient-Free Structured Pruning with Unlabeled Data
Authors: Azade Nova, Hanjun Dai, Dale Schuurmans
Abstract summary: We propose a gradient-free structured pruning framework that uses only unlabeled data. Up to 40% of the original FLOP count can be reduced with less than a 4% accuracy loss across all tasks considered.
Score: 57.999191898036706
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have achieved great success in solving difficult tasks across many domains, but such success comes with a high computation cost and inference latency. As developers and third parties customize these models, the need to provide efficient inference has increased. Many efforts have attempted to reduce inference cost through model compression techniques such as pruning and distillation. However, these techniques either require labeled data or are time-consuming because the compressed model must be retrained to regain accuracy. In this paper, we propose a gradient-free structured pruning framework that uses only unlabeled data. An evaluation on the GLUE and SQuAD benchmarks using BERT$_{BASE}$ and DistilBERT illustrates the effectiveness of the proposed approach. Using only the weights of the pre-trained model and unlabeled data, in a matter of a few minutes on a single GPU, up to 40% of the original FLOP count can be reduced with less than a 4% accuracy loss across all tasks considered.
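Illustrative sketch (not the authors' algorithm): the abstract describes removing whole structures from a model using only its pre-trained weights and unlabeled inputs, with no gradients or labels. The NumPy snippet below shows one simple way such a scheme could look for a single feed-forward block. The function name `prune_ffn_neurons`, the `keep_ratio` parameter, and the importance score (mean activation times outgoing weight norm) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def prune_ffn_neurons(W1, b1, W2, X, keep_ratio):
    """Drop hidden neurons of a feed-forward block x -> relu(x @ W1 + b1) @ W2.

    Uses only the weights and unlabeled inputs X (no labels, no gradients).
    The scoring rule below is a hypothetical heuristic, not the paper's.
    """
    # Hidden activations on the unlabeled data, shape (n_samples, n_hidden).
    H = np.maximum(X @ W1 + b1, 0.0)
    # Assumed importance: average activation times outgoing weight norm.
    score = H.mean(axis=0) * np.linalg.norm(W2, axis=1)
    # Number of neurons to keep (at least one).
    k = max(1, int(round(keep_ratio * W1.shape[1])))
    # Indices of the k highest-scoring neurons, in ascending index order.
    keep = np.sort(np.argsort(score)[-k:])
    return W1[:, keep], b1[keep], W2[keep, :], keep

# Small usage example with random weights and random "unlabeled" inputs.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 10))
b1 = np.zeros(10)
b1[[2, 5]] = -100.0          # force two neurons to be inactive on this data
W2 = rng.normal(size=(10, 3))
X = rng.normal(size=(32, 4))

W1_p, b1_p, W2_p, keep = prune_ffn_neurons(W1, b1, W2, X, keep_ratio=0.8)
```

In this toy setup the two forced-inactive neurons never fire on the sample data, so removing them leaves the block's output unchanged while shrinking both weight matrices. Real pruning at larger ratios would trade some accuracy for the reduced FLOP count, as the abstract reports.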

Related papers

Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty [9.749638953163391]
We introduce a Difficulty and Uncertainty-Aware Lightweight (DUAL) score to identify important samples from the early training stage. We also propose ratio-adaptive sampling using a Beta distribution to address the catastrophic accuracy drop under extreme pruning.
arXiv Detail & Related papers (2025-02-10T01:18:40Z)
TT-MPD: Test Time Model Pruning and Distillation [3.675015670568961]
Pruning can be an effective method of compressing large pre-trained models for inference speed acceleration. Previous pruning approaches rely on access to the original training dataset for both pruning and subsequent fine-tuning. We propose an efficient pruning method that considers the approximated finetuned accuracy and potential inference latency saving.
arXiv Detail & Related papers (2024-12-10T02:05:13Z)
PUMA: margin-based data pruning [51.12154122266251]
We focus on data pruning, where some training samples are removed based on their distance to the model's classification boundary (i.e., the margin). We propose PUMA, a new data pruning strategy that computes the margin using DeepFool. We show that PUMA can be used on top of the current state-of-the-art methodology in robustness, and that, unlike existing data pruning strategies, it significantly improves model performance.
arXiv Detail & Related papers (2024-05-10T08:02:20Z)
Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches. We present UPET, a novel Uncertainty-aware self-Training framework. We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z)
BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images. Knowledge distillation has recently been proposed as a remedy that can reduce the number of inference steps to one or a few. We present a novel technique called BOOT, which overcomes limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory. We show that, with regularization towards the flat trajectory, the weights trained on synthetic data are robust against accumulated error perturbations. Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z)
Accelerating Deep Learning with Dynamic Data Pruning [0.0]
Deep learning has become prohibitively costly, requiring access to powerful computing systems to train state-of-the-art networks. Previous work, such as forget scores and GraNd/EL2N scores, identifies important samples within a full dataset and prunes the remaining samples, thereby reducing the iterations per epoch. We propose two algorithms, based on reinforcement learning techniques, to dynamically prune samples and achieve even higher accuracy than the random dynamic method.
arXiv Detail & Related papers (2021-11-24T16:47:34Z)
SSSE: Efficiently Erasing Samples from Trained Machine Learning Models [103.43466657962242]
We propose an efficient and effective algorithm, SSSE, for sample erasure. In certain cases SSSE can erase samples almost as well as the optimal, yet impractical, gold standard of training a new model from scratch with only the permitted data.
arXiv Detail & Related papers (2021-07-08T14:17:24Z)
A contextual analysis of multi-layer perceptron models in classifying hand-written digits and letters: limited resources [0.0]
We extensively test an end-to-end vanilla neural network (MLP) approach in pure numpy without any pre-processing or feature extraction done beforehand. We show that basic data mining operations can significantly improve the performance of the models in terms of computational time.
arXiv Detail & Related papers (2021-07-05T04:30:37Z)
The Right Tool for the Job: Matching Model and Instance Complexities [62.95183777679024]
As NLP models become larger, executing a trained model requires significant computational resources, incurring monetary and environmental costs. We propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit". We test our proposed modification on five different datasets in two tasks: three text classification datasets and two natural language inference benchmarks.
arXiv Detail & Related papers (2020-04-16T04:28:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.