Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization
- URL: http://arxiv.org/abs/2406.15524v2
- Date: Fri, 11 Oct 2024 01:46:25 GMT
- Title: Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization
- Authors: Sungbin Shin, Wonpyo Park, Jaeho Lee, Namhoon Lee
- Abstract summary: We present an array of reconstruction techniques that can significantly reduce this error by more than $90\%$.
We find that a strategy of self-generating calibration data can mitigate this trade-off between reconstruction and generalization.
- Score: 18.24882084542254
- Abstract: This work suggests fundamentally rethinking the current practice of pruning large language models (LLMs). The prevailing approach is divide and conquer: split the model into submodels, prune them sequentially, reconstructing the predictions of their dense counterparts on small calibration data one at a time, and obtain the final model simply by putting the resulting sparse submodels together. While this approach enables pruning under memory constraints, it generates high reconstruction errors. In this work, we first present an array of reconstruction techniques that can significantly reduce this error by more than $90\%$. Unexpectedly, however, we discover that minimizing reconstruction error is not always ideal and can overfit the given calibration data, resulting in increased language perplexity and poor performance on downstream tasks. We find that a strategy of self-generating calibration data can mitigate this trade-off between reconstruction and generalization, suggesting new directions in the presence of both benefits and pitfalls of reconstruction for pruning LLMs.
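For intuition, here is a minimal sketch of the sequential prune-then-reconstruct loop the abstract describes. It is an illustration under simplifying assumptions, not the authors' implementation: the model is reduced to a stack of linear layers, the pruning criterion is plain weight magnitude, and the names (`magnitude_mask`, `reconstruct_layer`) are hypothetical. Each layer's surviving weights are adjusted to minimize the reconstruction error $\lVert \hat{W}X - WX \rVert_F^2$ against the dense layer's outputs on the calibration batch.

```python
import torch

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Keep the largest-magnitude weights; zero out the `sparsity` fraction.
    k = int(weight.numel() * sparsity)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

def reconstruct_layer(weight, bias, mask, x, dense_out, lr=1e-2, steps=200):
    # Minimize ||(x @ (w*mask)^T + b) - dense_out||^2 over surviving weights.
    w = weight.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((x @ (w * mask).T + bias - dense_out) ** 2).mean()
        loss.backward()
        opt.step()
    return (w * mask).detach()

# Divide and conquer: prune one submodel at a time under a memory budget,
# reconstructing each against its dense counterpart's outputs.
layers = [torch.nn.Linear(64, 64) for _ in range(4)]
x = torch.randn(128, 64)  # calibration data; the paper's self-generating
                          # strategy would sample this from the model itself
for layer in layers:
    with torch.no_grad():
        dense_out = layer(x)              # dense target for this submodel
    mask = magnitude_mask(layer.weight.data, sparsity=0.5)
    layer.weight.data = reconstruct_layer(
        layer.weight.data, layer.bias.data, mask, x, dense_out)
    with torch.no_grad():
        x = layer(x)                      # propagate sparse activations onward
```

Driving this loss down too far is precisely the pitfall the paper identifies: the sparse layers can overfit the small calibration batch, hurting perplexity and downstream performance; self-generating the calibration data from the dense model, rather than drawing it from a fixed corpus, is the suggested mitigation.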
Related papers
- SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel smooth regularization for fine-tuning that rectifies these structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z)
- Reconstruct the Pruned Model without Any Retraining [23.235907813011174]
We introduce the Linear Interpolation-based Adaptive Reconstruction (LIAR) framework, which is both efficient and effective.
LIAR does not require back-propagation or retraining and is compatible with various pruning criteria and modules.
Our evaluations on benchmarks such as GLUE, SQuAD, WikiText, and common sense reasoning show that LIAR enables a BERT model to maintain 98% accuracy even after removing 50% of its parameters.
arXiv Detail & Related papers (2024-07-18T09:30:44Z)
- Bounding Reconstruction Attack Success of Adversaries Without Data Priors [53.41619942066895]
Reconstruction attacks on machine learning (ML) models pose a strong risk of leakage of sensitive data.
In this work, we provide formal upper bounds on reconstruction success under realistic adversarial settings.
arXiv Detail & Related papers (2024-02-20T09:52:30Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence.
We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z)
- Divide-and-Assemble: Learning Block-wise Memory for Unsupervised Anomaly Detection [40.778313918994996]
Reconstruction-based methods play an important role in unsupervised anomaly detection in images.
In this work, we interpret the reconstruction of an image as a divide-and-assemble procedure.
We achieve state-of-the-art performance on the challenging MVTec AD dataset.
arXiv Detail & Related papers (2021-07-28T01:14:32Z)
- On the Minimal Error of Empirical Risk Minimization [90.09093901700754]
We study the minimal error of the Empirical Risk Minimization (ERM) procedure in the task of regression.
Our sharp lower bounds shed light on the possibility (or impossibility) of adapting to simplicity of the model generating the data.
arXiv Detail & Related papers (2021-02-24T04:47:55Z)
- Reconstruction-Based Membership Inference Attacks are Easier on Difficult Problems [36.13835940345486]
We show that models with higher-dimensional inputs and outputs are more vulnerable to membership inference attacks.
We propose a novel predictability score that can be computed for each sample without requiring a training set.
Our membership error, obtained by subtracting the predictability score from the reconstruction error, achieves high MIA accuracy across an extensive set of benchmarks.
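In symbols (notation ours, for illustration only): the attack statistic is $e_{\mathrm{mem}}(x) = e_{\mathrm{rec}}(x) - s_{\mathrm{pred}}(x)$, which is then thresholded to decide whether $x$ was a member of the training set.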
arXiv Detail & Related papers (2021-02-15T18:57:22Z)
- Model Adaptation for Image Reconstruction using Generalized Stein's Unbiased Risk Estimator [34.08815401541628]
We introduce a Generalized Stein's Unbiased Risk Estimate (GSURE) loss metric to adapt the network to the measured k-space data.
Unlike current methods that rely on the mean squared error in k-space, the proposed metric accounts for noise in the measurements.
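For background, the classical Stein identity that GSURE generalizes gives an unbiased estimate of a reconstruction network $f$'s risk on $y = x + n$ with $n \sim \mathcal{N}(0, \sigma^2 I_N)$, without access to the ground truth $x$ (textbook form, not this paper's exact formulation):
$$ \widehat{\mathrm{MSE}}(f) = \lVert y - f(y) \rVert^2 - N\sigma^2 + 2\sigma^2 \, \nabla_y \!\cdot f(y). $$
The divergence term charges the network for fitting the noise, which is how such a metric accounts for measurement noise.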
arXiv Detail & Related papers (2021-01-29T20:16:45Z)
- Generative Tomography Reconstruction [11.460692362624533]
We propose an end-to-end differentiable architecture for tomography reconstruction that maps a noisy sinogram into a denoised reconstruction.
We also propose a generative model that, given a noisy sinogram, can sample realistic reconstructions.
arXiv Detail & Related papers (2020-10-26T18:22:37Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.