Accuracy is Not All You Need
- URL: http://arxiv.org/abs/2407.09141v1
- Date: Fri, 12 Jul 2024 10:19:02 GMT
- Title: Accuracy is Not All You Need
- Authors: Abhinav Dutta, Sanjeev Krishnan, Nipun Kwatra, Ramachandran Ramjee
- Abstract summary: We conduct a detailed study of metrics across multiple compression techniques, models and datasets.
We show that the behavior of compressed models as visible to end-users is significantly different from the baseline model, even when accuracy is similar.
We propose two such metrics, KL-Divergence and flips, and show that they are well correlated.
- Score: 9.371810162601623
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When Large Language Models (LLMs) are compressed using techniques such as quantization, the predominant way to demonstrate the validity of such techniques is by measuring the model's accuracy on various benchmarks. If the accuracies of the baseline and compressed models are close, it is assumed that the degradation in quality was negligible. However, even when the accuracies of the baseline and compressed models are similar, we observe the phenomenon of flips, wherein answers change from correct to incorrect and vice versa in proportion. We conduct a detailed study of metrics across multiple compression techniques, models, and datasets, demonstrating that the behavior of compressed models as visible to end-users is often significantly different from that of the baseline model, even when accuracy is similar. We further evaluate compressed models qualitatively and quantitatively using MT-Bench and show that compressed models are significantly worse than baseline models on this free-form generative task. Thus, we argue that compression techniques should also be evaluated using distance metrics. We propose two such metrics, KL-Divergence and flips, and show that they are well correlated.
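As a concrete illustration of the two proposed metrics, here is a minimal Python sketch; the function names and the use of SciPy are our own illustrative assumptions, not the authors' released code:

```python
import numpy as np
from scipy.special import log_softmax

def flips(baseline_correct, compressed_correct):
    """Fraction of answers whose correctness changes between the two
    models: correct -> incorrect or incorrect -> correct."""
    baseline_correct = np.asarray(baseline_correct, dtype=bool)
    compressed_correct = np.asarray(compressed_correct, dtype=bool)
    return np.mean(baseline_correct != compressed_correct)

def mean_kl_divergence(baseline_logits, compressed_logits):
    """Average KL(baseline || compressed) over per-example output
    distributions; logits have shape [examples, vocab]."""
    log_p = log_softmax(baseline_logits, axis=-1)
    log_q = log_softmax(compressed_logits, axis=-1)
    return np.mean(np.sum(np.exp(log_p) * (log_p - log_q), axis=-1))

# Toy example: both models score 3/5 on accuracy, yet 40% of answers flip.
print(flips([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))  # 0.4
```

The toy example shows the core point of the paper: identical accuracy can hide substantial per-example disagreement.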
Related papers
- Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures [93.17009514112702]
Pruning, setting a significant subset of the parameters of a neural network to zero, is one of the most popular methods of model compression (a minimal magnitude-pruning sketch follows this entry).
Despite existing evidence for this phenomenon, the relationship between neural network pruning and induced bias is not well-understood.
arXiv Detail & Related papers (2023-04-25T07:42:06Z)
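As context for the pruning entry above, a minimal magnitude-pruning sketch in PyTorch; magnitude pruning is one common instantiation of pruning, and the function name is our own:

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the `sparsity` fraction of entries with the smallest magnitude."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return weight.clone()
    threshold = weight.abs().flatten().kthvalue(k).values  # k-th smallest |w|
    return weight * (weight.abs() > threshold)

w = torch.randn(4, 4)
print(magnitude_prune(w, 0.5))  # roughly half the entries become zero
```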
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time [69.7693300927423]
We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations improves accuracy and robustness (a minimal averaging sketch follows this entry).
We show that the model soup approach extends to multiple image classification and natural language processing tasks.
arXiv Detail & Related papers (2022-03-10T17:03:49Z)
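The model soup above is an element-wise average of fine-tuned weights. A minimal sketch of the uniform soup, assuming all checkpoints share one architecture (the helper name is ours):

```python
import torch

def uniform_soup(state_dicts):
    """Average a list of state dicts from models fine-tuned with
    different hyperparameter configurations (the 'uniform soup')."""
    return {
        key: torch.mean(
            torch.stack([sd[key].float() for sd in state_dicts]), dim=0
        )
        for key in state_dicts[0]
    }

# Usage sketch:
# model.load_state_dict(uniform_soup([torch.load(p) for p in checkpoint_paths]))
```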
- A Model for Multi-View Residual Covariances based on Perspective Deformation [88.21738020902411]
We derive a model for the covariance of the visual residuals in multi-view SfM, odometry and SLAM setups.
We validate our model with synthetic and real data and integrate it into photometric and feature-based Bundle Adjustment.
arXiv Detail & Related papers (2022-02-01T21:21:56Z)
- Can Model Compression Improve NLP Fairness [3.172761915061083]
This is the first paper to examine the effect of distillation and pruning on the toxicity and bias of generative language models.
We test Knowledge Distillation and Pruning methods on the GPT-2 model and find a consistent pattern of toxicity and bias reduction.
arXiv Detail & Related papers (2022-01-21T05:14:51Z)
- Adversarial robustness for latent models: Revisiting the robust-standard accuracies tradeoff [12.386462516398472]
Adversarial training is often observed to reduce the standard test accuracy.
In this paper, we argue that this tradeoff is mitigated when the data enjoys a low-dimensional structure.
We show that as the ratio of the manifold dimension to the ambient dimension decreases, one can obtain models that are nearly optimal with respect to both the standard accuracy and the robust accuracy measures.
arXiv Detail & Related papers (2021-10-22T17:58:27Z)
- What do Compressed Large Language Models Forget? Robustness Challenges in Model Compression [68.82486784654817]
We study two popular model compression techniques: knowledge distillation and pruning (the standard distillation objective is sketched after this entry).
We show that compressed models are significantly less robust than their PLM counterparts on adversarial test sets.
We develop a regularization strategy for model compression based on sample uncertainty.
arXiv Detail & Related papers (2021-10-16T00:20:04Z)
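For the distillation half of the study above, the standard knowledge-distillation objective looks like the following sketch; the temperature and mixing weight are illustrative defaults, not the paper's settings:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Cross-entropy on the hard labels plus KL between the
    temperature-softened teacher and student distributions."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # standard scaling to keep gradient magnitudes comparable
    return alpha * hard + (1 - alpha) * soft
```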
- Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox: "scale" metrics perform well overall but poorly on sub-partitions of the data (a small numeric illustration appears below).
We present two novel shape metrics, one data-independent and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
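For intuition, a Simpson's paradox means a trend that holds within every sub-partition can reverse in the aggregate (or vice versa). A tiny numeric sketch with invented values:

```python
import numpy as np

# Within each sub-partition the metric correlates positively with
# accuracy, but group B pairs larger metric values with lower accuracy.
metric_a, acc_a = np.array([1.0, 2.0, 3.0]), np.array([0.70, 0.72, 0.74])
metric_b, acc_b = np.array([6.0, 7.0, 8.0]), np.array([0.50, 0.52, 0.54])

print(np.corrcoef(metric_a, acc_a)[0, 1])  # +1.0 within group A
print(np.corrcoef(metric_b, acc_b)[0, 1])  # +1.0 within group B
pooled_metric = np.concatenate([metric_a, metric_b])
pooled_acc = np.concatenate([acc_a, acc_b])
print(np.corrcoef(pooled_metric, pooled_acc)[0, 1])  # negative when pooled
```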
arXiv Detail & Related papers (2021-06-01T19:19:49Z)
- Robustness in Compressed Neural Networks for Object Detection [2.9823962001574182]
The sensitivity of compressed models to different distortion types is nuanced.
Robustness to some corruption types is heavily affected by the compression methods.
Data augmentation was confirmed to positively affect model robustness.
arXiv Detail & Related papers (2021-02-10T15:52:11Z)
- Training with Quantization Noise for Extreme Model Compression [57.51832088938618]
We tackle the problem of producing compact models, maximizing their accuracy for a given model size.
A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator.
In this paper, we extend this approach to work beyond int8 fixed-point quantization with extreme compression methods (the quantization-noise step with the Straight-Through Estimator is sketched after this entry).
arXiv Detail & Related papers (2020-04-15T20:10:53Z)
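A minimal sketch of the fake-quantization-with-STE step described above, combined with Quant-Noise-style random masking; the int8-style uniform scaling scheme and parameter names are our own assumptions:

```python
import torch

def quant_noise(weight: torch.Tensor, p: float = 0.1, bits: int = 8):
    """Fake-quantize a random fraction p of the weights each forward
    pass; the straight-through estimator keeps gradients exact."""
    qmax = 2 ** (bits - 1) - 1
    scale = (weight.abs().max() / qmax).clamp_min(1e-12)
    quantized = torch.round(weight / scale).clamp(-qmax - 1, qmax) * scale
    mask = torch.rand_like(weight) < p           # subset quantized this pass
    noisy = torch.where(mask, quantized, weight)
    # STE: forward uses `noisy`; backward treats the op as identity.
    return weight + (noisy - weight).detach()
```

Because only a fraction of weights is quantized per pass, most weights still receive exact gradients, which is the intuition behind training with quantization noise.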
- Considering discrepancy when calibrating a mechanistic electrophysiology model [41.77362715012383]
Uncertainty quantification (UQ) is a vital step in using mathematical models and simulations to make decisions.
In this piece we draw attention to an important and under-addressed source of uncertainty in our predictions: uncertainty in the model structure or the equations themselves.
arXiv Detail & Related papers (2020-01-13T13:26:13Z)