Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
- URL: http://arxiv.org/abs/2403.15447v3
- Date: Tue, 4 Jun 2024 05:40:12 GMT
- Title: Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
- Authors: Junyuan Hong, Jinhao Duan, Chenhui Zhang, Zhangheng Li, Chulin Xie, Kelsey Lieberman, James Diffenderfer, Brian Bartoldson, Ajay Jaiswal, Kaidi Xu, Bhavya Kailkhura, Dan Hendrycks, Dawn Song, Zhangyang Wang, Bo Li,
- Abstract summary: This study conducts the first, thorough evaluation of leading Large Language Models (LLMs)
We find that quantization is currently a more effective approach than pruning in achieving efficiency and trustworthiness simultaneously.
- Score: 109.23761449840222
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Compressing high-capability Large Language Models (LLMs) has emerged as a favored strategy for resource-efficient inferences. While state-of-the-art (SoTA) compression methods boast impressive advancements in preserving benign task performance, the potential risks of compression in terms of safety and trustworthiness have been largely neglected. This study conducts the first, thorough evaluation of three (3) leading LLMs using five (5) SoTA compression techniques across eight (8) trustworthiness dimensions. Our experiments highlight the intricate interplay between compression and trustworthiness, revealing some interesting patterns. We find that quantization is currently a more effective approach than pruning in achieving efficiency and trustworthiness simultaneously. For instance, a 4-bit quantized model retains the trustworthiness of its original counterpart, but model pruning significantly degrades trustworthiness, even at 50% sparsity. Moreover, employing quantization within a moderate bit range could unexpectedly improve certain trustworthiness dimensions such as ethics and fairness. Conversely, extreme quantization to very low bit levels (3 bits) tends to reduce trustworthiness significantly. This increased risk cannot be uncovered by looking at benign performance alone, in turn, mandating comprehensive trustworthiness evaluation in practice. These findings culminate in practical recommendations for simultaneously achieving high utility, efficiency, and trustworthiness in LLMs. Code and models are available at https://decoding-comp-trust.github.io.
Related papers
- PUFFLE: Balancing Privacy, Utility, and Fairness in Federated Learning [2.8304839563562436]
Training and deploying Machine Learning models that simultaneously adhere to principles of fairness and privacy poses a significant challenge.
We introduce PUFFLE, a high-level parameterised approach that can help in the exploration of the balance between utility, privacy, and fairness in FL scenarios.
We prove that PUFFLE can be effective across diverse datasets, models, and data distributions, reducing the model unfairness up to 75%, with a maximum reduction in the utility of 17% in the worst-case scenario.
arXiv Detail & Related papers (2024-07-21T17:22:18Z) - Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation [96.78845113346809]
Retrieval-augmented language models (RALMs) have shown strong performance and wide applicability in knowledge-intensive tasks.
This paper proposes SynCheck, a lightweight monitor that leverages fine-grained decoding dynamics to detect unfaithful sentences.
We also introduce FOD, a faithfulness-oriented decoding algorithm guided by beam search for long-form retrieval-augmented generation.
arXiv Detail & Related papers (2024-06-19T16:42:57Z) - Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models [14.5291643644017]
We introduce the concept of Confidence-Probability Alignment.
We probe the alignment between models' internal and expressed confidence.
Among the models analyzed, OpenAI's GPT-4 showed the strongest confidence-probability alignment.
arXiv Detail & Related papers (2024-05-25T15:42:04Z) - TernaryVote: Differentially Private, Communication Efficient, and
Byzantine Resilient Distributed Optimization on Heterogeneous Data [50.797729676285876]
We propose TernaryVote, which combines a ternary compressor and the majority vote mechanism to realize differential privacy, gradient compression, and Byzantine resilience simultaneously.
We theoretically quantify the privacy guarantee through the lens of the emerging f-differential privacy (DP) and the Byzantine resilience of the proposed algorithm.
arXiv Detail & Related papers (2024-02-16T16:41:14Z) - TrustLLM: Trustworthiness in Large Language Models [446.5640421311468]
This paper introduces TrustLLM, a comprehensive study of trustworthiness in large language models (LLMs)
We first propose a set of principles for trustworthy LLMs that span eight different dimensions.
Based on these principles, we establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics.
arXiv Detail & Related papers (2024-01-10T22:07:21Z) - U-Trustworthy Models.Reliability, Competence, and Confidence in
Decision-Making [0.21756081703275998]
We present a precise mathematical definition of trustworthiness, termed $mathcalU$-trustworthiness.
Within the context of $mathcalU$-trustworthiness, we prove that properly-ranked models are inherently $mathcalU$-trustworthy.
We advocate for the adoption of the AUC metric as the preferred measure of trustworthiness.
arXiv Detail & Related papers (2024-01-04T04:58:02Z) - Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs [60.61002524947733]
Previous confidence elicitation methods rely on white-box access to internal model information or model fine-tuning.
This leads to a growing need to explore the untapped area of black-box approaches for uncertainty estimation.
We define a systematic framework with three components: prompting strategies for eliciting verbalized confidence, sampling methods for generating multiple responses, and aggregation techniques for computing consistency.
arXiv Detail & Related papers (2023-06-22T17:31:44Z) - Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence
Scores from Language Models Fine-Tuned with Human Feedback [91.22679548111127]
A trustworthy real-world prediction system should produce well-calibrated confidence scores.
We show that verbalized confidences emitted as output tokens are typically better-calibrated than the model's conditional probabilities.
arXiv Detail & Related papers (2023-05-24T10:12:33Z) - Trust, but Verify: Using Self-Supervised Probing to Improve
Trustworthiness [29.320691367586004]
We introduce a new approach of self-supervised probing, which enables us to check and mitigate the overconfidence issue for a trained model.
We provide a simple yet effective framework, which can be flexibly applied to existing trustworthiness-related methods in a plug-and-play manner.
arXiv Detail & Related papers (2023-02-06T08:57:20Z) - Locally Valid and Discriminative Confidence Intervals for Deep Learning
Models [37.57296694423751]
Uncertainty information should be valid (guaranteeing coverage) and discriminative (more uncertain when the expected risk is high)
Most existing Bayesian methods lack frequentist coverage guarantees and usually affect model performance.
We propose Locally Valid and Discriminative confidence intervals (LVD), a simple, efficient and lightweight method to construct discriminative confidence intervals (CIs) for almost any deep learning model.
arXiv Detail & Related papers (2021-06-01T04:39:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.