No-Skim: Towards Efficiency Robustness Evaluation on Skimming-based
Language Models
- URL: http://arxiv.org/abs/2312.09494v2
- Date: Mon, 18 Dec 2023 02:50:02 GMT
- Title: No-Skim: Towards Efficiency Robustness Evaluation on Skimming-based
Language Models
- Authors: Shengyao Zhang, Mi Zhang, Xudong Pan, Min Yang
- Abstract summary: We propose No-Skim to help owners of skimming-based LLMs understand and measure the robustness of their acceleration schemes.
Specifically, our framework searches for minimal and unnoticeable perturbations at the character and token levels to generate adversarial inputs that sufficiently increase the remaining token ratio.
In the worst case, the perturbations found by No-Skim substantially increase the running cost of the LLM by over 145% on average.
- Score: 27.469321590884903
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To reduce the computation cost and energy consumption of large
language models (LLMs), skimming-based acceleration dynamically drops
unimportant tokens of the input sequence progressively along the layers of the
LLM while preserving the tokens of semantic importance. However, our work
reveals for the first time that this acceleration may be vulnerable to
Denial-of-Service (DoS) attacks. In this paper, we propose No-Skim, a general
framework to help owners of skimming-based LLMs understand and measure the
robustness of their acceleration schemes. Specifically, our framework searches
for minimal and unnoticeable perturbations at the character and token levels to
generate adversarial inputs that sufficiently increase the remaining token
ratio, thus increasing the computation cost and energy consumption. We
systematically evaluate the vulnerability of skimming acceleration in various
LLM architectures, including BERT and RoBERTa, on the GLUE benchmark. In the
worst case, the perturbations found by No-Skim substantially increase the
running cost of the LLM by over 145% on average. Moreover, No-Skim extends the
evaluation framework to various scenarios, making the evaluation feasible with
different levels of knowledge.
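The abstract describes the attack objective only at a high level. As a rough illustration (not the paper's actual algorithm), the sketch below simulates a layer-wise skimming rule, computes a remaining token ratio from it, and runs a greedy character-level substitution search that keeps an edit only when that ratio increases. Every name and heuristic in it (toy_remaining_token_ratio, greedy_char_attack, the length-based keep rule) is an assumption made for illustration; a real evaluation would use the importance scores of the deployed skimming-based model.

```python
# Minimal, self-contained sketch of the objective above: NOT the authors'
# implementation, only an illustration of how a "remaining token ratio" can be
# simulated and greedily pushed up by character-level edits. The skimming rule
# and all function names are assumptions.
import random
import string


def toy_remaining_token_ratio(text: str, num_layers: int = 4) -> float:
    """Simulate layer-wise token dropping and return the average fraction of
    the original tokens that each layer still has to process."""
    tokens = text.split()
    if not tokens:
        return 0.0
    kept = tokens
    total_kept = 0
    for layer in range(num_layers):
        # Stand-in skimming rule: a token survives a layer if it looks
        # "unusual" (long, or containing non-alphabetic characters); ordinary
        # short words are progressively dropped. Real skimming modules use
        # learned per-token importance scores instead.
        kept = [t for t in kept if len(t) > 4 + layer or not t.isalpha()]
        total_kept += len(kept)
    return total_kept / (num_layers * len(tokens))


def greedy_char_attack(text: str, budget: int = 5, trials: int = 30) -> str:
    """Greedy character-level search: keep a single-character substitution
    only if it raises the simulated remaining token ratio, i.e. forces the
    skimming scheme to keep (and compute on) more tokens."""
    best, best_score = text, toy_remaining_token_ratio(text)
    for _ in range(budget):
        for _ in range(trials):
            chars = list(best)
            pos = random.randrange(len(chars))
            chars[pos] = random.choice(string.ascii_lowercase + string.digits)
            candidate = "".join(chars)
            score = toy_remaining_token_ratio(candidate)
            if score > best_score:
                best, best_score = candidate, score
                break  # accept the first improving edit, then keep searching
    return best


if __name__ == "__main__":
    original = "the movie was plain and entirely forgettable"
    adversarial = greedy_char_attack(original)
    print(f"{toy_remaining_token_ratio(original):.2f} -> "
          f"{toy_remaining_token_ratio(adversarial):.2f}: {adversarial!r}")
```

In No-Skim itself, the candidate edits and the ranking of perturbations depend on the level of access to the target model (the "different levels of knowledge" above), which the toy scoring function here does not capture.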
Related papers
- DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs [70.91804882618243]
This paper proposes DSMoE, a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks.
We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge.
Experiments on LLaMA models demonstrate that under equivalent computational constraints, DSMoE achieves superior performance compared to existing pruning and MoE approaches.
arXiv Detail & Related papers (2025-02-18T02:37:26Z)
- FLAME: Flexible LLM-Assisted Moderation Engine [2.966082563853265]
We introduce the Flexible LLM-Assisted Moderation Engine (FLAME).
Unlike traditional circuit-breaking methods that analyze user queries, FLAME evaluates model responses.
Our experiments demonstrate that FLAME significantly outperforms current moderation systems.
arXiv Detail & Related papers (2025-02-13T11:05:55Z)
- Adversarial Reasoning at Jailbreaking Time [49.70772424278124]
We develop an adversarial reasoning approach to automatic jailbreaking via test-time computation.
Our approach introduces a new paradigm in understanding LLM vulnerabilities, laying the foundation for the development of more robust and trustworthy AI systems.
arXiv Detail & Related papers (2025-02-03T18:59:01Z)
- LeMo: Enabling LEss Token Involvement for MOre Context Fine-tuning [38.35238373706948]
LeMo is a new LLM fine-tuning system that exploits a token-level sparsity mechanism inherent in long-context scenarios.
LeMo reduces memory consumption by up to 1.93x and achieves up to 1.36x speedups, outperforming state-of-the-art fine-tuning systems.
arXiv Detail & Related papers (2025-01-15T05:17:12Z)
- A Soft Sensor Method with Uncertainty-Awareness and Self-Explanation Based on Large Language Models Enhanced by Domain Knowledge Retrieval [17.605817344542345]
We propose a framework called Few-shot Uncertainty-aware and self-Explaining Soft Sensor (LLM-FUESS)
LLM-FUESS includes the Zero-shot Auxiliary Variable Selector (LLM-ZAVS) and the Uncertainty-aware Few-shot Soft Sensor (LLM-UFSS)
Our method achieves state-of-the-art predictive performance, strong robustness, and flexibility, and effectively mitigates the training instability found in traditional methods.
arXiv Detail & Related papers (2025-01-06T11:43:29Z)
- Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities.
However, LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands.
We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z)
- Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification [76.14641982122696]
We propose a constraint learning schema for fine-tuning Large Language Models (LLMs) with attribute control.
We show that our approach leads to an LLM that produces fewer inappropriate responses while achieving competitive performance on benchmarks and a toxicity detection task.
arXiv Detail & Related papers (2024-10-07T23:38:58Z)
- FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping [49.66872823080736]
Autoregressive Large Language Models (e.g., LLaMa, GPTs) are omnipresent, achieving remarkable success in language understanding and generation.
To mitigate the overhead incurred during generation, several early-exit and layer-dropping strategies have been proposed.
We propose FFN-SkipLLM, which is an input-adaptive feed-forward skipping strategy.
arXiv Detail & Related papers (2024-04-05T02:35:43Z)
- RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content [62.685566387625975]
Current mitigation strategies, while effective, are not resilient under adversarial attacks.
This paper introduces Resilient Guardrails for Large Language Models (RigorLLM), a novel framework designed to efficiently moderate harmful and unsafe inputs.
arXiv Detail & Related papers (2024-03-19T07:25:02Z)
- DA-LSTM: A Dynamic Drift-Adaptive Learning Framework for Interval Load Forecasting with LSTM Networks [1.3342521220589318]
Designing change detection methods to identify drifts requires defining a drift magnitude threshold.
We propose a dynamic drift-adaptive Long Short-Term Memory (DA-LSTM) framework that can improve the performance of load forecasting models.
arXiv Detail & Related papers (2023-05-15T16:26:03Z)