Related papers: Lightweight Safety Classification Using Pruned Language Models

Lightweight Safety Classification Using Pruned Language Models

URL: http://arxiv.org/abs/2412.13435v1
Date: Wed, 18 Dec 2024 02:13:13 GMT
Title: Lightweight Safety Classification Using Pruned Language Models
Authors: Mason Sawtell, Tula Masterman, Sandi Besen, Jim Brown,
Abstract summary: We introduce a novel technique for content safety and prompt injection classification for Large Language Models.<n>Our approach delivers superior performance surpassing GPT-4o and special-purpose models fine-tuned for each task.<n>Our results indicate that a single general-purpose LLM can be used to classify content safety, detect prompt injections, and simultaneously generate output tokens.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we introduce a novel technique for content safety and prompt injection classification for Large Language Models. Our technique, Layer Enhanced Classification (LEC), trains a Penalized Logistic Regression (PLR) classifier on the hidden state of an LLM's optimal intermediate transformer layer. By combining the computational efficiency of a streamlined PLR classifier with the sophisticated language understanding of an LLM, our approach delivers superior performance surpassing GPT-4o and special-purpose models fine-tuned for each task. We find that small general-purpose models (Qwen 2.5 sizes 0.5B, 1.5B, and 3B) and other transformer-based architectures like DeBERTa v3 are robust feature extractors allowing simple classifiers to be effectively trained on fewer than 100 high-quality examples. Importantly, the intermediate transformer layers of these models typically outperform the final layer across both classification tasks. Our results indicate that a single general-purpose LLM can be used to classify content safety, detect prompt injections, and simultaneously generate output tokens. Alternatively, these relatively small LLMs can be pruned to the optimal intermediate layer and used exclusively as robust feature extractors. Since our results are consistent on different transformer architectures, we infer that robust feature extraction is an inherent capability of most, if not all, LLMs.

Related papers

DFPE: A Diverse Fingerprint Ensemble for Enhancing LLM Performance [11.753349115726952]
We propose a novel ensemble method - Diverse Fingerprint Ensemble (DFPE) Our approach involves: (1) clustering models based on response "fingerprints" patterns, (2) applying a quantile-based filtering mechanism, and (3) assigning adaptive weights to remaining models. In experiments on the Massive Multitask Language Understanding (MMLU) benchmark, DFPE outperforms the best single model by 3% overall accuracy and 5% in discipline-level accuracy.
arXiv Detail & Related papers (2025-01-29T08:44:45Z)
How to Make LLMs Strong Node Classifiers? [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, such as Graph Neural Networks (GNNs) and Graph Transformers (GTs) We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art (SOTA) GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z)
TransformerRanker: A Tool for Efficiently Finding the Best-Suited Language Models for Downstream Classification Tasks [2.497666465251894]
TransformerRanker is a lightweight library that ranks pre-trained language models for classification tasks. Our library implements current approaches for transferability estimation. We make TransformerRanker available as a pip-installable open-source library.
arXiv Detail & Related papers (2024-09-09T18:47:00Z)
Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient [57.9629676017527]
We propose an optimization-based structural pruning on Large-Language Models. We learn the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model. Our method operates for 2.7 hours with around 35GB memory for the 13B models on a single A100 GPU.
arXiv Detail & Related papers (2024-06-15T09:31:03Z)
One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs) We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z)
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders [34.421335513040795]
Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks. We introduce LLM2Vec, a simple unsupervised approach that can transform any decoder-only LLM into a strong text encoder.
arXiv Detail & Related papers (2024-04-09T02:51:05Z)
LLM-augmented Preference Learning from Natural Language [19.700169351688768]
Large Language Models (LLMs) are equipped to deal with larger context lengths. LLMs can consistently outperform the SotA when the target text is large. Few-shot learning yields better performance than zero-shot learning.
arXiv Detail & Related papers (2023-10-12T17:17:27Z)
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference [32.62084449979531]
We extend SortedNet to generative NLP tasks by replacing Standard Fine-Tuning (SFT) with Sorted Fine-Tuning (SoFT) Our approach boosts model efficiency, eliminating the need for multiple models for various scenarios during inference. Our results show the superior performance of sub-models in comparison to Standard Fine-Tuning and SFT+ICT (Early-Exit)
arXiv Detail & Related papers (2023-09-16T11:58:34Z)
Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform ICL. We show that transformers can implement a broad class of standard machine learning algorithms in context. A emphsingle transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z)
LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation. We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset. Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
CHAPTER: Exploiting Convolutional Neural Network Adapters for Self-supervised Speech Models [62.60723685118747]
Self-supervised learning (SSL) is a powerful technique for learning representations from unlabeled data. We propose an efficient tuning method specifically designed for SSL speech model, by applying CNN adapters at the feature extractor. We empirically found that adding CNN to the feature extractor can help the adaptation on emotion and speaker tasks.
arXiv Detail & Related papers (2022-12-01T08:50:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.