Augmenting Parameter-Efficient Pre-trained Language Models with Large Language Models
- URL: http://arxiv.org/abs/2602.02501v1
- Date: Mon, 19 Jan 2026 10:25:56 GMT
- Title: Augmenting Parameter-Efficient Pre-trained Language Models with Large Language Models
- Authors: Saurabh Anand, Shubham Malaviya, Manish Shukla, Sachin Lodha
- Abstract summary: We introduce two strategies that use large language models to enhance the capabilities of pre-trained language models. We empirically demonstrate that by combining parameter-efficient pre-trained models with large language models, we can improve the reliability and robustness of models.
- Score: 6.524460254566904
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Training AI models in cybersecurity with the help of vast datasets offers significant opportunities to mimic real-world behaviors effectively. However, challenges like data drift and scarcity of labelled data lead to frequent model updates and the risk of overfitting. To address these challenges, we use parameter-efficient fine-tuning techniques for pre-trained language models, combining compacters with various layer-freezing strategies. To enhance the capabilities of these pre-trained language models, in this work we introduce two strategies that use large language models. In the first strategy, we utilize large language models as data-labelling tools that generate labels for unlabeled data. In the second strategy, large language models are utilized as fallback mechanisms for predictions with low confidence scores. We perform a comprehensive experimental analysis of the proposed strategies on different downstream tasks specific to the cybersecurity domain. We empirically demonstrate that by combining parameter-efficient pre-trained models with large language models, we can improve the reliability and robustness of models, making them more suitable for real-world cybersecurity applications.
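The two strategies in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the softmax-based confidence score, the 0.8 threshold, the function names, and the `stub_llm` keyword rule that stands in for a real large language model call.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw classifier scores into probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(texts: Sequence[str],
                 llm_classify: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Strategy 1 (sketch): let the LLM generate labels for unlabeled texts."""
    return [(text, llm_classify(text)) for text in texts]


def predict_with_fallback(text: str,
                          logits: Sequence[float],
                          labels: Sequence[str],
                          llm_classify: Callable[[str], str],
                          threshold: float = 0.8) -> Tuple[str, str]:
    """Strategy 2 (sketch): keep the small model's prediction when it is
    confident, otherwise defer to the LLM. The threshold is hypothetical."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return labels[best], "plm"
    return llm_classify(text), "llm"


def stub_llm(text: str) -> str:
    """Hypothetical stand-in for an LLM call; a real system would query a model."""
    return "malicious" if "powershell" in text.lower() else "benign"
```

In this sketch the `logits` would come from the fine-tuned pre-trained model, and the second element of the returned tuple records which model produced the answer, which makes it easy to measure how often the fallback fires.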
Related papers
- Evolution without Large Models: Training Language Model with Task Principles [52.44569608690695]
A common training approach for language models involves using a large-scale language model to expand a human-provided dataset. This method significantly reduces training costs by eliminating the need for extensive human data annotation. However, it still faces challenges such as high carbon emissions during data augmentation and the risk of data leakage.
arXiv Detail & Related papers (2025-07-08T13:52:45Z) - Deep Contrastive Unlearning for Language Models [9.36216515987051]
We propose a machine unlearning framework, named Deep Contrastive Unlearning for fine-Tuning (DeepCUT) language models. Our proposed model achieves machine unlearning by directly optimizing the latent space of a model.
arXiv Detail & Related papers (2025-03-19T04:58:45Z) - Tokens for Learning, Tokens for Unlearning: Mitigating Membership Inference Attacks in Large Language Models via Dual-Purpose Training [13.680205342714412]
Large language models (LLMs) have become the backbone of modern natural language processing but pose privacy concerns about leaking sensitive training data. We propose methodname, a lightweight yet effective empirical privacy defense for protecting training data of language models by leveraging token-specific characteristics.
arXiv Detail & Related papers (2025-02-27T03:37:45Z) - EM-MIAs: Enhancing Membership Inference Attacks in Large Language Models through Ensemble Modeling [2.494935495983421]
This paper proposes a novel ensemble attack method (EM-MIAs) that integrates several existing MIA techniques into an XGBoost-based model to enhance overall attack performance. Experimental results demonstrate that the ensemble model significantly improves both AUC-ROC and accuracy compared to individual attack methods across various large language models and datasets.
arXiv Detail & Related papers (2024-12-23T03:47:54Z) - Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
arXiv Detail & Related papers (2024-07-04T15:14:17Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models [87.7086269902562]
We show that subword-based models might still be the most practical choice in many settings.
We encourage future work in tokenizer-free methods to consider these factors when designing and evaluating new models.
arXiv Detail & Related papers (2022-10-13T15:47:09Z) - On the Usability of Transformers-based models for a French
Question-Answering task [2.44288434255221]
This paper focuses on the usability of Transformer-based language models in small-scale learning problems.
We introduce a new compact model for French, FrALBERT, which proves to be competitive in low-resource settings.
arXiv Detail & Related papers (2022-07-19T09:46:15Z) - Scaling Language Models: Methods, Analysis & Insights from Training
Gopher [83.98181046650664]
We present an analysis of Transformer-based language model performance across a wide range of model scales.
Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language.
We discuss the application of language models to AI safety and the mitigation of downstream harms.
arXiv Detail & Related papers (2021-12-08T19:41:47Z) - Training Data Leakage Analysis in Language Models [6.843491191969066]
We introduce a methodology that investigates identifying the user content in the training data that could be leaked under a strong and realistic threat model.
We propose two metrics to quantify user-level data leakage by measuring a model's ability to produce unique sentence fragments within training data.
arXiv Detail & Related papers (2021-01-14T00:57:32Z) - InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z) - Data Augmentation for Spoken Language Understanding via Pretrained Language Models [113.56329266325902]
Training of spoken language understanding (SLU) models often faces the problem of data scarcity.
We put forward a data augmentation method using pretrained language models to boost the variability and accuracy of generated utterances.
arXiv Detail & Related papers (2020-04-29T04:07:12Z)