SentinelLMs: Encrypted Input Adaptation and Fine-tuning of Language
Models for Private and Secure Inference
- URL: http://arxiv.org/abs/2312.17342v1
- Date: Thu, 28 Dec 2023 19:55:11 GMT
- Title: SentinelLMs: Encrypted Input Adaptation and Fine-tuning of Language
Models for Private and Secure Inference
- Authors: Abhijit Mishra, Mingda Li, Soham Deo
- Abstract summary: This paper addresses the privacy and security concerns associated with deep neural language models.
Deep neural language models serve as crucial components in various modern AI-based applications.
We propose a novel method to adapt and fine-tune transformer-based language models on passkey-encrypted user-specific text.
- Score: 6.0189674528771
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper addresses the privacy and security concerns associated with deep
neural language models, which serve as crucial components in various modern
AI-based applications. These models are often used after being pre-trained and
fine-tuned for specific tasks, with deployment on servers accessed through the
internet. However, this introduces two fundamental risks: (a) the transmission
of user inputs to the server via the network gives rise to interception
vulnerabilities, and (b) privacy concerns emerge as organizations that deploy
such models store user data with restricted context. To address this, we
propose a novel method to adapt and fine-tune transformer-based language models
on passkey-encrypted user-specific text. The original pre-trained language
model first undergoes a quick adaptation (without any further pre-training)
with a series of irreversible transformations applied to the tokenizer and
token embeddings. This enables the model to perform inference on encrypted
inputs while preventing reverse engineering of text from model parameters and
intermediate outputs. After adaptation, models are fine-tuned on encrypted
versions of existing training datasets. Experimental evaluation employing
adapted versions of renowned models (e.g., BERT, RoBERTa) across established
benchmark English and multilingual datasets for text classification and
sequence labeling shows that encrypted models achieve performance parity with
their original counterparts. This serves to safeguard performance, privacy, and
security cohesively.
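As an illustration of the adaptation step described above, the sketch below shows one way a passkey could drive a re-keying of the vocabulary together with a matching permutation of the input-embedding rows, so a server-side model operates directly on encrypted token IDs. This is a minimal sketch, not the authors' released code; the Hugging Face checkpoint name, the SHA-256 seeding, and the permutation scheme are illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation): derive a passkey-keyed
# permutation of token IDs and permute the embedding rows to match, so the
# adapted model consumes "encrypted" IDs instead of plaintext token IDs.
import hashlib
import torch
from transformers import AutoModel, AutoTokenizer

def passkey_permutation(vocab_size: int, passkey: str) -> torch.Tensor:
    """Deterministic permutation of token IDs derived from the passkey."""
    seed = int.from_bytes(hashlib.sha256(passkey.encode()).digest()[:8], "big")
    gen = torch.Generator().manual_seed(seed)
    return torch.randperm(vocab_size, generator=gen)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint
model = AutoModel.from_pretrained("bert-base-uncased")

perm = passkey_permutation(tokenizer.vocab_size, passkey="user-secret")

# Adapt the model once: re-order the input-embedding rows so that encrypted
# ID perm[i] looks up the vector originally stored for plaintext ID i.
with torch.no_grad():
    emb = model.get_input_embeddings()
    emb.weight.copy_(emb.weight[torch.argsort(perm)])

# Client side: tokenize locally, then remap IDs with the passkey permutation.
ids = tokenizer("private user text", return_tensors="pt")["input_ids"]
encrypted_ids = perm[ids]

# Server side: the adapted model runs directly on the encrypted IDs; the
# plaintext-to-ciphertext mapping never leaves the client.
outputs = model(input_ids=encrypted_ids)
```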
Related papers
- Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
- FLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction [49.510163437116645]
Click-through rate (CTR) prediction serves as a core function module in personalized online services.
Traditional ID-based models for CTR prediction take as inputs the one-hot encoded ID features of tabular modality.
Pretrained Language Models (PLMs) have given rise to another paradigm, which takes sentences of the textual modality as inputs.
We propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction.
arXiv Detail & Related papers (2023-10-30T11:25:03Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- Recovering from Privacy-Preserving Masking with Large Language Models [14.828717714653779]
We use large language models (LLMs) to suggest substitutes of masked tokens.
We show that models trained on the obfuscation corpora achieve performance comparable to models trained on the original data.
arXiv Detail & Related papers (2023-09-12T16:39:41Z)
- Training Natural Language Processing Models on Encrypted Text for Enhanced Privacy [0.0]
We propose a method for training NLP models on encrypted text data to mitigate data privacy concerns.
Our results indicate that both encrypted and non-encrypted models achieve comparable performance.
arXiv Detail & Related papers (2023-05-03T00:37:06Z)
- Q-LSTM Language Model -- Decentralized Quantum Multilingual Pre-Trained Language Model for Privacy Protection [6.0038761646405225]
Large-scale language models are trained on a massive amount of natural language data that might encode or reflect our private information.
Malicious agents can reverse-engineer the training data even if data sanitization and differential privacy algorithms were involved in the pre-training process.
We propose a decentralized training framework to address privacy concerns in training large-scale language models.
arXiv Detail & Related papers (2022-10-06T21:29:17Z)
- THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption [112.02441503951297]
Privacy-preserving inference of transformer models is in demand among cloud service users.
We introduce THE-X, an approximation approach for transformers that enables privacy-preserving inference of pre-trained models.
arXiv Detail & Related papers (2022-06-01T03:49:18Z)
- Just Fine-tune Twice: Selective Differential Privacy for Large Language Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve selective differential privacy (SDP) for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z)
- Selective Differential Privacy for Language Modeling [36.64464956102432]
Previous work has attempted to tackle this challenge by training RNN-based language models with differential privacy guarantees.
We propose a new privacy notion, selective differential privacy, to provide rigorous privacy guarantees on the sensitive portion of the data.
Experiments on both language modeling and dialog system building show that the proposed privacy-preserving mechanism achieves better utilities.
arXiv Detail & Related papers (2021-08-30T01:11:10Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)