Related papers: Privacy-Preserving Text Classification on BERT Embeddings with Homomorphic Encryption

Privacy-Preserving Text Classification on BERT Embeddings with Homomorphic Encryption

URL: http://arxiv.org/abs/2210.02574v1
Date: Wed, 5 Oct 2022 21:46:02 GMT
Title: Privacy-Preserving Text Classification on BERT Embeddings with Homomorphic Encryption
Authors: Garam Lee, Minsoo Kim, Jai Hyun Park, Seung-won Hwang, Jung Hee Cheon
Abstract summary: We propose a privatization mechanism for embeddings based on homomorphic encryption. We show that our method offers encrypted protection of BERT embeddings, while largely preserving their utility on downstream text classification tasks.
Score: 23.010346603025255
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Embeddings, which compress information in raw text into semantics-preserving low-dimensional vectors, have been widely adopted for their efficacy. However, recent research has shown that embeddings can potentially leak private information about sensitive attributes of the text, and in some cases, can be inverted to recover the original input text. To address these growing privacy challenges, we propose a privatization mechanism for embeddings based on homomorphic encryption, to prevent potential leakage of any piece of information in the process of text classification. In particular, our method performs text classification on the encryption of embeddings from state-of-the-art models like BERT, supported by an efficient GPU implementation of CKKS encryption scheme. We show that our method offers encrypted protection of BERT embeddings, while largely preserving their utility on downstream text classification tasks.

Related papers

Provably Secure Public-Key Steganography Based on Admissible Encoding [66.38591467056939]
The technique of hiding secret messages within seemingly harmless covertext is known as provably secure steganography (PSS) PSS evolves from symmetric key steganography to public-key steganography, functioning without the requirement of a pre-shared key. This paper proposes a more general elliptic curve public key steganography method based on admissible encoding.
arXiv Detail & Related papers (2025-04-28T03:42:25Z)
A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage [77.83757117924995]
We propose a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release. Our approach shows that seemingly innocuous auxiliary information can be used to infer sensitive attributes like age or substance use history from sanitized data.
arXiv Detail & Related papers (2025-04-28T01:16:27Z)
Secure Semantic Communication With Homomorphic Encryption [52.5344514499035]
This paper explores the feasibility of applying homomorphic encryption to SemCom. We propose a task-oriented SemCom scheme secured through homomorphic encryption.
arXiv Detail & Related papers (2025-01-17T13:26:14Z)
TextSleuth: Towards Explainable Tampered Text Detection [49.88698441048043]
We propose to explain the basis of tampered text detection with natural language via large multimodal models. To fill the data gap for this task, we propose a large-scale, comprehensive dataset, ETTD. Elaborate queries are introduced to generate high-quality anomaly descriptions with GPT4o. To automatically filter out low-quality annotations, we also propose to prompt GPT4o to recognize tampered texts.
arXiv Detail & Related papers (2024-12-19T13:10:03Z)
Subword Embedding from Bytes Gains Privacy without Sacrificing Accuracy and Complexity [5.7601856226895665]
We propose Subword Embedding from Bytes (SEB) and encode subwords to byte sequences using deep neural networks. Our solution outperforms conventional approaches by preserving privacy without sacrificing efficiency or accuracy. We verify SEB obtains comparable and even better results over standard subword embedding methods in machine translation, sentiment analysis, and language modeling.
arXiv Detail & Related papers (2024-10-21T18:25:24Z)
Decoder Pre-Training with only Text for Scene Text Recognition [54.93037783663204]
Scene text recognition (STR) pre-training methods have achieved remarkable progress, primarily relying on synthetic datasets. We introduce a novel method named Decoder Pre-training with only text for STR (DPTR) DPTR treats text embeddings produced by the CLIP text encoder as pseudo visual embeddings and uses them to pre-train the decoder.
arXiv Detail & Related papers (2024-08-11T06:36:42Z)
Just Rewrite It Again: A Post-Processing Method for Enhanced Semantic Similarity and Privacy Preservation of Differentially Private Rewritten Text [3.3916160303055567]
We propose a simple post-processing method based on the goal of aligning rewritten texts with their original counterparts. Our results show that such an approach not only produces outputs that are more semantically reminiscent of the original inputs, but also texts which score on average better in empirical privacy evaluations.
arXiv Detail & Related papers (2024-05-30T08:41:33Z)
Latent Guard: a Safety Framework for Text-to-image Generation [64.49596711025993]
Existing safety measures are either based on text blacklists, which can be easily circumvented, or harmful content classification. We propose Latent Guard, a framework designed to improve safety measures in text-to-image generation. Inspired by blacklist-based approaches, Latent Guard learns a latent space on top of the T2I model's text encoder, where it is possible to check the presence of harmful concepts.
arXiv Detail & Related papers (2024-04-11T17:59:52Z)
Silent Guardian: Protecting Text from Malicious Exploitation by Large Language Models [63.91178922306669]
We introduce Silent Guardian, a text protection mechanism against large language models (LLMs) By carefully modifying the text to be protected, TPE can induce LLMs to first sample the end token, thus directly terminating the interaction. We show that SG can effectively protect the target text under various configurations and achieve almost 100% protection success rate in some cases.
arXiv Detail & Related papers (2023-12-15T10:30:36Z)
Recoverable Privacy-Preserving Image Classification through Noise-like Adversarial Examples [26.026171363346975]
Cloud-based image related services such as classification have become crucial. In this study, we propose a novel privacypreserving image classification scheme. encrypted images can be decrypted back into their original form with high fidelity (recoverable) using a secret key.
arXiv Detail & Related papers (2023-10-19T13:01:58Z)
SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation [72.10931780019297]
Existing watermarking algorithms are vulnerable to paraphrase attacks because of their token-level design. We propose SemStamp, a robust sentence-level semantic watermarking algorithm based on locality-sensitive hashing (LSH) Experimental results show that our novel semantic watermark algorithm is not only more robust than the previous state-of-the-art method on both common and bigram paraphrase attacks, but also is better at preserving the quality of generation.
arXiv Detail & Related papers (2023-10-06T03:33:42Z)
General Framework for Reversible Data Hiding in Texts Based on Masked Language Modeling [15.136429369639686]
We propose a general framework to embed secret information into a given cover text. The embedded information and the original cover text can be perfectly retrieved from the marked text. Our results show that the original cover text and the secret information can be successfully embedded and extracted.
arXiv Detail & Related papers (2022-06-21T05:02:49Z)
Autoregressive Linguistic Steganography Based on BERT and Consistency Coding [17.881686153284267]
Linguistic steganography (LS) conceals the presence of communication by embedding secret information into a text. Recent algorithms use a language model (LM) to generate the steganographic text, which provides a higher payload compared with many previous arts. We propose a novel autoregressive LS algorithm based on BERT and consistency coding, which achieves a better trade-off between embedding payload and system security.
arXiv Detail & Related papers (2022-03-26T02:36:55Z)
Semantics-Preserved Distortion for Personal Privacy Protection in Information Management [65.08939490413037]
This paper suggests a linguistically-grounded approach to distort texts while maintaining semantic integrity. We present two distinct frameworks for semantic-preserving distortion: a generative approach and a substitutive approach. We also explore privacy protection in a specific medical information management scenario, showing our method effectively limits sensitive data memorization.
arXiv Detail & Related papers (2022-01-04T04:01:05Z)
Reinforcement Learning on Encrypted Data [58.39270571778521]
We present a preliminary, experimental study of how a DQN agent trained on encrypted states performs in environments with discrete and continuous state spaces. Our results highlight that the agent is still capable of learning in small state spaces even in presence of non-deterministic encryption, but performance collapses in more complex environments.
arXiv Detail & Related papers (2021-09-16T21:59:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.