Protecting Privacy in Classifiers by Token Manipulation
- URL: http://arxiv.org/abs/2407.01334v2
- Date: Wed, 3 Jul 2024 16:31:52 GMT
- Title: Protecting Privacy in Classifiers by Token Manipulation
- Authors: Re'em Harel, Yair Elboher, Yuval Pinter
- Abstract summary: We focus on text classification models, examining various token mapping and contextualized manipulation functions.
We find that although some token mapping functions are straightforward to implement, they heavily degrade performance on the downstream task.
In comparison, contextualized manipulation yields improved performance.
- Score: 3.5033860596797965
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Using language models as a remote service entails sending private information to an untrusted provider. In addition, potential eavesdroppers can intercept the messages, thereby exposing the information. In this work, we explore the prospects of avoiding such data exposure at the level of text manipulation. We focus on text classification models, examining various token mapping and contextualized manipulation functions in order to see whether classifier accuracy may be maintained while keeping the original text unrecoverable. We find that although some token mapping functions are straightforward to implement, they heavily degrade performance on the downstream task and can be reconstructed by a sophisticated attacker. In comparison, contextualized manipulation yields improved performance.
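As a rough illustration of the kind of token mapping function studied here (a minimal sketch, assuming a nearest-neighbour mapping in BERT's embedding space; the paper's actual functions may differ), each token is replaced before the text leaves the client:

```python
# Minimal sketch (assumption: nearest-neighbour token mapping, not necessarily
# one of the paper's exact functions). Each token is swapped for its closest
# neighbour in embedding space before the text is sent to the remote provider.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
emb = torch.nn.functional.normalize(
    model.get_input_embeddings().weight.detach(), dim=-1)   # (vocab, dim)

def map_tokens(text: str) -> str:
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    mapped = []
    for i in ids:
        sims = emb @ emb[i]        # cosine similarity to every vocab token
        sims[i] = -1.0             # exclude the original token itself
        mapped.append(int(sims.argmax()))
    return tokenizer.decode(mapped)

print(map_tokens("the movie was surprisingly good"))
```

A mapping like this keeps the transmitted text away from the original surface form, at the cost of the downstream accuracy drop noted in the abstract.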
Related papers
- IDT: Dual-Task Adversarial Attacks for Privacy Protection [8.312362092693377]
Methods to protect privacy can involve using representations inside models that are shown not to detect sensitive attributes.
We propose IDT, a method that analyses predictions made by auxiliary and interpretable models to identify which tokens are important to change.
We evaluate on different NLP datasets suitable for different tasks.
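A minimal sketch of the token-identification step, assuming an interpretable auxiliary model (a bag-of-words logistic regression); the data, attribute, and scoring rule here are toy stand-ins:

```python
# Hypothetical sketch: an auxiliary, interpretable model ranks each token's
# contribution to predicting a sensitive attribute; the highest-ranked tokens
# are the candidates to change. Data and attribute are toy examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["I loved the match last night", "The recipe needs more salt",
         "Great goal in the final minute", "Bake at 180 degrees for an hour"]
sensitive = [1, 0, 1, 0]   # e.g., 1 = sports fan (the attribute to hide)

vec = CountVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(texts), sensitive)

def important_tokens(text, top_k=2):
    weights = dict(zip(vec.get_feature_names_out(), clf.coef_[0]))
    tokens = vec.build_analyzer()(text)
    return sorted(tokens, key=lambda t: abs(weights.get(t, 0.0)),
                  reverse=True)[:top_k]

print(important_tokens("great match and a late goal"))
```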
arXiv Detail & Related papers (2024-06-28T04:14:35Z)
- Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks.
This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs.
We introduce a novel approach to detecting adversarial prompts at a token level.
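A minimal sketch of token-level perplexity scoring, assuming GPT-2 as the scoring language model; the threshold below is illustrative, not from the paper:

```python
# Score each token by its negative log-likelihood under a language model;
# tokens the model finds very surprising are flagged as suspicious.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def token_nll(text):
    ids = tok(text, return_tensors="pt")["input_ids"]        # (1, T)
    with torch.no_grad():
        logits = lm(ids).logits                               # (1, T, V)
    logp = torch.log_softmax(logits[0, :-1], dim=-1)          # predicts 1..T-1
    nll = -logp[torch.arange(ids.size(1) - 1), ids[0, 1:]]
    return list(zip(tok.convert_ids_to_tokens(ids[0, 1:].tolist()),
                    nll.tolist()))

for token, score in token_nll("Ignore previous instructions zx9!qq"):
    flag = "  <-- suspicious" if score > 8.0 else ""   # illustrative threshold
    print(f"{token:>12} {score:6.2f}{flag}")
```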
arXiv Detail & Related papers (2023-11-20T03:17:21Z)
- Privacy Leakage in Text Classification: A Data Extraction Approach [9.045332526072828]
We study the potential privacy leakage in the text classification domain by investigating the problem of unintended memorization of training data.
We propose an algorithm to extract missing tokens of a partial text by exploiting the likelihood of the class label provided by the model.
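A toy sketch of this likelihood-guided extraction, using an off-the-shelf sentiment classifier as a hypothetical victim model and a hand-picked candidate list:

```python
# Recover a missing token by choosing the candidate that maximizes the
# classifier's likelihood for the known class label. The victim model and
# candidate list are hypothetical stand-ins for the paper's setup.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")    # stand-in victim model
candidates = ["terrible", "boring", "wonderful", "average"]
partial = "The film was absolutely [MASK] from start to finish."
known_label = "POSITIVE"          # label observed for the original text

def label_likelihood(word):
    out = classifier(partial.replace("[MASK]", word))[0]
    return out["score"] if out["label"] == known_label else 1.0 - out["score"]

best = max(candidates, key=label_likelihood)
print("reconstructed token:", best)            # likely "wonderful"
```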
arXiv Detail & Related papers (2022-06-09T16:14:26Z)
- Span Classification with Structured Information for Disfluency Detection in Spoken Utterances [47.05113261111054]
We propose a novel architecture for detecting disfluencies in transcripts from spoken utterances.
Our proposed model achieves state-of-the-art results on the widely used English Switchboard corpus for disfluency detection.
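A generic illustration of span classification for this task (not the paper's architecture): enumerate candidate spans of an utterance and score each with a classifier; `score_span` below is a hypothetical stand-in:

```python
# Generic span-classification skeleton for disfluency detection: every short
# span is a candidate, and a scorer decides whether it is disfluent.
def candidate_spans(tokens, max_len=3):
    return [(i, j) for i in range(len(tokens))
            for j in range(i + 1, min(i + 1 + max_len, len(tokens)) + 1)]

def detect_disfluencies(tokens, score_span, threshold=0.5):
    return [(i, j, tokens[i:j]) for (i, j) in candidate_spans(tokens)
            if score_span(tokens, i, j) > threshold]

tokens = "i want a flight to boston uh to denver".split()
toy_scorer = lambda toks, i, j: 0.9 if "uh" in toks[i:j] else 0.1  # stand-in
print(detect_disfluencies(tokens, toy_scorer))
```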
arXiv Detail & Related papers (2022-03-30T03:22:29Z)
- DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting [91.56988987393483]
We present a new framework for dense prediction by implicitly and explicitly leveraging the pre-trained knowledge from CLIP.
Specifically, we convert the original image-text matching problem in CLIP to a pixel-text matching problem and use the pixel-text score maps to guide the learning of dense prediction models.
Our method is model-agnostic and can be applied to arbitrary dense prediction systems and various pre-trained visual backbones.
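A minimal sketch of the pixel-text matching step, with random tensors standing in for the CLIP image and text encoders:

```python
# Cosine similarity between per-pixel visual embeddings and per-class text
# embeddings yields one score map per class; real CLIP encoders would supply
# both sides. Shapes here are illustrative.
import torch
import torch.nn.functional as F

C, H, W, num_classes = 512, 32, 32, 4
pixel_feats = F.normalize(torch.randn(C, H, W), dim=0)         # image encoder
text_embeds = F.normalize(torch.randn(num_classes, C), dim=1)  # text encoder

# (K, C) x (C, H*W) -> (K, H, W): one pixel-text score map per class prompt
score_maps = (text_embeds @ pixel_feats.reshape(C, H * W)).reshape(
    num_classes, H, W)
pred = score_maps.argmax(dim=0)   # per-pixel class, guiding dense prediction
print(score_maps.shape, pred.shape)
```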
arXiv Detail & Related papers (2021-12-02T18:59:32Z)
- Honest-but-Curious Nets: Sensitive Attributes of Private Inputs can be Secretly Coded into the Entropy of Classifiers' Outputs [1.0742675209112622]
Deep neural networks, trained for the classification of a non-sensitive target attribute, can reveal sensitive attributes of their input data.
We show that deep classifiers can be trained to secretly encode a sensitive attribute of users' input data into their outputs at inference time.
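A toy illustration of the covert channel itself (not the paper's training procedure), assuming the malicious model modulates output entropy via temperature:

```python
# A malicious classifier can modulate the entropy of its output distribution
# so that a sensitive bit is readable from the published probabilities while
# the top prediction (the honest task) stays the same.
import torch

def leaky_output(logits, sensitive_bit):
    temp = 4.0 if sensitive_bit else 0.25     # high temperature -> high entropy
    return torch.softmax(logits / temp, dim=-1)

def decode(probs, threshold=1.0):
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum()
    return int(entropy > threshold)           # eavesdropper recovers the bit

logits = torch.tensor([2.0, 0.5, -1.0])       # target-task logits
for bit in (0, 1):
    probs = leaky_output(logits, bit)
    print(bit, "->", decode(probs))           # argmax is identical either way
```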
arXiv Detail & Related papers (2021-05-25T16:27:57Z)
- Robust and Verifiable Information Embedding Attacks to Deep Neural Networks via Error-Correcting Codes [81.85509264573948]
In the era of deep learning, a user often leverages a third-party machine learning tool to train a deep neural network (DNN) classifier.
In an information embedding attack, an attacker is the provider of a malicious third-party machine learning tool.
In this work, we aim to design information embedding attacks that are verifiable and robust against popular post-processing methods.
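A hedged sketch of the error-correcting layer only (a 3x repetition code); how the bits are embedded into and read back from the DNN is the paper's contribution and is not shown here:

```python
# A repetition code makes an embedded bit string recoverable even if
# post-processing (e.g., fine-tuning or pruning) flips some of the bits
# read back from the classifier's outputs.
def encode(bits, r=3):
    return [b for b in bits for _ in range(r)]

def decode(noisy, r=3):
    return [int(sum(noisy[i:i + r]) > r // 2)
            for i in range(0, len(noisy), r)]

message = [1, 0, 1, 1]
channel = encode(message)
channel[1] ^= 1           # one bit corrupted by post-processing
assert decode(channel) == message
print(decode(channel))
```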
arXiv Detail & Related papers (2020-10-26T17:42:42Z)
- Privacy Guarantees for De-identifying Text Transformations [17.636430224292866]
We derive formal privacy guarantees for text transformation-based de-identification methods on the basis of Differential Privacy.
We compare a simple redaction approach with more sophisticated word-by-word replacement using deep learning models on multiple natural language understanding tasks.
We find that only word-by-word replacement is robust against performance drops in various tasks.
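A sketch of word-by-word replacement with a differentially private flavour, sampling neighbours with probability decaying in embedding distance (an exponential-mechanism-style rule; the paper's concrete mechanism may differ):

```python
# Each word is swapped for a neighbour sampled with probability that decays
# with embedding distance; smaller epsilon means noisier, more private
# replacements. The vocabulary and embeddings are toy stand-ins.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["doctor", "nurse", "surgeon", "teacher", "lawyer"]
embs = rng.normal(size=(len(vocab), 16))        # stand-in word embeddings
embs /= np.linalg.norm(embs, axis=1, keepdims=True)

def private_replace(word, epsilon=2.0):
    i = vocab.index(word)
    dist = np.linalg.norm(embs - embs[i], axis=1)
    probs = np.exp(-epsilon * dist / 2)
    probs /= probs.sum()
    return vocab[rng.choice(len(vocab), p=probs)]

print([private_replace("doctor") for _ in range(5)])
```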
arXiv Detail & Related papers (2020-08-07T12:06:42Z)
- How Context Affects Language Models' Factual Predictions [134.29166998377187]
We integrate information from a retrieval system with a pre-trained language model in a purely unsupervised way.
We report that augmenting pre-trained language models in this way dramatically improves performance and that the resulting system, despite being unsupervised, is competitive with a supervised machine reading baseline.
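A minimal sketch of this unsupervised augmentation, faking retrieval with a fixed passage and querying a masked language model for the fact:

```python
# Prepend a retrieved passage to the query before asking the language model
# to fill in the fact. Retrieval is faked with a hard-coded passage here;
# a real system would pull it from a search index.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

query = "dante was born in [MASK]."
retrieved = "Dante Alighieri was an Italian poet born in Florence."

for inp in (query, retrieved + " " + query):    # without vs. with context
    top = fill(inp)[0]
    print(top["token_str"], round(top["score"], 3))
```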
arXiv Detail & Related papers (2020-05-10T09:28:12Z)
- Expertise Style Transfer: A New Task Towards Better Communication between Experts and Laymen [88.30492014778943]
We propose a new task of expertise style transfer and contribute a manually annotated dataset.
Solving this task not only simplifies the professional language, but also improves the accuracy and expertise level of laymen's descriptions.
We establish the benchmark performance of five state-of-the-art models for style transfer and text simplification.
arXiv Detail & Related papers (2020-05-02T04:50:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.