Locally Differentially Private Document Generation Using Zero Shot
Prompting
- URL: http://arxiv.org/abs/2310.16111v2
- Date: Thu, 30 Nov 2023 18:13:01 GMT
- Title: Locally Differentially Private Document Generation Using Zero Shot
Prompting
- Authors: Saiteja Utpala, Sara Hooker, Pin-Yu Chen
- Abstract summary: We propose a locally differentially private mechanism called DP-Prompt to counter author de-anonymization attacks.
When DP-Prompt is used with a powerful language model like ChatGPT (gpt-3.5), we observe a notable reduction in the success rate of de-anonymization attacks.
- Score: 61.20953109732442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Numerous studies have highlighted the privacy risks associated with
pretrained large language models. In contrast, our research offers a unique
perspective by demonstrating that pretrained large language models can
effectively contribute to privacy preservation. We propose a locally
differentially private mechanism called DP-Prompt, which leverages the power of
pretrained large language models and zero-shot prompting to counter author
de-anonymization attacks while minimizing the impact on downstream utility.
When DP-Prompt is used with a powerful language model like ChatGPT (gpt-3.5),
we observe a notable reduction in the success rate of de-anonymization attacks,
showing that it surpasses existing approaches by a considerable margin despite
its simpler design. For instance, in the case of the IMDB dataset, DP-Prompt
(with ChatGPT) perfectly recovers the clean sentiment F1 score while achieving
a 46% reduction in author identification F1 score against static attackers and
a 26% reduction against adaptive attackers. We conduct extensive experiments
across six open-source large language models, ranging up to 7 billion
parameters, to analyze various effects of the privacy-utility tradeoff.
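The abstract describes DP-Prompt as a zero-shot paraphrasing step: the private document is fed to a pretrained LLM with a paraphrase instruction, and the rewritten text is released instead of the original. Below is a minimal sketch of such a step, assuming a Hugging Face causal LM; the model name, prompt wording, clipping bounds, and temperature are illustrative assumptions, not the paper's exact configuration. Bounding the next-token logits and sampling with a fixed temperature is what lets token-level sampling be read as an exponential mechanism, with the privacy budget controlled jointly by the clipping range and the temperature.
```python
"""Minimal sketch of a DP-Prompt-style paraphrase (illustrative assumptions only)."""
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)


class ClipLogits(LogitsProcessor):
    """Clamp next-token logits to a bounded range [b_min, b_max].

    Bounded logits make temperature sampling interpretable as an exponential
    mechanism, so each released token carries a bounded privacy cost.
    """

    def __init__(self, b_min: float, b_max: float):
        self.b_min, self.b_max = b_min, b_max

    def __call__(self, input_ids, scores):
        return scores.clamp(min=self.b_min, max=self.b_max)


def dp_prompt_paraphrase(document: str,
                         model_name: str = "tiiuae/falcon-7b-instruct",  # placeholder model
                         temperature: float = 1.5,                       # assumed setting
                         b_min: float = -5.0, b_max: float = 5.0,        # assumed clip bounds
                         max_new_tokens: int = 256) -> str:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Zero-shot instruction: no fine-tuning, no in-context examples.
    prompt = f"Paraphrase the following text:\n{document}\nParaphrase:"
    inputs = tokenizer(prompt, return_tensors="pt")

    # Temperature sampling over clipped logits; a higher temperature means
    # noisier sampling and hence a tighter (smaller) per-token privacy budget.
    output = model.generate(
        **inputs,
        do_sample=True,
        temperature=temperature,
        logits_processor=LogitsProcessorList([ClipLogits(b_min, b_max)]),
        max_new_tokens=max_new_tokens,
    )
    new_tokens = output[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```
Only the paraphrase returned by `dp_prompt_paraphrase` would be published; the original document never leaves the user's side, which is what makes the guarantee local rather than central.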
Related papers
- Differentially Private Learning Needs Better Model Initialization and Self-Distillation [1.8069913326395433]
Differentially private SGD (DPSGD) enables privacy-preserving training of language models, but often reduces utility, diversity, and linguistic quality.
We introduce DPRefine, a three-phase method that initializes a model using data from a small pre-trained LM with rigorous filtering.
We find that small models like GPT-2 can be effective for synthesis and distillation, highlighting their potential in enabling scalable and efficient deployment of privacy-preserving language models.
arXiv Detail & Related papers (2024-10-23T05:19:51Z)
- STOP! Benchmarking Large Language Models with Sensitivity Testing on Offensive Progressions [6.19084217044276]
Mitigating explicit and implicit biases in Large Language Models (LLMs) has become a critical focus in the field of natural language processing.
We introduce the Sensitivity Testing on Offensive Progressions dataset, which includes 450 offensive progressions containing 2,700 unique sentences.
Our findings reveal that even the best-performing models detect bias inconsistently, with success rates ranging from 19.3% to 69.8%.
arXiv Detail & Related papers (2024-09-20T18:34:38Z)
- TuBA: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning [63.481446315733145]
Cross-lingual backdoor attacks against multilingual large language models (LLMs) are under-explored.
Our research focuses on how poisoning the instruction-tuning data for one or two languages can affect the outputs for languages whose instruction-tuning data were not poisoned.
Our method exhibits remarkable efficacy in models like mT5 and GPT-4o, with high attack success rates, surpassing 90% in more than 7 out of 12 languages.
arXiv Detail & Related papers (2024-04-30T14:43:57Z)
- Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
- SoK: Reducing the Vulnerability of Fine-tuned Language Models to Membership Inference Attacks [1.03590082373586]
We provide the first systematic review of the vulnerability of large language models to membership inference attacks.
We find that some training methods provide significantly reduced privacy risk, with the combination of differential privacy and low-rank adaptors achieving the best privacy protection against these attacks.
arXiv Detail & Related papers (2024-03-13T12:46:51Z)
- Dynamic Transformers Provide a False Sense of Efficiency [75.39702559746533]
Multi-exit models make a trade-off between efficiency and accuracy, where the saving of computation comes from an early exit.
We propose a simple yet effective attacking framework, SAME, which is specially tailored to reduce the efficiency of the multi-exit models.
Experiments on the GLUE benchmark show that SAME can effectively diminish the efficiency gain of various multi-exit models by 80% on average.
arXiv Detail & Related papers (2023-05-20T16:41:48Z)
- Just Fine-tune Twice: Selective Differential Privacy for Large Language Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve selective differential privacy (SDP) for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z)
- Defending against Reconstruction Attacks with Rényi Differential Privacy [72.1188520352079]
Reconstruction attacks allow an adversary to regenerate data samples of the training set using access to only a trained model.
Differential privacy is a known solution to such attacks, but is often used with a relatively large privacy budget.
We show that, for the same mechanism, we can derive privacy guarantees for reconstruction attacks that are better than the traditional ones from the literature.
arXiv Detail & Related papers (2022-02-15T18:09:30Z)