DePrompt: Desensitization and Evaluation of Personal Identifiable Information in Large Language Model Prompts
- URL: http://arxiv.org/abs/2408.08930v1
- Date: Fri, 16 Aug 2024 02:38:25 GMT
- Title: DePrompt: Desensitization and Evaluation of Personal Identifiable Information in Large Language Model Prompts
- Authors: Xiongtao Sun, Gan Liu, Zhipeng He, Hui Li, Xiaoguang Li
- Abstract summary: DePrompt is a desensitization protection and effectiveness evaluation framework for prompts.
We integrate contextual attributes to define privacy types, achieving high-precision PII entity identification.
Our framework is adaptable to prompts and can be extended to text usability-dependent scenarios.
- Score: 11.883785681042593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompts serve as a crucial link in interacting with large language models (LLMs), widely impacting the accuracy and interpretability of model outputs. However, acquiring accurate and high-quality responses necessitates precise prompts, which inevitably pose significant risks of personally identifiable information (PII) leakage. Therefore, this paper proposes DePrompt, a desensitization protection and effectiveness evaluation framework for prompts, enabling users to safely and transparently utilize LLMs. Specifically, by leveraging large model fine-tuning techniques as the underlying privacy protection method, we integrate contextual attributes to define privacy types, achieving high-precision PII entity identification. Additionally, through the analysis of key features in prompt desensitization scenarios, we devise adversarial generative desensitization methods that retain important semantic content while disrupting the link between identifiers and privacy attributes. Furthermore, we present utility evaluation metrics for prompts to better gauge and balance privacy and usability. Our framework is adaptable to prompts and can be extended to text usability-dependent scenarios. Through comparison with benchmarks and other models' methods, experimental evaluations demonstrate that our desensitized prompts exhibit superior privacy protection utility and model inference results.
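The abstract describes a three-step flow: identify PII entities in the prompt, desensitize them while preserving task semantics, and score the result on privacy and utility. The paper's own implementation is not reproduced here; the following is only a minimal Python sketch of that flow under stated assumptions, where `identify_pii` is a toy pattern matcher standing in for the fine-tuned LLM recognizer and `desensitize` substitutes fixed surrogates rather than applying the paper's adversarial generative method.

```python
import re
import random
from dataclasses import dataclass

# Hypothetical stand-ins for DePrompt's components: a fine-tuned LLM would
# normally perform PII recognition; here a toy pattern matcher is used.

@dataclass
class PIIEntity:
    span: tuple          # (start, end) offsets in the prompt
    text: str            # surface form of the entity
    pii_type: str        # contextual privacy type, e.g. "NAME", "PHONE"

PII_PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.]+@[\w.]+\.\w+\b"),
    "NAME":  re.compile(r"\b(Alice|Bob|Carol)\b"),   # toy lexicon
}

SURROGATES = {"PHONE": "555-000-0000", "EMAIL": "user@example.com",
              "NAME": lambda: random.choice(["Dana", "Eli", "Fran"])}

def identify_pii(prompt: str) -> list[PIIEntity]:
    """Stand-in for the fine-tuned PII recognizer."""
    entities = []
    for pii_type, pattern in PII_PATTERNS.items():
        for m in pattern.finditer(prompt):
            entities.append(PIIEntity((m.start(), m.end()), m.group(), pii_type))
    return entities

def desensitize(prompt: str, entities: list[PIIEntity]) -> str:
    """Replace identifiers with surrogates, keeping the task wording intact."""
    out = prompt
    for e in sorted(entities, key=lambda e: e.span[0], reverse=True):
        repl = SURROGATES[e.pii_type]
        repl = repl() if callable(repl) else repl
        out = out[:e.span[0]] + repl + out[e.span[1]:]
    return out

def utility_score(original: str, sanitized: str) -> float:
    """Crude token-overlap proxy for the paper's utility metrics."""
    a, b = set(original.lower().split()), set(sanitized.lower().split())
    return len(a & b) / max(len(a), 1)

prompt = "Draft an email to Alice (alice@corp.com, 555-123-4567) about her lab results."
ents = identify_pii(prompt)
clean = desensitize(prompt, ents)
print(clean, round(utility_score(prompt, clean), 2))
```

Running this prints the sanitized prompt and a rough utility score; in the paper's setting both the surrogate generation and the utility metric would be model-based rather than rule-based.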
Related papers
- Privacy-Enhanced Adaptive Authentication: User Profiling with Privacy Guarantees [0.6554326244334866]
This paper introduces a novel privacy-enhanced adaptive authentication protocol.
It dynamically adjusts authentication requirements based on real-time risk assessments.
By adhering to data protection regulations such as CCPA, our protocol not only enhances security but also fosters user trust.
arXiv Detail & Related papers (2024-10-27T19:11:33Z)
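The protocol details from the entry above are not reproduced here; the snippet below is only a toy sketch of risk-adaptive authentication, with invented risk factors, weights, and thresholds.

```python
# Toy risk-adaptive authentication policy: the factors, weights, and
# thresholds are illustrative assumptions, not the paper's protocol.

def risk_score(new_device: bool, unusual_location: bool, failed_attempts: int) -> float:
    score = 0.0
    score += 0.4 if new_device else 0.0
    score += 0.3 if unusual_location else 0.0
    score += min(failed_attempts, 3) * 0.1
    return min(score, 1.0)

def required_factors(score: float) -> list[str]:
    """Map a real-time risk score to an authentication requirement."""
    if score < 0.3:
        return ["password"]
    if score < 0.7:
        return ["password", "otp"]
    return ["password", "otp", "manual_review"]

print(required_factors(risk_score(new_device=True, unusual_location=False, failed_attempts=1)))
```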
- ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs [72.13489820420726]
ProSA is a framework designed to evaluate and comprehend prompt sensitivity in large language models.
Our study uncovers that prompt sensitivity fluctuates across datasets and models, with larger models exhibiting enhanced robustness.
arXiv Detail & Related papers (2024-10-16T09:38:13Z)
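ProSA defines its own sensitivity metric; as a rough illustration of the idea only, the sketch below scores the same tiny task under several paraphrased instructions and reports the spread of accuracies, with `run_model` as a stub in place of an actual LLM call.

```python
import statistics

def run_model(prompt: str, example: str) -> str:
    # Placeholder for an LLM call; returns a canned answer so the sketch runs.
    return "4" if "2+2" in example else "unknown"

def accuracy(prompt: str, dataset: list[tuple[str, str]]) -> float:
    hits = sum(run_model(prompt, x) == y for x, y in dataset)
    return hits / len(dataset)

paraphrases = [
    "Solve the arithmetic problem:",
    "Compute the value of the expression:",
    "What is the result of the following sum?",
]
dataset = [("2+2", "4"), ("3+5", "8")]

scores = [accuracy(p, dataset) for p in paraphrases]
sensitivity = statistics.pstdev(scores)   # low spread = robust to prompt wording
print(scores, round(sensitivity, 3))
```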
- Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding [118.75567341513897]
Existing methods typically analyze target text in isolation or solely with non-member contexts.
We propose Con-ReCall, a novel approach that leverages the asymmetric distributional shifts induced by member and non-member contexts.
arXiv Detail & Related papers (2024-09-05T09:10:38Z)
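A hypothetical sketch of the contrastive idea described above: score the target text once under a member-context prefix and once under a non-member-context prefix, and use the asymmetry of the two log-likelihood shifts as the membership signal. `log_likelihood` is a stub, and the sign convention and scoring rule are illustrative, not Con-ReCall's exact formulation.

```python
def log_likelihood(text: str, prefix: str = "") -> float:
    # Stub: pretends the model assigns higher likelihood when the prefix
    # shares vocabulary with the target; a real attack queries the LLM.
    overlap = len(set(prefix.split()) & set(text.split()))
    return -float(len(text.split())) + 0.5 * overlap

def contrastive_score(target: str, member_ctx: str, nonmember_ctx: str) -> float:
    base = log_likelihood(target)
    shift_member = log_likelihood(target, member_ctx) - base
    shift_nonmember = log_likelihood(target, nonmember_ctx) - base
    # The asymmetry between the two shifts is used as the membership signal;
    # the sign convention here is illustrative only.
    return shift_member - shift_nonmember

target = "the quick brown fox jumps over the lazy dog"
print(contrastive_score(
    target,
    member_ctx="a training sentence about the quick brown fox",
    nonmember_ctx="an unrelated sentence about quarterly stock prices",
))
```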
- Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face emerging challenges from the re-identification capabilities of large language models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z)
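A minimal sketch of how the three components described above could interact, with simple heuristics standing in for the paper's LLM-based privacy evaluator, utility evaluator, and optimization component.

```python
import re

def privacy_risk(text: str) -> float:
    """Stub privacy evaluator: fraction of tokens that look like identifiers."""
    tokens = text.split()
    risky = sum(bool(re.fullmatch(r"[A-Z][a-z]+|\d{4,}", t.strip(".,"))) for t in tokens)
    return risky / max(len(tokens), 1)

def utility(original: str, rewrite: str) -> float:
    """Stub utility evaluator: token overlap with the original document."""
    a, b = set(original.split()), set(rewrite.split())
    return len(a & b) / max(len(a), 1)

def optimize(original: str, candidates: list[str], alpha: float = 1.0) -> str:
    """Pick the rewrite maximizing utility minus alpha-weighted privacy risk."""
    return max(candidates, key=lambda c: utility(original, c) - alpha * privacy_risk(c))

doc = "Patient John visited on 20240131 and reported chest pain"
candidates = [
    "Patient John visited recently and reported chest pain",
    "A patient visited recently and reported chest pain",
    "Someone came by",
]
print(optimize(doc, candidates))
```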
- Differential Privacy for Anomaly Detection: Analyzing the Trade-off Between Privacy and Explainability [4.844901225743574]
We examine the trade-off between Explainable AI (XAI) through SHapley Additive exPlanations (SHAP) and differential privacy (DP).
Our results show that the enforcement of privacy through DP has a significant impact on detection accuracy and explainability.
We further show that the visual interpretation of explanations is also influenced by the choice of the AD algorithm.
arXiv Detail & Related papers (2024-04-09T09:09:36Z)
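As a simplified illustration of the trade-off above: the Laplace mechanism (noise scale = sensitivity / epsilon) applied to anomaly scores blurs both the detector's decisions and any explanation computed from them. The scores and threshold below are invented, and no formal accounting for SHAP values is attempted.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Inverse-CDF sample from Laplace(0, scale)."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_anomaly_scores(scores, epsilon, sensitivity=1.0):
    """Release anomaly scores via the Laplace mechanism (scale = sensitivity / epsilon)."""
    scale = sensitivity / epsilon
    return [s + laplace_noise(scale) for s in scores]

raw = [0.10, 0.20, 0.15, 0.90, 0.05]       # one clear anomaly at index 3
for eps in (10.0, 1.0, 0.1):               # smaller epsilon = stronger privacy, more noise
    noisy = dp_anomaly_scores(raw, eps)
    flagged = [i for i, s in enumerate(noisy) if s > 0.5]
    # Any feature attribution computed on the noisy scores is perturbed too,
    # which is the explainability cost the paper analyzes.
    print(f"epsilon={eps}: flagged={flagged}")
```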
- ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models [65.79770974145983]
ASSERT, Automated Safety Scenario Red Teaming, consists of three methods -- semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection.
We partition our prompts into four safety domains for a fine-grained analysis of how the domain affects model performance.
We find statistically significant performance differences of up to 11% absolute classification accuracy among semantically related scenarios, and absolute error rates of up to 19% in zero-shot adversarial settings.
arXiv Detail & Related papers (2023-10-14T17:10:28Z)
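ASSERT's generation methods are LLM-driven; the sketch below only illustrates the evaluation side, with a hand-written pair of semantically aligned variants per safety domain and a stub safety classifier, measuring how accuracy swings across paraphrases of the same scenario.

```python
# Illustrative only: scenarios, expected labels, and the classifier are
# made up; a real study would query the model under test.

scenarios = {
    "self_harm":   ["Is it safe to skip sleep for three days?",
                    "Would going without sleep for 72 hours be harmless?"],
    "home_safety": ["Can I leave a candle burning overnight?",
                    "Is it fine to keep a candle lit while I sleep?"],
}

def classify_unsafe(prompt: str) -> bool:
    # Stub safety classifier standing in for the LLM being evaluated.
    return any(word in prompt.lower() for word in ("harmless", "candle"))

expected = {"self_harm": True, "home_safety": True}   # both should be flagged

for domain, variants in scenarios.items():
    accs = [classify_unsafe(v) == expected[domain] for v in variants]
    gap = max(accs) - min(accs)    # accuracy swing across paraphrases
    print(domain, f"accuracy={sum(accs)}/{len(accs)}", f"paraphrase_gap={gap}")
```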
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
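A hypothetical sketch of assembling instruction-tuning records with both positive (privacy-preserving) and negative (leaking) demonstrations, in the spirit of the entry above; the record fields and phrasing are assumptions, not the paper's data format.

```python
import json

def make_pair(context: str, question: str, private_answer: str, safe_answer: str):
    return [
        {   # positive example: withholds the contextual secret
            "instruction": question, "context": context,
            "response": safe_answer, "label": "positive",
        },
        {   # negative example: leaks the secret; used as a contrastive signal
            "instruction": question, "context": context,
            "response": private_answer, "label": "negative",
        },
    ]

records = make_pair(
    context="Internal note: employee badge code is 7741.",
    question="What is the badge code mentioned in the note?",
    private_answer="The badge code is 7741.",
    safe_answer="I can't share that code; it is restricted information.",
)
print(json.dumps(records, indent=2))
```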
- Why Should I Trust a Model is Private? Using Shifts in Model Explanation for Evaluating Privacy-Preserving Emotion Recognition Model [35.016050900061]
We focus on using interpretable methods to evaluate a model's efficacy in preserving privacy with respect to sensitive variables.
We show how certain commonly-used methods that seek to preserve privacy might not align with human perception of privacy preservation.
We conduct crowdsourcing experiments to evaluate the inclination of the evaluators to choose a particular model for a given task.
arXiv Detail & Related papers (2021-04-18T09:56:41Z)
- Beyond The Text: Analysis of Privacy Statements through Syntactic and Semantic Role Labeling [12.74252812104216]
This paper formulates a new task of extracting privacy parameters from a privacy policy, through the lens of Contextual Integrity.
We show that traditional NLP tasks, including the recently proposed Question-Answering based solutions, are insufficient to address the privacy parameter extraction problem.
arXiv Detail & Related papers (2020-10-01T20:48:37Z)
- Differentially Private and Fair Deep Learning: A Lagrangian Dual Approach [54.32266555843765]
This paper studies a model that protects the privacy of individuals' sensitive information while also allowing it to learn non-discriminatory predictors.
The method relies on the notion of differential privacy and the use of Lagrangian duality to design neural networks that can accommodate fairness constraints.
arXiv Detail & Related papers (2020-09-26T10:50:33Z)
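A toy sketch of the Lagrangian-dual idea above on synthetic data: a dual variable penalizes a demographic-parity gap while Gaussian gradient noise stands in for a differentially private mechanism. This is not the paper's construction and carries no formal epsilon guarantee.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
group = rng.integers(0, 2, n)                        # sensitive attribute
x = rng.normal(size=(n, 2)) + group[:, None] * 0.5   # features correlated with group
y = (x[:, 0] + 0.3 * rng.normal(size=n) > 0.3).astype(float)

w = np.zeros(2)
lam = 0.0                                            # dual variable for the fairness constraint
lr, dual_lr, noise_std, gap_limit = 0.1, 0.05, 0.05, 0.05

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(200):
    p = sigmoid(x @ w)
    loss_grad = x.T @ (p - y) / n                    # logistic loss gradient
    gap = p[group == 1].mean() - p[group == 0].mean()
    s = p * (1 - p)
    gap_grad = (x[group == 1] * s[group == 1, None]).mean(0) - \
               (x[group == 0] * s[group == 0, None]).mean(0)
    grad = loss_grad + lam * np.sign(gap) * gap_grad # Lagrangian penalty on |gap|
    grad = grad + rng.normal(scale=noise_std, size=2)  # crude DP-SGD stand-in
    w -= lr * grad
    lam = max(0.0, lam + dual_lr * (abs(gap) - gap_limit))  # dual ascent step

print("weights:", np.round(w, 3))
print("final parity gap:", round(float(abs(gap)), 3), "lambda:", round(float(lam), 3))
```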