Life of PII -- A PII Obfuscation Transformer
- URL: http://arxiv.org/abs/2305.09550v2
- Date: Wed, 17 May 2023 12:32:26 GMT
- Title: Life of PII -- A PII Obfuscation Transformer
- Authors: Ajinkya Deshmukh, Saumya Banthia, Anantha Sharma
- Abstract summary: 'Life of PII' is a novel Obfuscation Transformer framework for transforming Personal Identifiable Information (PII) into faux-PII.
We show that our approach can effectively reduce utility loss while preserving the original information, offering greater flexibility in the trade-off between privacy protection and data utility.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Protecting sensitive information is crucial in today's world of Large Language Models (LLMs) and data-driven services. One common way to preserve privacy is to apply data perturbation techniques that reduce the overreaching utility of (sensitive) Personal Identifiable Information (PII) data while maintaining its statistical and semantic properties. However, data perturbation methods often result in significant information loss, making them impractical for use. In this paper, we propose 'Life of PII', a novel Obfuscation Transformer framework for transforming PII into faux-PII while preserving the original information, intent, and context as much as possible. Our approach includes an API to interface with the given document, a configuration-based obfuscator, and a model based on the Transformer architecture, which has shown high context preservation and performance in natural language processing tasks and LLMs.

Our Transformer-based approach learns a mapping between the original PII and its transformed faux-PII representation, which we call "obfuscated" data. Our experiments demonstrate that our method, called Life of PII, outperforms traditional data perturbation techniques in terms of both utility preservation and privacy protection. We show that our approach can effectively reduce utility loss while preserving the original information, offering greater flexibility in the trade-off between privacy protection and data utility. Our work provides a solution for protecting PII in various real-world applications.
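To make the obfuscation pipeline concrete, below is a minimal sketch of what a configuration-driven obfuscator of this kind might look like: configurable PII patterns, deterministic faux-value generation, and a stored original-to-faux mapping that keeps the transformation reversible (and that a Transformer model could later be trained to reproduce). This is not the authors' implementation; the pattern config, helper names, and faux-token format are illustrative assumptions.

```python
import hashlib
import re
from dataclasses import dataclass, field

# Hypothetical configuration: each PII type maps to a detection regex.
PII_CONFIG = {
    "EMAIL": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "PHONE": r"\+?\d[\d\s()-]{7,}\d",
    "SSN":   r"\b\d{3}-\d{2}-\d{4}\b",
}

@dataclass
class Obfuscator:
    config: dict = field(default_factory=lambda: dict(PII_CONFIG))
    mapping: dict = field(default_factory=dict)  # original PII -> faux PII

    def _faux(self, pii_type: str, value: str) -> str:
        # Deterministic placeholder: the same original value always maps to
        # the same faux token, so references stay consistent across the text.
        digest = hashlib.sha256(value.encode()).hexdigest()[:8]
        return f"<{pii_type}_{digest}>"

    def obfuscate(self, text: str) -> str:
        # Replace every detected PII span with its faux token, recording the mapping.
        for pii_type, pattern in self.config.items():
            for match in set(re.findall(pattern, text)):
                faux = self.mapping.setdefault(match, self._faux(pii_type, match))
                text = text.replace(match, faux)
        return text

    def deobfuscate(self, text: str) -> str:
        # Reverse the stored mapping to restore the original document.
        for original, faux in self.mapping.items():
            text = text.replace(faux, original)
        return text

if __name__ == "__main__":
    ob = Obfuscator()
    doc = "Contact Jane at jane.doe@example.com or +1 (555) 010-2334."
    masked = ob.obfuscate(doc)
    print(masked)                          # PII replaced with faux tokens
    print(ob.deobfuscate(masked) == doc)   # True: the mapping is reversible
```

In the paper's framing, the learned Transformer would take over the role of the hand-written replacement step, generating context-preserving faux-PII rather than opaque tokens.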
Related papers
- Enhancing Security Using Random Binary Weights in Privacy-Preserving Federated Learning (arXiv, 2024-09-30)
  We propose a novel method for enhancing security in privacy-preserving federated learning using the Vision Transformer.
  In federated learning, training is performed by collecting model updates from each client without collecting their raw data.
  The effectiveness of the proposed method is confirmed in terms of model performance and resistance to the APRIL (Attention PRIvacy Leakage) restoration attack.
- Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data (arXiv, 2024-06-20)
  Retrieval-augmented generation (RAG) enhances the outputs of language models by integrating relevant information retrieved from external knowledge sources.
  RAG systems may face severe privacy risks when retrieving private data.
  We propose using synthetic data as a privacy-preserving alternative for the retrieval data.
- FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning (arXiv, 2024-03-10)
  Federated instruction tuning (FedIT) is a promising solution that consolidates collaborative training across multiple data owners.
  FedIT encounters limitations such as the scarcity of instructional data and the risk of exposure to training data extraction attacks.
  We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners (arXiv, 2023-10-03)
  We introduce Contextual Privacy Protection Language Models (PrivacyMind).
  Our work offers a theoretical analysis for model design and benchmarks various techniques.
  In particular, instruction tuning with both positive and negative examples stands out as a promising method.
- Robust Representation Learning for Privacy-Preserving Machine Learning: A Multi-Objective Autoencoder Approach (arXiv, 2023-09-08)
  We propose a robust representation learning framework for privacy-preserving machine learning (ppML).
  Our method centers on training autoencoders in a multi-objective manner and then concatenating the latent and learned features from the encoding part as the encoded form of our data.
  With our proposed framework, we can share our data and use third-party tools without the threat of revealing its original form.
- Differentially-Private Data Synthetisation for Efficient Re-Identification Risk Control (arXiv, 2022-12-01)
  ε-PrivateSMOTE is a technique for safeguarding against re-identification and linkage attacks.
  Our proposal combines synthetic data generation via noise-induced interpolation with differential privacy principles to obfuscate high-risk cases (an illustrative sketch of this idea appears after this list).
- Just Fine-tune Twice: Selective Differential Privacy for Large Language Models (arXiv, 2022-04-15)
  We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models.
  Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
- Do Gradient Inversion Attacks Make Federated Learning Unsafe? (arXiv, 2022-02-14)
  Federated learning (FL) allows the collaborative training of AI models without needing to share raw data.
  Recent works on the inversion of deep neural networks from model gradients raised concerns about the security of FL in preventing the leakage of training data.
  In this work, we show that the attacks presented in the literature are impractical in real FL use-cases and provide a new baseline attack.
- Semantics-Preserved Distortion for Personal Privacy Protection in Information Management (arXiv, 2022-01-04)
  This paper suggests a linguistically-grounded approach to distort texts while maintaining semantic integrity.
  We present two distinct frameworks for semantic-preserving distortion: a generative approach and a substitutive approach.
  We also explore privacy protection in a specific medical information management scenario, showing that our method effectively limits sensitive data memorization.
- ADePT: Auto-encoder based Differentially Private Text Transformation (arXiv, 2021-01-29)
  We provide a utility-preserving differentially private text transformation algorithm using auto-encoders.
  Our algorithm transforms text to offer robustness against attacks and produces transformations with high semantic quality.
  Our results show that the proposed model performs better against MIA attacks while offering little to no degradation in the utility of the underlying transformation process.
- FLFE: A Communication-Efficient and Privacy-Preserving Federated Feature Engineering Framework (arXiv, 2020-09-05)
  We present a framework called FLFE to conduct privacy-preserving and communication-efficient multi-party feature transformations.
  The framework pre-learns the pattern of a feature in order to judge directly the usefulness of a transformation on that feature.
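As flagged in the ε-PrivateSMOTE entry above, the following is a small, self-contained sketch of the kind of mechanism that entry describes: synthesize a record by interpolating between a high-risk record and one of its nearest neighbours, then perturb it with Laplace noise scaled to a privacy budget ε. This is a toy illustration only, not the authors' implementation; the neighbour selection, per-feature sensitivity estimate, and parameter names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_private_record(data, idx, epsilon=1.0, k=3):
    """Toy SMOTE-style synthesis with Laplace noise (illustrative only).

    data: (n, d) array of numeric records; idx: index of a high-risk record.
    """
    x = data[idx]
    # Pick one of the k nearest neighbours (excluding the record itself).
    dists = np.linalg.norm(data - x, axis=1)
    neighbours = np.argsort(dists)[1:k + 1]
    nb = data[rng.choice(neighbours)]
    # SMOTE-style interpolation between the record and its neighbour.
    lam = rng.uniform(0.0, 1.0)
    synthetic = x + lam * (nb - x)
    # Assumed per-feature sensitivity: the observed value range.
    # With the Laplace mechanism, the noise scale is sensitivity / epsilon.
    sensitivity = data.max(axis=0) - data.min(axis=0)
    noise = rng.laplace(0.0, sensitivity / epsilon)
    return synthetic + noise

if __name__ == "__main__":
    data = rng.normal(size=(100, 4))   # stand-in numeric dataset
    private_row = synthesize_private_record(data, idx=7, epsilon=0.5)
    print(private_row)
```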
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.