Semantics-Preserved Distortion for Personal Privacy Protection in Information Management
- URL: http://arxiv.org/abs/2201.00965v3
- Date: Mon, 8 Jul 2024 21:20:21 GMT
- Title: Semantics-Preserved Distortion for Personal Privacy Protection in Information Management
- Authors: Jiajia Li, Lu Yang, Letian Peng, Shitou Zhang, Ping Wang, Zuchao Li, Hai Zhao
- Abstract summary: This paper suggests a linguistically-grounded approach to distort texts while maintaining semantic integrity.
We present two distinct frameworks for semantic-preserving distortion: a generative approach and a substitutive approach.
We also explore privacy protection in a specific medical information management scenario, showing our method effectively limits sensitive data memorization.
- Score: 65.08939490413037
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, machine learning, particularly deep learning, has significantly impacted the field of information management. While several strategies have been proposed to restrict models from learning and memorizing sensitive information from raw texts, this paper suggests a more linguistically grounded approach: distorting texts while maintaining their semantic integrity. To this end, we leverage Neighboring Distribution Divergence, a novel metric that assesses how well semantic meaning is preserved during distortion. Building on this metric, we present two distinct frameworks for semantics-preserving distortion: a generative approach and a substitutive approach. Our evaluations across various tasks, including named entity recognition, constituency parsing, and machine reading comprehension, affirm the plausibility and efficacy of our distortion technique for personal privacy protection. We also test our method against attribute attacks in three privacy-focused NLP tasks, and the findings underscore the simplicity and efficacy of our data-based improvement approach over structural improvement approaches. Moreover, we explore privacy protection in a specific medical information management scenario, showing that our method effectively limits sensitive data memorization and highlighting its practicality.
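The abstract does not spell out how Neighboring Distribution Divergence is computed. The sketch below is a minimal, hypothetical reading of the substitutive framework: candidate replacements for a sensitive span are scored by how much the masked-LM distributions at neighboring positions shift after the edit, and the lowest-divergence candidate is kept. The model name, the use of KL divergence, and the helper functions are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (assumptions, not the paper's implementation): score a
# substitution for a sensitive span by the divergence between masked-LM
# distributions at neighboring positions before vs. after the edit.
import torch
import torch.nn.functional as F
from transformers import AutoModelForMaskedLM, AutoTokenizer

# "bert-base-uncased" is an illustrative choice; the paper may use another LM.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def neighbor_distributions(words, neighbor_positions):
    """Mask each neighboring word in turn and collect the MLM distribution there."""
    dists = []
    for pos in neighbor_positions:
        masked = list(words)
        masked[pos] = tokenizer.mask_token
        enc = tokenizer(" ".join(masked), return_tensors="pt")
        mask_idx = (enc.input_ids[0] == tokenizer.mask_token_id).nonzero()[0, 0]
        with torch.no_grad():
            logits = model(**enc).logits[0, mask_idx]
        dists.append(F.softmax(logits, dim=-1))
    return torch.stack(dists)

def ndd_score(original, distorted, neighbor_positions):
    """KL divergence of neighbor distributions (lower = semantics better preserved)."""
    p = neighbor_distributions(original, neighbor_positions)   # original sentence
    q = neighbor_distributions(distorted, neighbor_positions)  # distorted sentence
    return F.kl_div(q.log(), p, reduction="batchmean").item()  # KL(p || q)

# Hypothetical example: replace a patient name and keep the candidate that
# shifts the neighboring distributions the least.
original = ["alice", "was", "admitted", "for", "pneumonia"]
candidates = ["she", "someone", "bob"]
neighbors = [1, 2, 3, 4]  # word positions around the substituted span
scores = {c: ndd_score(original, [c] + original[1:], neighbors) for c in candidates}
print(min(scores, key=scores.get), scores)
```

In practice one would first detect sensitive spans (e.g. with NER) and apply this scoring to each span; the generative framework described in the abstract would instead produce distorted text directly, which this sketch does not cover.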
Related papers
- Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding [118.75567341513897]
Existing methods typically analyze target text in isolation or solely with non-member contexts.
We propose Con-ReCall, a novel approach that leverages the asymmetric distributional shifts induced by member and non-member contexts.
arXiv Detail & Related papers (2024-09-05T09:10:38Z) - Analyzing Inference Privacy Risks Through Gradients in Machine Learning [17.2657358645072]
We present a unified game-based framework that encompasses a broad range of attacks including attribute, property, distributional, and user disclosures.
Our results demonstrate the inefficacy of solely relying on data aggregation to achieve privacy against inference attacks in distributed learning.
arXiv Detail & Related papers (2024-08-29T21:21:53Z) - MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation from an Information-theoretic Perspective [10.009178591853058]
We propose a formal information-theoretic definition for this utility-preserving privacy protection problem.
We design a data-driven learnable data transformation framework that is capable of suppressing sensitive attributes from target datasets.
Results demonstrate the effectiveness and generalizability of our method under various configurations.
arXiv Detail & Related papers (2024-05-23T18:35:46Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - Theoretically Principled Federated Learning for Balancing Privacy and Utility [61.03993520243198]
We propose a general learning framework for protection mechanisms that protect privacy by distorting model parameters.
It can achieve a personalized utility-privacy trade-off for each model parameter, on each client, at each communication round in federated learning.
arXiv Detail & Related papers (2023-05-24T13:44:02Z) - In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves the adversarial robustness of pre-trained models like BERT against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
arXiv Detail & Related papers (2022-12-20T14:06:50Z) - The Limits of Word Level Differential Privacy [30.34805746574316]
We propose a new method for text anonymization based on transformer-based language models fine-tuned for paraphrasing.
We evaluate the performance of our method via thorough experimentation and demonstrate superior performance over the discussed mechanisms.
arXiv Detail & Related papers (2022-05-02T21:53:10Z) - Privacy-Preserving Federated Learning on Partitioned Attributes [6.661716208346423]
Federated learning empowers collaborative training without exposing local data or models.
We introduce an adversarial-learning-based procedure that tunes a local model to release privacy-preserving intermediate representations.
To alleviate the accuracy decline, we propose a defense method based on the forward-backward splitting algorithm.
arXiv Detail & Related papers (2021-04-29T14:49:14Z) - Differentially Private and Fair Deep Learning: A Lagrangian Dual Approach [54.32266555843765]
This paper studies a model that protects the privacy of individuals' sensitive information while also learning non-discriminatory predictors.
The method relies on the notion of differential privacy and the use of Lagrangian duality to design neural networks that can accommodate fairness constraints.
arXiv Detail & Related papers (2020-09-26T10:50:33Z)