Multi-Layer Privacy-Preserving Record Linkage with Clerical Review based on gradual information disclosure
- URL: http://arxiv.org/abs/2412.04178v1
- Date: Thu, 05 Dec 2024 14:18:50 GMT
- Title: Multi-Layer Privacy-Preserving Record Linkage with Clerical Review based on gradual information disclosure
- Authors: Florens Rohde, Victor Christen, Martin Franke, Erhard Rahm
- Abstract summary: Privacy-Preserving Record Linkage (PPRL) is an essential component in data integration tasks of sensitive information. We present a novel privacy-preserving protocol that integrates clerical review in PPRL using a multi-layer active learning process. The experimental evaluation on real-world datasets shows considerable linkage quality improvements with limited labeling effort and privacy risks.
- Score: 1.2453705483335629
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Privacy-Preserving Record Linkage (PPRL) is an essential component in data integration tasks of sensitive information. The linkage quality determines the usability of combined datasets and (machine learning) applications based on them. We present a novel privacy-preserving protocol that integrates clerical review in PPRL using a multi-layer active learning process. Uncertain match candidates are reviewed on several layers by human and non-human oracles to reduce the amount of disclosed information per record and in total. Predictions are propagated back to update previous layers, resulting in an improved linkage performance for non-reviewed candidates as well. The data owners remain in control of the amount of information they share for each record. Therefore, our approach follows need-to-know and data sovereignty principles. The experimental evaluation on real-world datasets shows considerable linkage quality improvements with limited labeling effort and privacy risks.
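To make the layered-review idea concrete, here is a minimal sketch, not the authors' protocol: the thresholds, oracle definitions, and field names below are illustrative assumptions. Uncertain candidates escalate through review layers that each disclose more information, stopping as early as possible.

```python
# Minimal sketch of gradual disclosure, not the paper's implementation.
# Confident pairs are auto-labeled; uncertain pairs escalate through review
# layers that each disclose more information per record.

def classify(similarity, lower=0.4, upper=0.8):
    """Auto-label confident pairs; return None for the uncertain band."""
    if similarity >= upper:
        return "match"
    if similarity <= lower:
        return "non-match"
    return None  # uncertain: escalate to the next review layer

# Hypothetical review layers, ordered by how much they disclose:
def masked_oracle(pair):
    # Layer 1: the reviewer sees only masked/encoded attributes.
    return "match" if pair["sim"] > 0.6 else None

def cleartext_oracle(pair):
    # Layer 2: cleartext attributes, disclosed only as a last resort.
    return "match" if pair["true_match"] else "non-match"

LAYERS = [masked_oracle, cleartext_oracle]

def review(pair):
    label = classify(pair["sim"])
    layers_used = 0
    while label is None and layers_used < len(LAYERS):
        label = LAYERS[layers_used](pair)  # disclose one more layer
        layers_used += 1
    return label, layers_used

print(review({"sim": 0.65, "true_match": True}))  # ('match', 1): no cleartext needed
```

In the paper's protocol, labels from reviewed pairs also feed back into the active learning loop to improve predictions for non-reviewed candidates; that feedback step is omitted here for brevity.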
Related papers
- Human-Centered Interactive Anonymization for Privacy-Preserving Machine Learning: A Case for Human-Guided k-Anonymity [0.0]
We propose an interactive approach that incorporates human input into the k-anonymization process. Using the UCI Adult dataset, we compare classification outcomes of interactive human-influenced anonymization with traditional, fully automated methods. Our results show that human input can enhance data utility in some cases, although results vary across tasks and settings.
arXiv Detail & Related papers (2025-07-05T17:20:18Z)
- Evaluating Differential Privacy on Correlated Datasets Using Pointwise Maximal Leakage [38.4830633082184]
Data-driven advancements pose substantial risks to privacy.
Differential privacy has become a cornerstone in privacy preservation efforts.
Our work aims to foster a deeper understanding of subtle privacy risks and highlight the need for the development of more effective privacy-preserving mechanisms.
arXiv Detail & Related papers (2025-02-08T10:30:45Z)
- Accelerating Privacy-Preserving Medical Record Linkage: A Three-Party MPC Approach [1.7999333451993955]
This paper presents a novel and efficient PPRL protocol based on a secure 3-party computation framework.
We demonstrate that our method preserves the linkage quality of the state-of-the-art PPRL method while achieving up to 14 times faster performance.
arXiv Detail & Related papers (2024-10-28T23:13:01Z)
- Towards Split Learning-based Privacy-Preserving Record Linkage [49.1574468325115]
Split Learning has been introduced to facilitate applications where user data privacy is a requirement.
In this paper, we investigate the potential of Split Learning for Privacy-Preserving Record Matching.
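As a rough illustration of what such a split looks like (the weights, dimensions, and sigmoid head below are assumptions, not this paper's architecture), the data owner runs the first network segment locally and shares only cut-layer activations, never raw records:

```python
# Hedged sketch of a split-learning forward pass for record matching.
import numpy as np

rng = np.random.default_rng(0)
W_client = rng.normal(size=(12, 6))   # owner-side segment; weights stay local
W_server = rng.normal(size=(6, 1))    # server-side segment

def client_forward(record):
    # Only these cut-layer activations ever leave the data owner.
    return np.tanh(record @ W_client)

def server_forward(activations):
    # The server finishes the forward pass and scores the candidate pair.
    return 1 / (1 + np.exp(-(activations @ W_server)))

pair_features = rng.normal(size=(1, 12))  # stand-in for an encoded record pair
smashed = client_forward(pair_features)
print(server_forward(smashed).item())     # match score, computed server-side
```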
arXiv Detail & Related papers (2024-09-02T09:17:05Z)
- PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning [49.916365792036636]
Federated learning (FL) has recently gained significant momentum due to its potential to leverage large-scale distributed user data.
The transmitted model updates can potentially leak sensitive user information, and the lack of central control over the local training process leaves the global model susceptible to malicious manipulation of model updates.
We develop PriRoAgg, a general framework that utilizes Lagrange coded computing and distributed zero-knowledge proofs to execute a wide range of robust aggregation algorithms while satisfying aggregated privacy.
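The Lagrange-coded-computing building block can be illustrated with plain Shamir secret sharing; this is a sketch of the primitive, not the PriRoAgg protocol, and the field modulus and update values are assumptions. The servers learn only the aggregate, never an individual update:

```python
# Shamir shares of quantized updates; the sum is recovered via Lagrange
# interpolation at x=0, so no single user's update is ever reconstructed.
import random

P = 2_147_483_647  # prime field modulus (illustrative choice)

def share(secret, n=3, t=1):
    """Degree-t Shamir shares of `secret` for servers x = 1..n."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(points):
    """Lagrange interpolation at x=0 over the prime field."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

updates = [17, 4, 22]                      # quantized per-user updates
shares_per_server = list(zip(*[share(u) for u in updates]))
# Each server sums the shares it holds; shares of a sum share the sum.
agg_shares = [(srv[0][0], sum(y for _, y in srv) % P) for srv in shares_per_server]
print(reconstruct(agg_shares[:2]))         # 43 == 17 + 4 + 22
```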
arXiv Detail & Related papers (2024-07-12T03:18:08Z)
- RASE: Efficient Privacy-preserving Data Aggregation against Disclosure Attacks for IoTs [2.1765174838950494]
We study a new paradigm for collecting and protecting the data produced by ever-increasing numbers of sensor devices.
Most previous studies on co-design of data aggregation and privacy preservation assume that a trusted fusion center adheres to privacy regimes.
We propose a novel paradigm (called RASE), which can be generalized into a 3-step sequential procedure: noise addition, followed by random permutation, and then parameter estimation.
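A toy version of that 3-step pipeline follows; the Laplace noise and the mean estimator are illustrative choices, not necessarily those of RASE:

```python
# Sketch of the 3-step procedure: each sensor perturbs its reading, the batch
# is randomly permuted to break the value-to-device link, and the analyst
# recovers only an aggregate estimate.
import random

def rase_pipeline(readings, scale=1.0, seed=None):
    rng = random.Random(seed)
    # Step 1: local noise addition (difference of exponentials == Laplace noise)
    noisy = [x + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
             for x in readings]
    # Step 2: random permutation severs the link between value and sender
    rng.shuffle(noisy)
    # Step 3: parameter estimation on the shuffled, noisy batch
    return sum(noisy) / len(noisy)   # mean estimate; noise cancels in expectation

readings = [21.3, 22.1, 20.8, 23.4, 21.9]
print(round(rase_pipeline(readings, scale=0.5, seed=42), 2))
```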
arXiv Detail & Related papers (2024-05-31T15:21:38Z)
- F-OAL: Forward-only Online Analytic Learning with Fast Training and Low Memory Footprint in Class Incremental Learning [28.772554281694166]
Online Class Incremental Learning (OCIL) aims to train models incrementally, where data arrive in mini-batches, and previous data are not accessible.
A major challenge in OCIL is Catastrophic Forgetting, i.e., the loss of previously learned knowledge.
We propose an exemplar-free approach, Forward-only Online Analytic Learning (F-OAL).
Unlike traditional methods, F-OAL does not rely on back-propagation and is forward-only, significantly reducing memory usage and computational time.
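A hedged sketch of forward-only analytic learning in this spirit uses a recursive least-squares update on frozen features; this shows the general style of closed-form, backprop-free updates, not necessarily the paper's exact formulation:

```python
# Analytic (closed-form) classifier updated per mini-batch without backprop.
import numpy as np

class AnalyticClassifier:
    def __init__(self, feat_dim, n_classes, ridge=1.0):
        self.W = np.zeros((feat_dim, n_classes))
        self.R = np.eye(feat_dim) / ridge   # inverse regularized Gram matrix

    def update(self, X, Y):
        """Recursive least-squares update for one mini-batch.
        X: (batch, feat_dim) frozen-encoder features; Y: one-hot labels."""
        # Woodbury identity keeps R = (X_all^T X_all + ridge*I)^-1 incremental.
        K = self.R @ X.T
        inv = np.linalg.inv(np.eye(len(X)) + X @ K)
        self.R -= K @ inv @ K.T
        self.W += self.R @ X.T @ (Y - X @ self.W)

    def predict(self, X):
        return (X @ self.W).argmax(axis=1)

rng = np.random.default_rng(0)
clf = AnalyticClassifier(feat_dim=8, n_classes=3)
for _ in range(5):                       # data arrives in mini-batches (OCIL)
    X = rng.normal(size=(16, 8))
    Y = np.eye(3)[rng.integers(0, 3, 16)]
    clf.update(X, Y)                     # no back-propagation anywhere
print(clf.predict(rng.normal(size=(4, 8))))
```

Because each batch is absorbed into a closed-form solution, previous data never need to be replayed, which is what makes this style attractive against catastrophic forgetting.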
arXiv Detail & Related papers (2024-03-23T07:39:13Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
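A minimal example of DP data publishing in this sense releases only a Laplace-noised histogram of the sensitive data; the epsilon, binning, and values below are assumptions for illustration:

```python
# Laplace mechanism for a histogram release: each count has sensitivity 1,
# so adding Laplace(1/epsilon) noise per bin satisfies epsilon-DP.
import random

def dp_histogram(values, bins, epsilon=1.0, seed=None):
    rng = random.Random(seed)
    counts = [sum(1 for v in values if lo <= v < hi) for lo, hi in bins]
    # Difference of two exponentials yields Laplace(0, b) noise.
    lap = lambda b: rng.expovariate(1 / b) - rng.expovariate(1 / b)
    return [c + lap(1 / epsilon) for c in counts]

ages = [23, 31, 35, 44, 52, 38, 29, 61]
bins = [(20, 40), (40, 60), (60, 80)]
print([round(c, 1) for c in dp_histogram(ages, bins, epsilon=0.5, seed=7)])
```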
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed Privacy Loss-Input Susceptibility (PLIS), which allows one to apportion a subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z)
- Privacy-preserving Deep Learning based Record Linkage [14.755422488889824]
We propose the first deep learning-based multi-party privacy-preserving record linkage protocol.
In our approach, each database owner first trains a local deep learning model, which is then uploaded to a secure environment.
The global model is then used by a linkage unit to distinguish unlabelled record pairs as matches and non-matches.
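A simplified sketch of that linkage-unit step follows; the encoder is replaced by stand-in embeddings and the threshold is an assumption, so this illustrates only the classification of pairs from uploaded representations:

```python
# The linkage unit labels cross-database pairs from embeddings alone,
# without ever seeing raw records.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def link(embeddings_a, embeddings_b, threshold=0.9):
    """Classify all cross-database pairs as matches from embeddings."""
    return [(i, j)
            for i, ea in enumerate(embeddings_a)
            for j, eb in enumerate(embeddings_b)
            if cosine(ea, eb) >= threshold]

rng = np.random.default_rng(1)
base = rng.normal(size=(3, 16))                     # three underlying entities
db_a = base + 0.01 * rng.normal(size=base.shape)    # stand-in for model outputs
db_b = base + 0.01 * rng.normal(size=base.shape)
print(link(db_a, db_b))   # expected: the diagonal pairs (0,0), (1,1), (2,2)
```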
arXiv Detail & Related papers (2022-11-03T22:10:12Z)
- Retrieval Enhanced Data Augmentation for Question Answering on Privacy Policies [74.01792675564218]
We develop a data augmentation framework based on ensembling retriever models that captures relevant text segments from unlabeled policy documents.
To improve the diversity and quality of the augmented data, we leverage multiple pre-trained language models (LMs) and cascade them with noise reduction filter models.
Using our augmented data on the PrivacyQA benchmark, we elevate the existing baseline by a large margin (10% F1) and achieve a new state-of-the-art F1 score of 50%.
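A toy version of the retrieval-plus-filter step is sketched below; the bag-of-words scorer, corpus, and threshold are assumptions, whereas the paper ensembles trained retrievers and cascades pre-trained LMs with noise-reduction filters:

```python
# Retrieve policy segments similar to a labeled question as augmentation
# candidates, then drop weak retrievals (the noise-reduction filter).
from collections import Counter
import math

def bow(text):
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve_augmentations(question, unlabeled_segments, k=2, min_score=0.2):
    scored = sorted(((cosine(bow(question), bow(s)), s)
                     for s in unlabeled_segments), reverse=True)
    return [s for score, s in scored[:k] if score >= min_score]

segments = [
    "We share your location data with advertising partners.",
    "Cookies are used to remember your preferences.",
    "Your data may be shared with third parties for advertising.",
]
print(retrieve_augmentations("Do you share my data with advertisers?", segments))
```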
arXiv Detail & Related papers (2022-04-19T15:45:23Z)
- ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning [52.831894583501395]
Continual learning assumes the incoming data are fully labeled, which might not be applicable in real applications.
We propose deep Online Replay with Discriminator Consistency (ORDisCo) to interdependently learn a classifier with a conditional generative adversarial network (GAN).
We show ORDisCo achieves significant performance improvement on various semi-supervised learning benchmark datasets for SSCL.
arXiv Detail & Related papers (2021-01-02T09:04:14Z)
- A Critical Overview of Privacy-Preserving Approaches for Collaborative Forecasting [0.0]
Cooperation between different data owners may lead to an improvement in forecast quality.
Due to competitive business factors and personal data protection concerns, these data owners might be unwilling to share their data.
This paper analyses the state-of-the-art and unveils several shortcomings of existing methods in guaranteeing data privacy.
arXiv Detail & Related papers (2020-04-20T20:21:04Z)
- Privacy-Aware Time-Series Data Sharing with Deep Reinforcement Learning [33.42328078385098]
We study the privacy-utility trade-off (PUT) in time-series data sharing.
Methods that preserve privacy at the current time step may still leak a significant amount of information at the trace level.
We consider sharing the distorted version of a user's true data sequence with an untrusted third party.
arXiv Detail & Related papers (2020-03-04T18:47:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.