Related papers: Privacy-preserving Deep Learning based Record Linkage

Privacy-preserving Deep Learning based Record Linkage

URL: http://arxiv.org/abs/2211.02161v1
Date: Thu, 3 Nov 2022 22:10:12 GMT
Title: Privacy-preserving Deep Learning based Record Linkage
Authors: Thilina Ranbaduge, Dinusha Vatsalan, Ming Ding
Abstract summary: We propose the first deep learning-based multi-party privacy-preserving record linkage protocol. In our approach, each database owner first trains a local deep learning model, which is then uploaded to a secure environment. The global model is then used by a linkage unit to distinguish unlabelled record pairs as matches and non-matches.
Score: 14.755422488889824
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep learning-based linkage of records across different databases is becoming increasingly useful in data integration and mining applications to discover new insights from multiple sources of data. However, due to privacy and confidentiality concerns, organisations often are not willing or allowed to share their sensitive data with any external parties, thus making it challenging to build/train deep learning models for record linkage across different organizations' databases. To overcome this limitation, we propose the first deep learning-based multi-party privacy-preserving record linkage (PPRL) protocol that can be used to link sensitive databases held by multiple different organisations. In our approach, each database owner first trains a local deep learning model, which is then uploaded to a secure environment and securely aggregated to create a global model. The global model is then used by a linkage unit to distinguish unlabelled record pairs as matches and non-matches. We utilise differential privacy to achieve provable privacy protection against re-identification attacks. We evaluate the linkage quality and scalability of our approach using several large real-world databases, showing that it can achieve high linkage quality while providing sufficient privacy protection against existing attacks.

Related papers

How to DP-fy Your Data: A Practical Guide to Generating Synthetic Data With Differential Privacy [52.00934156883483]
Differential Privacy (DP) is a framework for reasoning about and limiting information leakage.<n>Differentially Private Synthetic data refers to synthetic data that preserves the overall trends of source data.
arXiv Detail & Related papers (2025-12-02T21:14:39Z)
Synth-MIA: A Testbed for Auditing Privacy Leakage in Tabular Data Synthesis [8.4361320391543]
Tabular Generative Models are often argued to preserve privacy by creating synthetic datasets that resemble training data.<n>Membership Inference Attacks (MIAs) have recently emerged as a method for evaluating privacy leakage in synthetic data.<n>We propose a unified, model-agnostic threat framework that deploys a collection of attacks to estimate the maximum empirical privacy leakage in synthetic datasets.
arXiv Detail & Related papers (2025-09-22T16:53:38Z)
Differentially Private Relational Learning with Entity-level Privacy Guarantees [17.567309430451616]
This work presents a principled framework for relational learning with formal entity-level DP guarantees.<n>We introduce an adaptive gradient clipping scheme that modulates clipping thresholds based on entity occurrence frequency.<n>These contributions lead to a tailored DP-SGD variant for relational data with provable privacy guarantees.
arXiv Detail & Related papers (2025-06-10T02:03:43Z)
Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation [60.81109086640437]
We propose a novel framework called Federated Retrieval-Augmented Generation (FedE4RAG)<n>FedE4RAG facilitates collaborative training of client-side RAG retrieval models.<n>We apply homomorphic encryption within federated learning to safeguard model parameters.
arXiv Detail & Related papers (2025-04-27T04:26:02Z)
Secure Visual Data Processing via Federated Learning [2.4374097382908477]
This paper addresses the need for privacy-preserving solutions in large-scale visual data processing. We propose a new approach that combines object detection, federated learning and anonymization. Our solution is evaluated against traditional centralized models, showing that while there is a slight trade-off in accuracy, the privacy benefits are substantial.
arXiv Detail & Related papers (2025-02-09T09:44:18Z)
Investigating Privacy Attacks in the Gray-Box Setting to Enhance Collaborative Learning Schemes [7.651569149118461]
We study privacy attacks in the gray-box setting, where the attacker has only limited access to the model. We deploy SmartNNCrypt, a framework that tailors homomorphic encryption to protect the portions of the model posing higher privacy risks.
arXiv Detail & Related papers (2024-09-25T18:49:21Z)
Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy. Existing techniques face the emerging challenges of re-identification attack ability of Large Language Models. This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z)
Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit. We study user-level DP motivated by applications where it necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
Federated Transfer Learning with Differential Privacy [21.50525027559563]
We formulate the notion of textitfederated differential privacy, which offers privacy guarantees for each data set without assuming a trusted central server. We show that federated differential privacy is an intermediate privacy model between the well-established local and central models of differential privacy.
arXiv Detail & Related papers (2024-03-17T21:04:48Z)
FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution, by consolidating collaborative training across multiple data owners. FedIT encounters limitations such as scarcity of instructional data and risk of exposure to training data extraction attacks. We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z)
PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind) Our work offers a theoretical analysis for model design and benchmarks various techniques. In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibited data access and data sharing. Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data. Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
SPEED: Secure, PrivatE, and Efficient Deep learning [2.283665431721732]
We introduce a deep learning framework able to deal with strong privacy constraints. Based on collaborative learning, differential privacy and homomorphic encryption, the proposed approach advances state-of-the-art.
arXiv Detail & Related papers (2020-06-16T19:31:52Z)
Decentralised Learning from Independent Multi-Domain Labels for Person Re-Identification [69.29602103582782]
Deep learning has been successful for many computer vision tasks due to the availability of shared and centralised large-scale training data. However, increasing awareness of privacy concerns poses new challenges to deep learning, especially for person re-identification (Re-ID) We propose a novel paradigm called Federated Person Re-Identification (FedReID) to construct a generalisable global model (a central server) by simultaneously learning with multiple privacy-preserved local models (local clients) This client-server collaborative learning process is iteratively performed under privacy control, enabling FedReID to realise decentralised learning without sharing distributed data nor collecting any
arXiv Detail & Related papers (2020-06-07T13:32:33Z)
Secure Sum Outperforms Homomorphic Encryption in (Current) Collaborative Deep Learning [7.690774882108066]
We discuss methods for training neural networks on the joint data of different data owners, that keep each party's input confidential. We show that a less complex and computationally less expensive secure sum protocol exhibits superior properties in terms of both collusion-resistance and runtime.
arXiv Detail & Related papers (2020-06-02T23:03:32Z)
Federating Recommendations Using Differentially Private Prototypes [16.29544153550663]
We propose a new federated approach to learning global and local private models for recommendation without collecting raw data. By requiring only two rounds of communication, we both reduce the communication costs and avoid the excessive privacy loss. We show local adaptation of the global model allows our method to outperform centralized matrix-factorization-based recommender system models.
arXiv Detail & Related papers (2020-03-01T22:21:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.