Information Leakage in Data Linkage
- URL: http://arxiv.org/abs/2505.08596v1
- Date: Tue, 13 May 2025 14:09:47 GMT
- Title: Information Leakage in Data Linkage
- Authors: Peter Christen, Rainer Schnell, Anushka Vidanage
- Abstract summary: We show that PPRL protocols can still result in the unintentional leakage of sensitive information. We provide recommendations to help data custodians and other parties involved in a data linkage project to identify and prevent vulnerabilities.
- Score: 2.4824510650418308
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The process of linking databases that contain sensitive information about individuals across organisations is an increasingly common requirement in the health and social science research domains, as well as with governments and businesses. To protect personal data, protocols have been developed to limit the leakage of sensitive information. Furthermore, privacy-preserving record linkage (PPRL) techniques have been proposed to conduct linkage on encoded data. While PPRL techniques are now being employed in real-world applications, the focus of PPRL research has been on the technical aspects of linking sensitive data (such as encoding methods and cryptanalysis attacks), but not on organisational challenges when employing such techniques in practice. We analyse what sensitive information can possibly leak, either unintentionally or intentionally, in traditional data linkage as well as PPRL protocols, and what a party that participates in such a protocol can learn from the data it obtains legitimately within the protocol. We also show that PPRL protocols can still result in the unintentional leakage of sensitive information. We provide recommendations to help data custodians and other parties involved in a data linkage project to identify and prevent vulnerabilities and make their project more secure.
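A common PPRL encoding maps quasi-identifiers (such as names) into Bloom filters so that parties can compare encoded values without exchanging plaintext. Below is a minimal illustrative sketch of this idea; the hash construction, parameter values, and names are assumptions for demonstration, and a real deployment would additionally use a secret key shared only by the linking parties.

```python
import hashlib

def bigrams(s):
    """Character 2-grams of a lowercased string."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def bloom_encode(value, num_bits=64, num_hashes=2):
    """Map a string's bigrams into a set of Bloom-filter bit positions."""
    bits = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            digest = hashlib.sha256(f"{k}:{gram}".encode()).hexdigest()
            bits.add(int(digest, 16) % num_bits)
    return bits

def dice_similarity(a, b):
    """Dice coefficient between two bit-position sets."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

# Parties encode names locally and exchange only bit patterns;
# near-duplicate names produce highly overlapping filters.
print(dice_similarity(bloom_encode("christine"), bloom_encode("christina")))  # high
print(dice_similarity(bloom_encode("christine"), bloom_encode("rainer")))     # low
```

Because similar strings share bigrams, their filters overlap, which enables approximate matching on encoded data; that same overlap structure is what cryptanalysis attacks on Bloom-filter PPRL exploit.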
Related papers
- Labeled Delegated PSI and its Applications in the Public Sector [16.0115432790131]
(Multi-party) delegated private set intersection (D-PSI) is a privacy-enhancing technology to link data across multiple data providers using a data collector. This paper is the result of a collaboration with a governmental organization responsible for collecting, linking, and pseudonymizing data. Based on their requirements, we design a new D-PSI protocol with composable output functions, including encrypted payload and pseudonymized identifiers.
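As rough intuition for delegated intersection (not the D-PSI protocol of this paper, which uses proper cryptographic machinery), one can sketch the naive keyed-hash variant: providers pseudonymize identifiers under a key they share among themselves but withhold from the collector, who then intersects the pseudonyms. All names and the key here are hypothetical.

```python
import hashlib
import hmac

def pseudonymize(identifiers, key):
    """A provider replaces raw identifiers with keyed hashes (HMAC-SHA256)."""
    return {hmac.new(key, i.encode(), hashlib.sha256).hexdigest()
            for i in identifiers}

def collector_intersect(*hashed_sets):
    """The collector intersects pseudonymized sets without seeing raw IDs."""
    return set.intersection(*hashed_sets)

key = b"secret-shared-by-providers"  # hypothetical; never sent to the collector
provider_a = pseudonymize({"alice", "bob", "carol"}, key)
provider_b = pseudonymize({"bob", "dave", "carol"}, key)
common = collector_intersect(provider_a, provider_b)
print(len(common))  # 2: the pseudonyms of "bob" and "carol"
```

This naive scheme leaks set sizes and is brute-forceable when identifiers come from a small, guessable domain, which is exactly why protocols with stronger cryptographic guarantees such as D-PSI are needed.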
arXiv Detail & Related papers (2025-12-09T12:59:00Z)
- How to DP-fy Your Data: A Practical Guide to Generating Synthetic Data With Differential Privacy [52.00934156883483]
Differential Privacy (DP) is a framework for reasoning about and limiting information leakage. Differentially private synthetic data refers to synthetic data that preserves the overall trends of the source data.
arXiv Detail & Related papers (2025-12-02T21:14:39Z)
- A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage [77.83757117924995]
We propose a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release. Our approach shows that seemingly innocuous auxiliary information can be used to infer sensitive attributes like age or substance use history from sanitized data.
arXiv Detail & Related papers (2025-04-28T01:16:27Z)
- Poisoning Attacks to Local Differential Privacy Protocols for Trajectory Data [14.934626547047763]
Trajectory data, which tracks movements through geographic locations, is crucial for improving real-world applications. Local differential privacy (LDP) offers a solution by allowing individuals to locally perturb their trajectory data before sharing it. Despite its privacy benefits, LDP protocols are vulnerable to data poisoning attacks, where attackers inject fake data to manipulate aggregated results.
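For intuition, local perturbation of a single trajectory point can be sketched with k-ary (generalized) randomized response over a small public grid; the grid, epsilon, and trajectory below are illustrative, not the protocol attacked in the paper.

```python
import math
import random

def grr_perturb(true_cell, domain, epsilon, rng=random):
    """k-ary randomized response: report the true cell with probability
    e^eps / (e^eps + k - 1), otherwise a uniformly random other cell."""
    k = len(domain)
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_true:
        return true_cell
    return rng.choice([c for c in domain if c != true_cell])

# A trajectory is reported point by point over a public 4x4 grid;
# each reported cell satisfies epsilon-LDP on its own.
grid = [(x, y) for x in range(4) for y in range(4)]
trajectory = [(0, 0), (0, 1), (1, 1), (2, 1)]
noisy = [grr_perturb(cell, grid, epsilon=2.0) for cell in trajectory]
print(noisy)
```

Note that reporting a full trajectory point by point consumes epsilon per point under composition, and nothing in the mechanism authenticates reports: an attacker can simply submit arbitrary cells, which is the opening that poisoning attacks exploit.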
arXiv Detail & Related papers (2025-03-06T02:31:45Z)
- Data Poisoning Attacks to Locally Differentially Private Range Query Protocols [15.664794320925562]
Local Differential Privacy (LDP) has been widely adopted to protect user privacy in decentralized data collection. Recent studies have revealed that LDP protocols are vulnerable to data poisoning attacks. We present the first study on data poisoning attacks targeting LDP range query protocols.
arXiv Detail & Related papers (2025-03-05T12:40:34Z)
- On the Robustness of LDP Protocols for Numerical Attributes under Data Poisoning Attacks [17.351593328097977]
Local differential privacy (LDP) protocols are vulnerable to data poisoning attacks. This vulnerability raises concerns regarding the robustness and reliability of LDP in hostile environments.
arXiv Detail & Related papers (2024-03-28T15:43:38Z)
- VPAS: Publicly Verifiable and Privacy-Preserving Aggregate Statistics on Distributed Datasets [4.181095166452762]
We explore the challenge of input validation and public verifiability within privacy-preserving aggregation protocols.
We propose the "VPAS" protocol, which satisfies these requirements.
Our findings indicate that the overhead associated with verifiability in our protocol is 10x lower than that incurred by simply using conventional zkSNARKs.
arXiv Detail & Related papers (2024-03-22T13:50:22Z)
- A Survey and Comparative Analysis of Security Properties of CAN Authentication Protocols [92.81385447582882]
The Controller Area Network (CAN) bus provides no built-in security, leaving in-vehicle communications inherently insecure.
This paper reviews and compares the 15 most prominent authentication protocols for the CAN bus.
We evaluate protocols based on essential operational criteria that contribute to ease of implementation.
arXiv Detail & Related papers (2024-01-19T14:52:04Z)
- Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond [57.10914865054868]
We consider vertical logistic regression (VLR) trained with mini-batch gradient descent.
We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks.
arXiv Detail & Related papers (2022-07-19T05:47:30Z)
- Distributed Machine Learning and the Semblance of Trust [66.1227776348216]
Federated Learning (FL) allows data owners to maintain data governance and perform model training locally without having to share their data.
FL and related techniques are often described as privacy-preserving.
We explain why this term is not appropriate and outline the risks associated with over-reliance on protocols that were not designed with formal definitions of privacy in mind.
arXiv Detail & Related papers (2021-12-21T08:44:05Z)
- Reinforcement Learning on Encrypted Data [58.39270571778521]
We present a preliminary, experimental study of how a DQN agent trained on encrypted states performs in environments with discrete and continuous state spaces.
Our results highlight that the agent is still capable of learning in small state spaces even in the presence of non-deterministic encryption, but performance collapses in more complex environments.
arXiv Detail & Related papers (2021-09-16T21:59:37Z)
- Privacy and Data Balkanization: Circumventing the Barriers [0.0]
Privacy concerns and laws are leading to significant overhead in arranging for sharing or combining different data sets.
For new applications, where the benefit of combined data is not yet clear, this overhead can inhibit organizations from even trying to determine whether they can mutually benefit from sharing their data.
We discuss techniques to overcome this difficulty by employing private information transfer to determine whether there is a benefit from sharing data, and whether there is room to negotiate acceptable prices.
arXiv Detail & Related papers (2020-10-07T22:05:28Z)
- BeeTrace: A Unified Platform for Secure Contact Tracing that Breaks Data Silos [73.84437456144994]
Contact tracing is an important method to control the spread of an infectious disease such as COVID-19.
Current solutions do not utilize the huge volume of data stored in business databases and individual digital devices.
We propose BeeTrace, a unified platform that breaks data silos and deploys state-of-the-art cryptographic protocols to guarantee privacy goals.
arXiv Detail & Related papers (2020-07-05T10:33:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.