A Comprehensive Guide to Differential Privacy: From Theory to User Expectations
- URL: http://arxiv.org/abs/2509.03294v2
- Date: Thu, 11 Sep 2025 13:12:37 GMT
- Title: A Comprehensive Guide to Differential Privacy: From Theory to User Expectations
- Authors: Napsu Karmitsa, Antti Airola, Tapio Pahikkala, Tinja Pitkämäki,
- Abstract summary: Differential privacy (DP) has emerged as a principled, mathematically grounded framework for mitigating privacy risks. This review provides a comprehensive survey of DP, covering its theoretical foundations, practical mechanisms, and real-world applications.
- Score: 0.769971486557519
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The increasing availability of personal data has enabled significant advances in fields such as machine learning, healthcare, and cybersecurity. However, this data abundance also raises serious privacy concerns, especially in light of powerful re-identification attacks and growing legal and ethical demands for responsible data use. Differential privacy (DP) has emerged as a principled, mathematically grounded framework for mitigating these risks. This review provides a comprehensive survey of DP, covering its theoretical foundations, practical mechanisms, and real-world applications. It explores key algorithmic tools and domain-specific challenges - particularly in privacy-preserving machine learning and synthetic data generation. The report also highlights usability issues and the need for improved communication and transparency in DP systems. Overall, the goal is to support informed adoption of DP by researchers and practitioners navigating the evolving landscape of data privacy.
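The "practical mechanisms" mentioned in the abstract center on calibrated noise addition. As a minimal, illustrative sketch (function names and parameters are my own, not taken from the paper), the Laplace mechanism achieves ε-DP for a numeric query of bounded sensitivity:

```python
import random

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release true_value plus Laplace(sensitivity / epsilon) noise.

    This satisfies epsilon-DP for a query whose output changes by at
    most `sensitivity` when one record is added to or removed from the
    dataset.
    """
    scale = sensitivity / epsilon
    # The difference of two i.i.d. Exponential(1/scale) draws is
    # Laplace-distributed with the required scale.
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return true_value + noise

# Example: privatize a count query (sensitivity 1) at epsilon = 1.0.
private_count = laplace_mechanism(42.0, sensitivity=1.0, epsilon=1.0)
```

Smaller ε means a larger noise scale and stronger privacy; the noisy count is unbiased, so repeated releases average toward the true value (at a cumulative privacy cost).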
Related papers
- How to DP-fy Your Data: A Practical Guide to Generating Synthetic Data With Differential Privacy [52.00934156883483]
Differential Privacy (DP) is a framework for reasoning about and limiting information leakage. Differentially private synthetic data refers to synthetic data that preserves the overall trends of source data.
arXiv Detail & Related papers (2025-12-02T21:14:39Z)
- Differential Privacy in Machine Learning: From Symbolic AI to LLMs [49.1574468325115]
Differential privacy provides a formal framework to mitigate privacy risks. It ensures that the inclusion or exclusion of any single data point does not significantly alter the output of an algorithm.
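The "inclusion or exclusion" property summarized above is the standard ε-DP guarantee. In the usual notation (assumed here, not quoted from the paper), a randomized mechanism $\mathcal{M}$ is ε-differentially private if, for all neighboring datasets $D, D'$ differing in a single record and all measurable output sets $S$,

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S].
```

Smaller ε bounds the two output distributions closer together, so an observer learns less about any individual record.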
arXiv Detail & Related papers (2025-06-13T11:30:35Z) - Optimal Allocation of Privacy Budget on Hierarchical Data Release [48.96399034594329]
This paper addresses the problem of optimal privacy budget allocation for hierarchical data release.<n>It aims to maximize data utility subject to a total privacy budget while considering the inherent trade-offs between data granularity and privacy loss.
arXiv Detail & Related papers (2025-05-16T05:25:11Z) - Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation [60.81109086640437]
We propose a novel framework called Federated Retrieval-Augmented Generation (FedE4RAG)<n>FedE4RAG facilitates collaborative training of client-side RAG retrieval models.<n>We apply homomorphic encryption within federated learning to safeguard model parameters.
arXiv Detail & Related papers (2025-04-27T04:26:02Z)
- Privacy-Preserving Customer Support: A Framework for Secure and Scalable Interactions [0.0]
This paper introduces the Privacy-Preserving Zero-Shot Learning (PP-ZSL) framework, a novel approach leveraging large language models (LLMs) in a zero-shot learning mode. Unlike conventional machine learning methods, PP-ZSL eliminates the need for local training on sensitive data by utilizing pre-trained LLMs to generate responses directly. The framework incorporates real-time data anonymization to redact or mask sensitive information, retrieval-augmented generation (RAG) for domain-specific query resolution, and robust post-processing to ensure compliance with regulatory standards.
arXiv Detail & Related papers (2024-12-10T17:20:47Z)
- Data Collaboration Analysis with Orthonormal Basis Selection and Alignment [2.928964540437144]
Data Collaboration (DC) enables multiple parties to jointly train a model without exposing their private datasets. Existing theory asserts that any target basis spanning the same subspace as the secret bases should suffice. We introduce Orthonormal Data Collaboration (ODC), a novel DC framework that explicitly enforces orthonormality constraints on both the secret and target bases.
arXiv Detail & Related papers (2024-03-05T08:52:16Z)
- State-of-the-Art Approaches to Enhancing Privacy Preservation of Machine Learning Datasets: A Survey [0.9208007322096533]
This paper examines the evolving landscape of machine learning (ML) and its profound impact across various sectors. It focuses on the emerging field of Privacy-preserving Machine Learning (PPML). As ML applications become increasingly integral to industries like telecommunications, financial technology, and surveillance, they raise significant privacy concerns.
arXiv Detail & Related papers (2024-02-25T17:31:06Z)
- Local Privacy-preserving Mechanisms and Applications in Machine Learning [0.21268495173320798]
Local Differential Privacy (LDP) provides strong privacy protection for individual users during the stages of data collection and processing.
One of the major applications of privacy-preserving mechanisms is machine learning.
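In the local model referenced above, each user perturbs their own data before it is ever collected. A minimal sketch (the function names and the debiasing helper are illustrative, not from the paper) is classic randomized response for a single bit:

```python
import math
import random

def randomized_response(true_bit: bool, epsilon: float) -> bool:
    """Classic randomized response: an epsilon-LDP mechanism for one bit.

    Each user reports the truth with probability e^eps / (e^eps + 1)
    and the flipped bit otherwise, so the curator never sees raw data.
    """
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return true_bit if random.random() < p_truth else not true_bit

def estimate_fraction(reports, epsilon: float) -> float:
    """Unbiased estimate of the true fraction of 1-bits from noisy reports."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)
```

Because truthful reports occur with known probability p = e^ε/(e^ε + 1), the aggregator can invert the observed fraction of 1-bits into an unbiased population estimate without ever learning any individual's true bit.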
arXiv Detail & Related papers (2024-01-08T22:29:00Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns is subject to stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- Advancing Differential Privacy: Where We Are Now and Future Directions for Real-World Deployment [100.1798289103163]
We present a detailed review of current practices and state-of-the-art methodologies in the field of differential privacy (DP). Key points and high-level contents of the article originated from the discussions of "Differential Privacy (DP): Challenges Towards the Next Frontier".
This article aims to provide a reference point for the algorithmic and design decisions within the realm of privacy, highlighting important challenges and potential research directions.
arXiv Detail & Related papers (2023-04-14T05:29:18Z)
- Distributed Machine Learning and the Semblance of Trust [66.1227776348216]
Federated Learning (FL) allows the data owner to maintain data governance and perform model training locally without having to share their data.
FL and related techniques are often described as privacy-preserving.
We explain why this term is not appropriate and outline the risks associated with over-reliance on protocols that were not designed with formal definitions of privacy in mind.
arXiv Detail & Related papers (2021-12-21T08:44:05Z)
- Evaluating Privacy-Preserving Machine Learning in Critical Infrastructures: A Case Study on Time-Series Classification [5.607917328636864]
It is pivotal to ensure that neither the model nor the data can be used to extract sensitive information.
Various safety-critical use cases (mostly relying on time-series data) are currently underrepresented in privacy-related considerations.
By evaluating several privacy-preserving methods for their applicability to time-series data, we demonstrated the inefficacy of encryption for deep learning.
arXiv Detail & Related papers (2021-11-29T12:28:22Z)
- Federated Extra-Trees with Privacy Preserving [20.564530457026976]
We propose a novel privacy-preserving machine learning model named Federated Extra-Trees.
We developed a secure multi-institutional machine learning system that provides superior performance.
arXiv Detail & Related papers (2020-02-18T01:15:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.