Data Analytics with Differential Privacy
- URL: http://arxiv.org/abs/2311.16104v1
- Date: Thu, 20 Jul 2023 17:43:29 GMT
- Title: Data Analytics with Differential Privacy
- Authors: Vassilis Digalakis Jr
- Abstract summary: We develop differentially private algorithms to analyze distributed and streaming data.
In the distributed model, we consider the particular problem of learning -- in a distributed fashion -- a global model of the data.
We offer one of the strongest privacy guarantees for the streaming model, user-level pan-privacy.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differential privacy is the state-of-the-art definition for privacy,
guaranteeing that any analysis performed on a sensitive dataset leaks no
information about the individuals whose data are contained therein. In this
thesis, we develop differentially private algorithms to analyze distributed and
streaming data. In the distributed model, we consider the particular problem of
learning -- in a distributed fashion -- a global model of the data, that can
subsequently be used for arbitrary analyses. We build upon PrivBayes, a
differentially private method that approximates the high-dimensional
distribution of a centralized dataset as a product of low-order distributions,
utilizing a Bayesian Network model. We examine three novel approaches to
learning a global Bayesian Network from distributed data, while offering the
differential privacy guarantee to all local datasets. Our work includes a
detailed theoretical analysis of the distributed, differentially private
entropy estimator which we use in one of our algorithms, as well as a detailed
experimental evaluation, using both synthetic and real-world data. In the
streaming model, we focus on the problem of estimating the density of a stream
of users, which expresses the fraction of all users that actually appear in the
stream. We offer one of the strongest privacy guarantees for the streaming
model, user-level pan-privacy, which ensures that the privacy of any user is
protected, even against an adversary that observes the internal state of the
algorithm. We provide a detailed analysis of an existing, sampling-based
algorithm for the problem and propose two novel modifications that
significantly improve it, both theoretically and experimentally, by optimally
using all the allocated "privacy budget."
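The streaming result builds on the sampling-based, pan-private density estimator of Dwork et al.: the algorithm keeps only one noisy bit per sampled user, so even its internal state reveals little about any individual. A minimal sketch follows; the bias `eps/4`, the output-noise scale, and all names are illustrative assumptions (valid for `eps <= 2`), not the thesis's exact algorithm.

```python
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling from Laplace(0, scale).
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def pan_private_density(stream, sampled_users, eps):
    """Estimate the fraction of sampled users that appear in `stream`,
    keeping the internal state differentially private (pan-privacy)."""
    # Internal state: one bit per sampled user, initialised uniformly at
    # random, so the state alone is independent of who has appeared so far.
    bits = {u: random.random() < 0.5 for u in sampled_users}
    p = 0.5 + eps / 4.0  # biased coin used when a sampled user appears
    for u in stream:
        if u in bits:
            bits[u] = random.random() < p  # redraw; overwrites any history
    # Debias: E[mean_bit] = 0.5 + density * eps/4, then add output noise.
    m = len(bits)
    mean_bit = sum(bits.values()) / m
    return (mean_bit - 0.5) / (eps / 4.0) + laplace_noise(2.0 / (eps * m))
```

Because an appearing user's bit is simply redrawn from a slightly biased coin, observing the state mid-stream looks nearly identical whether or not that user appeared, which is the essence of the pan-privacy guarantee.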
Related papers
- Differentially Private Synthetic Data with Private Density Estimation [2.209921757303168]
We adopt the framework of differential privacy, and explore mechanisms for generating an entire dataset.
We build upon the work of Boedihardjo et al., which laid the foundations for a new optimization-based algorithm for generating private synthetic data.
arXiv Detail & Related papers (2024-05-06T14:06:12Z) - Initialization Matters: Privacy-Utility Analysis of Overparameterized Neural Networks [72.51255282371805]
We prove a privacy bound for the KL divergence between model distributions on worst-case neighboring datasets.
We find that this KL privacy bound is largely determined by the expected squared gradient norm relative to model parameters during training.
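The role of the expected squared gradient norm mirrors how differentially private training bounds each example's influence: per-example gradients are clipped to a fixed norm and Gaussian noise calibrated to that norm is added. A minimal sketch of one such noisy step (function names and the noise calibration are illustrative, not the paper's analysis):

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm, noise_mult, lr, rng):
    # Clip each per-example gradient so its L2 norm is at most clip_norm;
    # this bounds the sensitivity of the averaged gradient.
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    n = len(clipped)
    # Gaussian noise scaled to the clipping bound, then average and descend.
    noise = rng.normal(0.0, noise_mult * clip_norm, size=params.shape)
    noisy_mean = (sum(clipped) + noise) / n
    return params - lr * noisy_mean
```

The clipping bound is exactly why the squared gradient norm governs the privacy cost: smaller gradients mean less clipping bias for the same noise level.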
arXiv Detail & Related papers (2023-10-31T16:13:22Z) - Conditional Density Estimations from Privacy-Protected Data [0.0]
We propose simulation-based inference methods from privacy-protected datasets.
We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models.
arXiv Detail & Related papers (2023-10-19T14:34:17Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
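Sanitized releases like these rest on noise-calibrated mechanisms. As a minimal background illustration (not this paper's method), the classic Laplace mechanism for a count query, which has sensitivity 1:

```python
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling from Laplace(0, scale).
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(records, predicate, eps):
    # A count changes by at most 1 when one record is added or removed
    # (sensitivity 1), so Laplace noise with scale 1/eps gives
    # eps-differential privacy.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / eps)
```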
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models struggle to produce synthetic samples of high utility.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z) - Differentially private partitioned variational inference [28.96767727430277]
Learning a privacy-preserving model from sensitive data which are distributed across multiple devices is an increasingly important problem.
We present differentially private partitioned variational inference, the first general framework for learning a variational approximation to a Bayesian posterior distribution.
arXiv Detail & Related papers (2022-09-23T13:58:40Z) - Private Domain Adaptation from a Public Source [48.83724068578305]
We design differentially private discrepancy-based algorithms for adaptation from a source domain with public labeled data to a target domain with unlabeled private data.
Our solutions are based on private variants of Frank-Wolfe and Mirror-Descent algorithms.
arXiv Detail & Related papers (2022-08-12T06:52:55Z) - Smooth Anonymity for Sparse Graphs [69.1048938123063]
Differential privacy has emerged as the gold standard of privacy; however, it is difficult to apply when it comes to sharing sparse datasets.
In this work, we consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity.
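The smooth variant is specific to that paper; as background, plain $k$-anonymity requires every combination of quasi-identifier values to be shared by at least $k$ records, which a short helper can check (field names here are illustrative):

```python
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    # Count how many records share each combination of
    # quasi-identifier values; all groups must have size >= k.
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(count >= k for count in groups.values())
```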
arXiv Detail & Related papers (2022-07-13T17:09:25Z) - Personalization Improves Privacy-Accuracy Tradeoffs in Federated Optimization [57.98426940386627]
We show that coordinating local learning with private centralized learning yields a generically useful and improved tradeoff between accuracy and privacy.
We illustrate our theoretical results with experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2022-02-10T20:44:44Z) - Differentially Private Normalizing Flows for Privacy-Preserving Density Estimation [10.561489862855334]
We propose the use of normalizing flow models that provide explicit differential privacy guarantees.
We show how our algorithm can be applied to the task of differentially private anomaly detection.
arXiv Detail & Related papers (2021-03-25T18:39:51Z) - Secure and Differentially Private Bayesian Learning on Distributed Data [17.098036331529784]
We present a distributed Bayesian learning approach via Preconditioned Langevin Dynamics with RMSprop, which combines differential privacy and homomorphic encryption to protect private information.
We applied the proposed secure and privacy-preserving distributed Bayesian learning approach to logistic regression and survival analysis on distributed data, and demonstrated its feasibility in terms of prediction accuracy and time complexity, compared to the centralized approach.
arXiv Detail & Related papers (2020-05-22T05:13:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.