Related papers: Differentially Private Linear Regression with Linked Data

Differentially Private Linear Regression with Linked Data

URL: http://arxiv.org/abs/2308.00836v2
Date: Wed, 8 May 2024 01:36:39 GMT
Title: Differentially Private Linear Regression with Linked Data
Authors: Shurong Lin, Elliot Paquette, Eric D. Kolaczyk,
Abstract summary: Differential privacy, a mathematical notion from computer science, is a rising tool offering robust privacy guarantees. Recent work focuses on developing differentially private versions of individual statistical and machine learning tasks. We present two differentially private algorithms for linear regression with linked data.
Score: 3.9325957466009203
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: There has been increasing demand for establishing privacy-preserving methodologies for modern statistics and machine learning. Differential privacy, a mathematical notion from computer science, is a rising tool offering robust privacy guarantees. Recent work focuses primarily on developing differentially private versions of individual statistical and machine learning tasks, with nontrivial upstream pre-processing typically not incorporated. An important example is when record linkage is done prior to downstream modeling. Record linkage refers to the statistical task of linking two or more data sets of the same group of entities without a unique identifier. This probabilistic procedure brings additional uncertainty to the subsequent task. In this paper, we present two differentially private algorithms for linear regression with linked data. In particular, we propose a noisy gradient method and a sufficient statistics perturbation approach for the estimation of regression coefficients. We investigate the privacy-accuracy tradeoff by providing finite-sample error bounds for the estimators, which allows us to understand the relative contributions of linkage error, estimation error, and the cost of privacy. The variances of the estimators are also discussed. We demonstrate the performance of the proposed algorithms through simulations and an application to synthetic data.

Related papers

High-Dimensional Differentially Private Quantile Regression: Distributed Estimation and Statistical Inference [0.26784722398800515]
We propose a differentially private quantile regression method for high-dimensional data in a distributed setting.<n>We develop a differentially private estimation algorithm with iterative updates, ensuring near-optimal statistical accuracy and formal privacy guarantees.
arXiv Detail & Related papers (2025-08-07T09:47:44Z)
Benchmarking Fraud Detectors on Private Graph Data [70.4654745317714]
Currently, many types of fraud are managed in part by automated detection algorithms that operate over graphs.<n>We consider the scenario where a data holder wishes to outsource development of fraud detectors to third parties.<n>Third parties submit their fraud detectors to the data holder, who evaluates these algorithms on a private dataset and then publicly communicates the results.<n>We propose a realistic privacy attack on this system that allows an adversary to de-anonymize individuals' data based only on the evaluation results.
arXiv Detail & Related papers (2025-07-30T03:20:15Z)
Linear-Time User-Level DP-SCO via Robust Statistics [55.350093142673316]
User-level differentially private convex optimization (DP-SCO) has garnered significant attention due to the importance of safeguarding user privacy in machine learning applications. Current methods, such as those based on differentially private gradient descent (DP-SGD), often struggle with high noise accumulation and suboptimal utility. We introduce a novel linear-time algorithm that leverages robust statistics, specifically the median and trimmed mean, to overcome these challenges.
arXiv Detail & Related papers (2025-02-13T02:05:45Z)
Differentially Private Random Feature Model [52.468511541184895]
We produce a differentially private random feature model for privacy-preserving kernel machines. We show that our method preserves privacy and derive a generalization error bound for the method.
arXiv Detail & Related papers (2024-12-06T05:31:08Z)
Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose PseudoProbability Unlearning (PPU), a novel method that enables models to forget data to adhere to privacy-preserving manner. Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z)
Initialization Matters: Privacy-Utility Analysis of Overparameterized Neural Networks [72.51255282371805]
We prove a privacy bound for the KL divergence between model distributions on worst-case neighboring datasets. We find that this KL privacy bound is largely determined by the expected squared gradient norm relative to model parameters during training.
arXiv Detail & Related papers (2023-10-31T16:13:22Z)
Conditional Density Estimations from Privacy-Protected Data [0.0]
We propose simulation-based inference methods from privacy-protected datasets. We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models.
arXiv Detail & Related papers (2023-10-19T14:34:17Z)
Approximate, Adapt, Anonymize (3A): a Framework for Privacy Preserving Training Data Release for Machine Learning [3.29354893777827]
We introduce a data release framework, 3A (Approximate, Adapt, Anonymize), to maximize data utility for machine learning. We present experimental evidence showing minimal discrepancy between performance metrics of models trained on real versus privatized datasets.
arXiv Detail & Related papers (2023-07-04T18:37:11Z)
Generating Private Synthetic Data with Genetic Algorithms [29.756119782419955]
We study the problem of efficiently generating differentially private synthetic data that approximates the statistical properties of an underlying sensitive dataset. We propose Private-GSD, a private genetic algorithm based on zeroth-order optimizations that do not require modifying the original objective. We show that Private-GSD outperforms the state-of-the-art methods on non-differential queries while matching accuracy in approximating differentiable ones.
arXiv Detail & Related papers (2023-06-05T21:19:37Z)
Simulation-based, Finite-sample Inference for Privatized Data [14.218697973204065]
We propose a simulation-based "repro sample" approach to produce statistically valid confidence intervals and hypothesis tests. We show that this methodology is applicable to a wide variety of private inference problems.
arXiv Detail & Related papers (2023-03-09T15:19:31Z)
Privacy Induces Robustness: Information-Computation Gaps and Sparse Mean Estimation [8.9598796481325]
We investigate the consequences of this observation for both algorithms and computational complexity across different statistical problems. We establish an information-computation gap for private sparse mean estimation. We also give evidence for privacy-induced information-computation gaps for several other statistics and learning problems.
arXiv Detail & Related papers (2022-11-01T20:03:41Z)
Sensitivity analysis in differentially private machine learning using hybrid automatic differentiation [54.88777449903538]
We introduce a novel textithybrid automatic differentiation (AD) system for sensitivity analysis. This enables modelling the sensitivity of arbitrary differentiable function compositions, such as the training of neural networks on private data. Our approach can enable the principled reasoning about privacy loss in the setting of data processing.
arXiv Detail & Related papers (2021-07-09T07:19:23Z)
Graph Embedding with Data Uncertainty [113.39838145450007]
spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines. Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z)
Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model. The objective is to endow the trained model with robustness against adversarially manipulated input data. Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users. An adversary may still be able to infer the private training data by attacking the released model. Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.