Differentially Private Simple Linear Regression
- URL: http://arxiv.org/abs/2007.05157v1
- Date: Fri, 10 Jul 2020 04:28:43 GMT
- Title: Differentially Private Simple Linear Regression
- Authors: Daniel Alabi, Audra McMillan, Jayshree Sarathy, Adam Smith and Salil Vadhan
- Abstract summary: We study algorithms for simple linear regression that satisfy differential privacy.
We consider the design of differentially private algorithms for simple linear regression for small datasets.
We study the performance of a spectrum of algorithms we adapt to the setting.
- Score: 2.614403183902121
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Economics and social science research often requires analyzing datasets of
sensitive personal information at fine granularity, with models fit to small
subsets of the data. Unfortunately, such fine-grained analysis can easily
reveal sensitive individual information. We study algorithms for simple linear
regression that satisfy differential privacy, a constraint which guarantees
that an algorithm's output reveals little about any individual input data
record, even to an attacker with arbitrary side information about the dataset.
We consider the design of differentially private algorithms for simple linear
regression for small datasets, with tens to hundreds of datapoints, which is a
particularly challenging regime for differential privacy. Focusing on a
particular application to small-area analysis in economics research, we study
the performance of a spectrum of algorithms we adapt to the setting. We
identify key factors that affect their performance, showing through a range of
experiments that algorithms based on robust estimators (in particular, the
Theil-Sen estimator) perform well on the smallest datasets, but that other more
standard algorithms do better as the dataset size increases.
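The abstract's informal description of differential privacy matches the standard (pure) ε-differential privacy definition below. This is the textbook formulation, not a quotation from the paper. The abstract alone does not say whether the paper uses this pure variant or the relaxed (ε, δ) variant, which adds δ to the right-hand side. M is a randomized algorithm, and D, D′ are datasets that differ in one record.

```latex
A randomized algorithm $M$ is $\varepsilon$-differentially private if, for every
pair of datasets $D, D'$ that differ in a single record and every set $S$ of
possible outputs,
\[
  \Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S].
\]
```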
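The abstract reports that robust estimators, in particular Theil-Sen, perform best on the smallest datasets. The Python sketch below shows one way to privatize Theil-Sen. Its design choices are assumptions for illustration and are not taken from the paper:

- Records are paired by a single random matching, so each record affects at most one pairwise slope.
- The median of the slopes, and then the median of the intercept residuals, is released with the exponential mechanism over hypothetical fixed bounds.
- The budget ε is split evenly between slope and intercept.

The names `dp_median` and `dp_theil_sen` and the default bounds are hypothetical.

```python
import math
import random


def dp_median(values, lo, hi, epsilon, rng):
    """Continuous exponential-mechanism median on [lo, hi] (sketch, needs lo < hi).

    Utility of a point r is -|#{v < r} - n/2|. Changing one value moves it by at
    most 1, so sampling with density proportional to exp(epsilon * utility / 2)
    is epsilon-DP.
    """
    xs = sorted(min(max(v, lo), hi) for v in values)  # clip into the bounds
    n = len(xs)
    edges = [lo] + xs + [hi]
    log_weights = []
    for i in range(n + 1):
        # Points strictly inside [edges[i], edges[i + 1]] have i values below them.
        width = edges[i + 1] - edges[i]
        if width <= 0:
            log_weights.append(-math.inf)  # zero-width interval, never chosen
        else:
            log_weights.append(math.log(width) - epsilon * abs(2 * i - n) / 4)
    top = max(log_weights)
    weights = [math.exp(w - top) for w in log_weights]  # numerically stabilised
    i = rng.choices(range(n + 1), weights=weights)[0]
    return rng.uniform(edges[i], edges[i + 1])


def dp_theil_sen(x, y, epsilon, slope_bounds=(-50.0, 50.0),
                 intercept_bounds=(-50.0, 50.0), seed=None):
    """Hypothetical DP Theil-Sen sketch; returns (slope, intercept)."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    rng.shuffle(idx)  # data-independent random matching of the records
    slopes = []
    for a, b in zip(idx[0::2], idx[1::2]):
        if x[a] != x[b]:  # vertical pairs have no finite slope; skip them
            slopes.append((y[b] - y[a]) / (x[b] - x[a]))
    # Each record is in at most one pair, so it changes, adds, or drops at most
    # one slope; either way the median utility moves by at most 1.
    s_lo, s_hi = slope_bounds
    slope = dp_median(slopes, s_lo, s_hi, epsilon / 2, rng)
    # The slope is already private (post-processing); each record then
    # contributes exactly one residual to the intercept median.
    residuals = [yi - slope * xi for xi, yi in zip(x, y)]
    b_lo, b_hi = intercept_bounds
    intercept = dp_median(residuals, b_lo, b_hi, epsilon / 2, rng)
    return slope, intercept


# Usage on synthetic data; epsilon = 50 is far too large for real use and only
# makes the demo output easy to compare with the true line y = 2x + 1.
data_rng = random.Random(0)
xs = [i / 99 for i in range(100)]
ys = [2 * xi + 1 + data_rng.gauss(0, 0.01) for xi in xs]
print(dp_theil_sen(xs, ys, epsilon=50.0, seed=1))
```

Pairing records through one matching trades statistical efficiency (only n/2 slopes) for a simple sensitivity argument. Using all pairs would put each record in n − 1 slopes and require much more noise.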
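The abstract contrasts the robust methods with "more standard algorithms" that do better as the dataset grows. One common standard approach is to add Laplace noise to the sufficient statistics of ordinary least squares. The Python sketch below illustrates that idea. Several details are assumptions for illustration and may not match the paper's exact algorithm:

- the data bounds [0, 1];
- the sensitivity constant 1 − 1/n;
- the three-way budget split;
- the name `noisy_stats`.

```python
import random


def laplace(rng, scale):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_stats(x, y, epsilon, seed=None):
    """Sufficient-statistics perturbation sketch for simple linear regression.

    Assumes every x_i and y_i lies in [0, 1] and that neighbouring datasets
    differ by replacing one (x_i, y_i) pair. The sensitivity 1 - 1/n used for
    n*cov(x, y) and n*var(x) is an assumption of this sketch. The budget is
    split into thirds: slope numerator, slope denominator, intercept.
    Returns (slope, intercept), or None when the noisy variance is not positive.
    """
    rng = random.Random(seed)
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ncov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    nvar = sum((xi - x_bar) ** 2 for xi in x)
    delta = 1 - 1 / n
    noisy_var = nvar + laplace(rng, 3 * delta / epsilon)
    if noisy_var <= 0:
        return None  # fail rather than divide by a non-positive denominator
    slope = (ncov + laplace(rng, 3 * delta / epsilon)) / noisy_var
    # Changing one record moves x_bar and y_bar by at most 1/n each, so with the
    # (already private) slope fixed, y_bar - slope * x_bar moves by at most
    # (1 + |slope|) / n.
    intercept_delta = (1 + abs(slope)) / n
    intercept = y_bar - slope * x_bar + laplace(rng, 3 * intercept_delta / epsilon)
    return slope, intercept


# Usage: the true line is y = 0.5x + 0.2, which stays inside [0, 1].
xs = [i / 99 for i in range(100)]
ys = [0.5 * xi + 0.2 for xi in xs]
print(noisy_stats(xs, ys, epsilon=1.0, seed=0))
```

Noise here has a fixed scale while n·var(x) and n·cov(x, y) grow with n. That is consistent with the abstract's observation that such standard methods improve as the dataset gets larger.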
Related papers
- Feature Selection from Differentially Private Correlations [35.187113265093615]
High-dimensional regression can leak information about individual datapoints in a dataset.
We employ a correlations-based order statistic to choose important features from a dataset and privatize them.
We find that our method significantly outperforms the established baseline for private feature selection on many datasets.
Date: 2024-08-20T13:54:07Z
- Scaling Laws for the Value of Individual Data Points in Machine Learning [55.596413470429475]
We introduce a new perspective by investigating scaling behavior for the value of individual data points.
We provide learning theory to support our scaling law, and we observe empirically that it holds across diverse model classes.
Our work represents a first step towards understanding and utilizing scaling properties for the value of individual data points.
Date: 2024-05-30T20:10:24Z
- Differentially Private Synthetic Data with Private Density Estimation [2.209921757303168]
We adopt the framework of differential privacy, and explore mechanisms for generating an entire dataset.
We build upon the work of Boedihardjo et al., which laid the foundations for a new optimization-based algorithm for generating private synthetic data.
Date: 2024-05-06T14:06:12Z
- Privacy-Optimized Randomized Response for Sharing Multi-Attribute Data [1.1510009152620668]
We propose a privacy-optimized randomized response that guarantees the strongest privacy in sharing multi-attribute data.
We also present an efficient algorithm for constructing a near-optimal attribute mechanism.
Our methods provide significantly stronger privacy guarantees for the entire dataset than the existing method.
Date: 2024-02-12T11:34:42Z
- Differentially Private Sliced Inverse Regression: Minimax Optimality and Algorithm [16.14032140601778]
We propose optimally differentially private algorithms designed to address privacy concerns in the context of sufficient dimension reduction.
We develop differentially private algorithms that achieve the minimax lower bounds up to logarithmic factors.
As a natural extension, we can readily offer analogous lower and upper bounds for differentially private sparse principal component analysis.
Date: 2024-01-16T06:47:43Z
- On Differential Privacy and Adaptive Data Analysis with Bounded Space [76.10334958368618]
We study the space complexity of the two related fields of differential privacy and adaptive data analysis.
We show that there exists a problem P that requires exponentially more space to be solved efficiently with differential privacy.
The line of work on adaptive data analysis focuses on understanding the number of samples needed for answering a sequence of adaptive queries.
Date: 2023-02-11T14:45:31Z
- Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models struggle to produce synthetic samples of high utility.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
Date: 2022-11-07T10:02:55Z
- Learning-Augmented Private Algorithms for Multiple Quantile Release [27.58033173923427]
We propose to use the learning-augmented algorithms (or algorithms with predictions) framework as a powerful way of designing and analyzing privacy-preserving methods.
We derive error guarantees that scale with a natural measure of prediction quality while (almost) recovering state-of-the-art prediction-independent guarantees.
Date: 2022-10-20T12:59:00Z
- Private Domain Adaptation from a Public Source [48.83724068578305]
We design differentially private discrepancy-based algorithms for adaptation from a source domain with public labeled data to a target domain with unlabeled private data.
Our solutions are based on private variants of Frank-Wolfe and Mirror-Descent algorithms.
Date: 2022-08-12T06:52:55Z
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains, such as medicine, information retrieval, cybersecurity, and social media, datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
Date: 2021-12-15T18:56:39Z
- On Deep Learning with Label Differential Privacy [54.45348348861426]
We study the multi-class classification setting where the labels are considered sensitive and ought to be protected.
We propose a new algorithm for training deep neural networks with label differential privacy, and run evaluations on several datasets.
Date: 2021-02-11T15:09:06Z