Leveraging Public Data for Practical Private Query Release
- URL: http://arxiv.org/abs/2102.08598v1
- Date: Wed, 17 Feb 2021 06:19:34 GMT
- Title: Leveraging Public Data for Practical Private Query Release
- Authors: Terrance Liu, Giuseppe Vietri, Thomas Steinke, Jonathan Ullman, Zhiwei Steven Wu
- Abstract summary: We present PMW^Pub, which, unlike existing baselines, leverages public data drawn from a related distribution as prior information.
We provide a theoretical analysis and an empirical evaluation on the American Community Survey (ACS) and ADULT datasets.
PMW^Pub scales well to high-dimensional data domains, where running many existing methods would be computationally infeasible.
- Score: 24.615338449313676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many statistical problems, incorporating priors can significantly improve
performance. However, the use of prior knowledge in differentially private
query release has remained underexplored, despite such priors commonly being
available in the form of public datasets, such as previous US Census releases.
With the goal of releasing statistics about a private dataset, we present
PMW^Pub, which -- unlike existing baselines -- leverages public data drawn from
a related distribution as prior information. We provide a theoretical analysis
and an empirical evaluation on the American Community Survey (ACS) and ADULT
datasets, which shows that our method outperforms state-of-the-art methods.
Furthermore, PMW^Pub scales well to high-dimensional data domains, where
running many existing methods would be computationally infeasible.
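The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea it describes: a multiplicative-weights query-release loop (in the style of MWEM) that starts from a public dataset's distribution instead of a uniform prior and only reweights records seen in the public data. It is not the paper's implementation. The function name `mw_with_public_prior`, its parameters, the even privacy-budget split, and the toy numbers are all assumptions made for illustration.

```python
import math
import random


def answer(query, weights):
    """Evaluate a linear query against a distribution over the support."""
    return sum(q * w for q, w in zip(query, weights))


def laplace(scale, rng):
    """Laplace(0, scale) noise, sampled as a difference of two exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def mw_with_public_prior(public_hist, queries, private_answers, n, epsilon,
                         rounds, seed=0):
    """MWEM-style query release over the support of a public dataset.

    public_hist     -- normalized frequencies of the distinct public records
    queries         -- queries[j][i] in [0, 1]: value of query j on support point i
    private_answers -- exact answers of each query on the private data (fractions)
    n               -- number of private records (each answer has sensitivity 1/n)
    """
    rng = random.Random(seed)
    weights = list(public_hist)  # prior: start from the public distribution
    # Assumed budget split: half for selection, half for measurement, per round.
    eps_select = eps_measure = epsilon / (2 * rounds)

    for _ in range(rounds):
        # Exponential mechanism: favor queries the current estimate answers badly.
        errors = [abs(a - answer(q, weights))
                  for q, a in zip(queries, private_answers)]
        top = max(errors)  # subtract the max score to avoid overflow in exp
        probs = [math.exp(eps_select * n * (e - top) / 2) for e in errors]
        j = rng.choices(range(len(queries)), weights=probs)[0]

        # Laplace mechanism: noisy measurement of the selected query.
        noisy = private_answers[j] + laplace(1.0 / (n * eps_measure), rng)

        # Multiplicative weights update toward the noisy measurement.
        gap = noisy - answer(queries[j], weights)
        weights = [w * math.exp(queries[j][i] * gap / 2)
                   for i, w in enumerate(weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights


# Toy usage with made-up numbers: two support points, one counting query.
public_hist = [0.5, 0.5]
queries = [[1.0, 0.0]]
private_answers = [0.9]
released = mw_with_public_prior(public_hist, queries, private_answers,
                                n=100_000, epsilon=10.0, rounds=10)
print(answer(queries[0], released))
```

Because the weights live only on the public support, the cost per round depends on the number of distinct public records rather than on the size of the full data domain, which is one plausible reading of the scalability claim in the abstract. The flip side in this sketch is that private records absent from the public support can never receive weight.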
Related papers
- An applied Perspective: Estimating the Differential Identifiability Risk of an Exemplary SOEP Data Set [2.66269503676104]
We show how to compute the risk metric efficiently for a set of basic statistical queries.
Our empirical analysis based on an extensive, real-world scientific data set expands the knowledge on how to compute risks under realistic conditions.
arXiv Detail & Related papers (2024-07-04T17:50:55Z)
- A Comprehensive Survey on Data Augmentation [55.355273602421384]
Data augmentation is a technique that generates high-quality artificial data by manipulating existing data samples.
Existing literature surveys only focus on a certain type of specific modality data.
We propose a more enlightening taxonomy that encompasses data augmentation techniques for different common data modalities.
arXiv Detail & Related papers (2024-05-15T11:58:08Z)
- Synthetic Census Data Generation via Multidimensional Multiset Sum [7.900694093691988]
We provide tools to generate synthetic microdata solely from published Census statistics.
We show that our methods work well in practice, and we offer theoretical arguments to explain our performance.
arXiv Detail & Related papers (2024-04-15T19:06:37Z)
- Query of CC: Unearthing Large Scale Domain-Specific Knowledge from Public Corpora [104.16648246740543]
We propose an efficient data collection method based on large language models.
The method bootstraps seed information through a large language model and retrieves related data from public corpora.
It not only collects knowledge-related data for specific domains but unearths the data with potential reasoning procedures.
arXiv Detail & Related papers (2024-01-26T03:38:23Z)
- The Impact of Differential Feature Under-reporting on Algorithmic Fairness [86.275300739926]
We present an analytically tractable model of differential feature under-reporting.
We then use it to characterize the impact of this kind of data bias on algorithmic fairness.
Our results show that, in real world data settings, under-reporting typically leads to increasing disparities.
arXiv Detail & Related papers (2024-01-16T19:16:22Z)
- Optimal Locally Private Nonparametric Classification with Public Data [2.631955426232593]
We investigate the problem of public data assisted non-interactive Local Differentially Private (LDP) learning with a focus on non-parametric classification.
Under the posterior drift assumption, we derive the mini-max optimal convergence rate with LDP constraint.
We present a novel approach, the locally differentially private classification tree, which attains the mini-max optimal convergence rate.
arXiv Detail & Related papers (2023-11-19T16:35:01Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining [75.25943383604266]
We question whether the use of large Web-scraped datasets should be viewed as differential-privacy-preserving.
We caution that publicizing these models pretrained on Web data as "private" could lead to harm and erode the public's trust in differential privacy as a meaningful definition of privacy.
We conclude by discussing potential paths forward for the field of private learning, as public pretraining becomes more popular and powerful.
arXiv Detail & Related papers (2022-12-13T10:41:12Z)
- On PAC Learning Halfspaces in Non-interactive Local Privacy Model with Public Unlabeled Data [18.820311737806456]
We study the problem of PAC learning halfspaces in the non-interactive local differential privacy model (NLDP).
We show that it is possible to achieve sample complexities that are only linear in the dimension and in other terms for both private and public data.
arXiv Detail & Related papers (2022-09-17T12:19:20Z)
- Post-processing of Differentially Private Data: A Fairness Perspective [53.29035917495491]
This paper shows that post-processing causes disparate impacts on individuals or groups.
It analyzes two critical settings: the release of differentially private datasets and the use of such private datasets for downstream decisions.
It proposes a novel post-processing mechanism that is (approximately) optimal under different fairness metrics.
arXiv Detail & Related papers (2022-01-24T02:45:03Z)
- Differentially Private Normalizing Flows for Privacy-Preserving Density Estimation [10.561489862855334]
We propose the use of normalizing flow models that provide explicit differential privacy guarantees.
We show how our algorithm can be applied to the task of differentially private anomaly detection.
arXiv Detail & Related papers (2021-03-25T18:39:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.