Privacy-Enhanced Database Synthesis for Benchmark Publishing
- URL: http://arxiv.org/abs/2405.01312v1
- Date: Thu, 2 May 2024 14:20:24 GMT
- Title: Privacy-Enhanced Database Synthesis for Benchmark Publishing
- Authors: Yongrui Zhong, Yunqing Ge, Jianbin Qin, Shuyuan Zheng, Bo Tang, Yu-Xuan Qiu, Rui Mao, Ye Yuan, Makoto Onizuka, Chuan Xiao,
- Abstract summary: Differential privacy has become a key method for safeguarding privacy when sharing data, but the focus has largely been on minimizing errors in aggregate queries or classification tasks.
This paper delves into the creation of privacy-preserving databases specifically for benchmarking, aiming to produce a differentially private database.
PrivBench uses sum-product networks (SPNs) to partition and sample data, enhancing data representation while securing privacy.
- Score: 16.807486872855534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Benchmarking is crucial for evaluating a DBMS, yet existing benchmarks often fail to reflect the varied nature of user workloads. As a result, there is increasing momentum toward creating databases that incorporate real-world user data to more accurately mirror business environments. However, privacy concerns deter users from directly sharing their data, underscoring the importance of creating synthesized databases for benchmarking that also prioritize privacy protection. Differential privacy has become a key method for safeguarding privacy when sharing data, but the focus has largely been on minimizing errors in aggregate queries or classification tasks, with less attention given to benchmarking factors like runtime performance. This paper delves into the creation of privacy-preserving databases specifically for benchmarking, aiming to produce a differentially private database whose query performance closely resembles that of the original data. Introducing PrivBench, an innovative synthesis framework, we support the generation of high-quality data that maintains privacy. PrivBench uses sum-product networks (SPNs) to partition and sample data, enhancing data representation while securing privacy. The framework allows users to adjust the detail of SPN partitions and privacy settings, crucial for customizing privacy levels. We validate our approach, which uses the Laplace and exponential mechanisms, in maintaining privacy. Our tests show that PrivBench effectively generates data that maintains privacy and excels in query performance, consistently reducing errors in query execution time, query cardinality, and KL divergence.
Related papers
- Enhancing Feature-Specific Data Protection via Bayesian Coordinate Differential Privacy [55.357715095623554]
Local Differential Privacy (LDP) offers strong privacy guarantees without requiring users to trust external parties.
We propose a Bayesian framework, Bayesian Coordinate Differential Privacy (BCDP), that enables feature-specific privacy quantification.
arXiv Detail & Related papers (2024-10-24T03:39:55Z) - Privacy-Preserving Data Management using Blockchains [0.0]
Data providers need to control and update existing privacy preferences due to changing data usage.
This paper proposes a blockchain-based methodology for preserving data providers private and sensitive data.
arXiv Detail & Related papers (2024-08-21T01:10:39Z) - Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP motivated by applications where it necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z) - Unified Mechanism-Specific Amplification by Subsampling and Group Privacy Amplification [54.1447806347273]
Amplification by subsampling is one of the main primitives in machine learning with differential privacy.
We propose the first general framework for deriving mechanism-specific guarantees.
We analyze how subsampling affects the privacy of groups of multiple users.
arXiv Detail & Related papers (2024-03-07T19:36:05Z) - PrivLM-Bench: A Multi-level Privacy Evaluation Benchmark for Language Models [42.20437015301152]
We present PrivLM-Bench, a benchmark for evaluating the privacy leakage of language models (LMs)
Instead of only reporting DP parameters, PrivLM-Bench sheds light on the neglected inference data privacy during actual usage.
We conduct extensive experiments on three datasets of GLUE for mainstream LMs.
arXiv Detail & Related papers (2023-11-07T14:55:52Z) - Privacy Preserving Large Language Models: ChatGPT Case Study Based Vision and Framework [6.828884629694705]
This article proposes the conceptual model called PrivChatGPT, a privacy-generative model for LLMs.
PrivChatGPT consists of two main components i.e., preserving user privacy during the data curation/pre-processing together with preserving private context and the private training process for large-scale data.
arXiv Detail & Related papers (2023-10-19T06:55:13Z) - A Randomized Approach for Tight Privacy Accounting [63.67296945525791]
We propose a new differential privacy paradigm called estimate-verify-release (EVR)
EVR paradigm first estimates the privacy parameter of a mechanism, then verifies whether it meets this guarantee, and finally releases the query output.
Our empirical evaluation shows the newly proposed EVR paradigm improves the utility-privacy tradeoff for privacy-preserving machine learning.
arXiv Detail & Related papers (2023-04-17T00:38:01Z) - How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS) which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z) - Algorithms with More Granular Differential Privacy Guarantees [65.3684804101664]
We consider partial differential privacy (DP), which allows quantifying the privacy guarantee on a per-attribute basis.
In this work, we study several basic data analysis and learning tasks, and design algorithms whose per-attribute privacy parameter is smaller that the best possible privacy parameter for the entire record of a person.
arXiv Detail & Related papers (2022-09-08T22:43:50Z) - Reasoning over Public and Private Data in Retrieval-Based Systems [29.515915401413334]
State-of-the-art systems explicitly retrieve relevant information to a user question from a background corpus before producing an answer.
While today's retrieval systems assume the corpus is fully accessible, users are often unable or unwilling to expose their private data to entities hosting public data.
We first define the PUBLIC-PRIVATE AUTOREGRESSIVE Information RETRIEVAL (PAIR) privacy framework for the novel retrieval setting over multiple privacy scopes.
arXiv Detail & Related papers (2022-03-14T13:08:51Z) - Privately Publishable Per-instance Privacy [21.775752827149383]
We consider how to privately share the personalized privacy losses incurred by objective perturbation, using per-instance differential privacy (pDP)
We analyze the per-instance privacy loss of releasing a private empirical risk minimizer learned via objective perturbation, and propose a group of methods to privately and accurately publish the pDP losses at little to no additional privacy cost.
arXiv Detail & Related papers (2021-11-03T15:17:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.