Statistics-Friendly Confidentiality Protection for Establishment Data, with Applications to the QCEW
- URL: http://arxiv.org/abs/2509.01597v1
- Date: Mon, 01 Sep 2025 16:29:54 GMT
- Title: Statistics-Friendly Confidentiality Protection for Establishment Data, with Applications to the QCEW
- Authors: Kaitlyn Webb, Prottay Protivash, John Durrell, Daniell Toth, Aleksandra Slavković, Daniel Kifer
- Abstract summary: We propose a novel confidentiality framework for business data with a focus on interpretability for policy makers. We analyze new challenges that arise when noisy query answers are converted into confidentiality-preserving microdata.
- Score: 39.69299537637253
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Confidentiality for business data is an understudied area of disclosure avoidance, where legacy methods struggle to provide acceptable results. Modern formal privacy techniques designed for person-level data do not provide suitable confidentiality/utility trade-offs due to the highly skewed nature of business data and because extreme outlier records are often important contributors to query answers. In this paper, inspired by Gaussian Differential Privacy, we propose a novel confidentiality framework for business data with a focus on interpretability for policy makers. We propose two query-answering mechanisms and analyze new challenges that arise when noisy query answers are converted into confidentiality-preserving microdata. We evaluate our mechanisms on confidential Quarterly Census of Employment and Wages (QCEW) microdata and a public substitute dataset.
Related papers
- SynQP: A Framework and Metrics for Evaluating the Quality and Privacy Risk of Synthetic Data [4.73374389278596]
We introduce SynQP, an open framework for privacy benchmarking in synthetic data generation. We also highlight the need for privacy metrics that fairly account for the probabilistic nature of machine learning models. Our work provides a critical tool for improving the transparency and reliability of privacy evaluations.
arXiv Detail & Related papers (2026-01-17T17:51:14Z)
- MAGPIE: A dataset for Multi-AGent contextual PrIvacy Evaluation [54.410825977390274]
Existing benchmarks to evaluate contextual privacy in LLM-agents primarily assess single-turn, low-complexity tasks. We first present a benchmark, MAGPIE, comprising 158 real-life high-stakes scenarios across 15 domains. We then evaluate the current state-of-the-art LLMs on their understanding of contextually private data and their ability to collaborate without violating user privacy.
arXiv Detail & Related papers (2025-06-25T18:04:25Z)
- An applied Perspective: Estimating the Differential Identifiability Risk of an Exemplary SOEP Data Set [2.66269503676104]
We show how to compute the risk metric efficiently for a set of basic statistical queries.
Our empirical analysis based on an extensive, real-world scientific data set expands the knowledge on how to compute risks under realistic conditions.
arXiv Detail & Related papers (2024-07-04T17:50:55Z)
- Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z)
- A Summary of Privacy-Preserving Data Publishing in the Local Setting [0.6749750044497732]
Statistical Disclosure Control aims to minimize the risk of exposing confidential information by de-identifying it.
We outline the current privacy-preserving techniques employed in microdata de-identification, delve into privacy measures tailored for various disclosure scenarios, and assess metrics for information loss and predictive performance.
arXiv Detail & Related papers (2023-12-19T04:23:23Z)
- $\alpha$-Mutual Information: A Tunable Privacy Measure for Privacy Protection in Data Sharing [4.475091558538915]
This paper adopts Arimoto's $\alpha$-Mutual Information as a tunable privacy measure.
We formulate a general distortion-based mechanism that manipulates the original data to offer privacy protection.
arXiv Detail & Related papers (2023-10-27T16:26:14Z)
- Summary Statistic Privacy in Data Sharing [23.50797952699759]
We study a setting where a data holder wishes to share data with a receiver, without revealing certain summary statistics of the data distribution.
We propose summary statistic privacy, a metric for quantifying the privacy risk of such a mechanism.
We show that the proposed quantization mechanisms achieve better privacy-distortion tradeoffs than alternative privacy mechanisms.
arXiv Detail & Related papers (2023-03-03T15:29:19Z)
- Breaking the Communication-Privacy-Accuracy Tradeoff with $f$-Differential Privacy [51.11280118806893]
We consider a federated data analytics problem in which a server coordinates the collaborative data analysis of multiple users with privacy concerns and limited communication capability.
We study the local differential privacy guarantees of discrete-valued mechanisms with finite output space through the lens of $f$-differential privacy (DP).
More specifically, we advance the existing literature by deriving tight $f$-DP guarantees for a variety of discrete-valued mechanisms.
arXiv Detail & Related papers (2023-02-19T16:58:53Z)
- No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy" [75.98836424725437]
New methods designed to preserve data privacy require careful scrutiny.
Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked.
arXiv Detail & Related papers (2022-09-29T17:50:23Z)
- Releasing survey microdata with exact cluster locations and additional privacy safeguards [77.34726150561087]
We propose an alternative microdata dissemination strategy that leverages the utility of the original microdata with additional privacy safeguards.
Our strategy reduces the respondents' re-identification risk for any number of disclosed attributes by 60-80% even under re-identification attempts.
arXiv Detail & Related papers (2022-05-24T19:37:11Z)
- On the Privacy-Utility Tradeoff in Peer-Review Data Analysis [34.0435377376779]
A major impediment to research on improving peer review is the unavailability of peer-review data.
We propose a framework for privacy-preserving release of certain conference peer-review data.
arXiv Detail & Related papers (2020-06-29T21:08:21Z)