Noisy Measurements Are Important, the Design of Census Products Is Much More Important
- URL: http://arxiv.org/abs/2312.14191v2
- Date: Wed, 1 May 2024 15:55:28 GMT
- Title: Noisy Measurements Are Important, the Design of Census Products Is Much More Important
- Authors: John M. Abowd
- Abstract summary: McCartan et al. (2023) call for "making differential privacy work for census data users."
This commentary explains why the 2020 Census Noisy Measurement Files (NMFs) are not the best focus for that plea.
- Score: 1.52292571922932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: McCartan et al. (2023) call for "making differential privacy work for census data users." This commentary explains why the 2020 Census Noisy Measurement Files (NMFs) are not the best focus for that plea. The August 2021 letter from 62 prominent researchers asking for production of the direct output of the differential privacy system deployed for the 2020 Census signaled the engagement of the scholarly community in the design of decennial census data products. NMFs, the raw statistics produced by the 2020 Census Disclosure Avoidance System before any post-processing, are one component of that design: the query strategy output. The more important component is the query workload output, that is, the statistics released to the public. Optimizing the query workload, specifically the Redistricting Data (P.L. 94-171) Summary File, could allow the privacy-loss budget to be more effectively managed. There could be fewer noisy measurements, no post-processing bias, and direct estimates of the uncertainty from disclosure avoidance for each published statistic.
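The abstract's closing claim (no post-processing bias, direct uncertainty estimates) can be illustrated with a minimal Python sketch. It is not taken from the paper. It assumes continuous Gaussian noise as a stand-in for the actual noise mechanism, a hypothetical true count of 2, a hypothetical noise scale of 5, and a simple clamp to zero as a stand-in for post-processing. Under those assumptions, the raw noisy measurement is unbiased and has a known confidence interval, while the clamped value is biased upward for small counts.

```python
import random
import statistics

def noisy_measurement(true_count, sigma, rng):
    # Zero-mean Gaussian noise with a publicly known scale (an assumption of this sketch).
    return true_count + rng.gauss(0.0, sigma)

def clamp_nonnegative(value):
    # Hypothetical stand-in for post-processing: force the published value to be >= 0.
    return max(0.0, value)

def covers(measurement, sigma, true_count, z=1.96):
    # A 95% interval follows directly from the known noise scale.
    half_width = z * sigma
    return measurement - half_width <= true_count <= measurement + half_width

def simulate(true_count=2, sigma=5.0, trials=20000, seed=0):
    rng = random.Random(seed)
    raw = [noisy_measurement(true_count, sigma, rng) for _ in range(trials)]
    clamped = [clamp_nonnegative(v) for v in raw]
    coverage = sum(covers(v, sigma, true_count) for v in raw) / trials
    return {
        "raw_mean": statistics.fmean(raw),          # close to true_count (unbiased)
        "clamped_mean": statistics.fmean(clamped),  # pushed above true_count (biased)
        "coverage": coverage,                       # close to 0.95
    }

if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

The true count is deliberately small relative to the noise scale, because that is where clamping distorts the average most. With a large count the clamp rarely triggers and the two means nearly coincide.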
Related papers
- The 2020 United States Decennial Census Is More Private Than You (Might) Think [25.32778927275117]
We show that between 8.50% and 13.76% of the privacy budget for the 2020 U.S. Census remains unused for each of the eight geographical levels.
We mitigate noise variances by 15.08% to 24.82% while maintaining the same privacy budget for each geographical level.
arXiv Detail & Related papers (2024-10-11T23:06:15Z) - Differentially Private Data Release on Graphs: Inefficiencies and Unfairness [48.96399034594329]
This paper characterizes the impact of Differential Privacy on bias and unfairness in the context of releasing information about networks.
We consider a network release problem where the network structure is known to all, but the weights on edges must be released privately.
Our work provides theoretical foundations and empirical evidence into the bias and unfairness arising due to privacy in these networked decision problems.
arXiv Detail & Related papers (2024-08-08T08:37:37Z) - Benchmarking Private Population Data Release Mechanisms: Synthetic Data vs. TopDown [50.40020716418472]
This study conducts a comparison between the TopDown algorithm and private synthetic data generation to determine how accuracy is affected by query complexity.
Our results show that for in-distribution queries, the TopDown algorithm achieves significantly better privacy-fidelity tradeoffs than any of the synthetic data methods we evaluated.
arXiv Detail & Related papers (2024-01-31T17:38:34Z) - Disclosure Avoidance for the 2020 Census Demographic and Housing Characteristics File [7.664548801662584]
We describe the concepts and methods used by the Disclosure Avoidance System (DAS) to produce formally private output in support of the 2020 Census data product releases.
We describe the updates to the DAS that were required to release the Demographic and Housing Characteristics (DHC) File.
We also describe subsequent experimental data products to facilitate development of tools that provide confidence intervals for confidential 2020 Census tabulations.
arXiv Detail & Related papers (2023-12-18T00:54:04Z) - Evaluating Bias and Noise Induced by the U.S. Census Bureau's Privacy Protection Methods [0.0]
The U.S. Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information.
We conduct the first independent evaluation of bias and noise induced by the Bureau's two main disclosure avoidance systems.
TopDown's post-processing dramatically reduces the NMF noise and produces data whose accuracy is similar to that of swapping.
arXiv Detail & Related papers (2023-06-13T03:30:19Z) - Making Differential Privacy Work for Census Data Users [0.0]
The U.S. Census Bureau collects and publishes detailed demographic data about Americans which are heavily used by researchers and policymakers.
A key output of this privacy protection system is the Noisy Measurement File (NMF), which is produced by adding random noise to tabulated statistics.
We describe the process we use to transform the NMF into a usable format, and provide recommendations to the Bureau for how to release future versions of the NMF.
arXiv Detail & Related papers (2023-05-12T02:48:11Z) - No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy" [75.98836424725437]
New methods designed to preserve data privacy require careful scrutiny.
Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked.
arXiv Detail & Related papers (2022-09-29T17:50:23Z) - Post-processing of Differentially Private Data: A Fairness Perspective [53.29035917495491]
This paper shows that post-processing causes disparate impacts on individuals or groups.
It analyzes two critical settings: the release of differentially private datasets and the use of such private datasets for downstream decisions.
It proposes a novel post-processing mechanism that is (approximately) optimal under different fairness metrics.
arXiv Detail & Related papers (2022-01-24T02:45:03Z) - The Impact of the U.S. Census Disclosure Avoidance System on Redistricting and Voting Rights Analysis [0.0]
The US Census Bureau plans to protect the privacy of 2020 Census respondents through its Disclosure Avoidance System (DAS).
We find that the protected data are not of sufficient quality for redistricting purposes.
Our analysis finds that the DAS-protected data are biased against certain areas, depending on voter turnout and partisan and racial composition.
arXiv Detail & Related papers (2021-05-29T03:32:36Z) - Bias and Variance of Post-processing in Differential Privacy [53.29035917495491]
Post-processing immunity is a fundamental property of differential privacy.
It is often argued that post-processing may introduce bias and increase variance.
This paper takes a first step towards understanding the properties of post-processing.
arXiv Detail & Related papers (2020-10-09T02:12:54Z) - Differential Privacy of Hierarchical Census Data: An Optimization Approach [53.29035917495491]
Census Bureaus are interested in releasing aggregate socio-economic data about a large population without revealing sensitive information about any individual.
Recent events have identified some of the privacy challenges faced by these organizations.
This paper presents a novel differential-privacy mechanism for releasing hierarchical counts of individuals.
arXiv Detail & Related papers (2020-06-28T18:19:55Z)