Evaluating Bias and Noise Induced by the U.S. Census Bureau's Privacy
Protection Methods
- URL: http://arxiv.org/abs/2306.07521v3
- Date: Sat, 10 Feb 2024 23:31:23 GMT
- Title: Evaluating Bias and Noise Induced by the U.S. Census Bureau's Privacy
Protection Methods
- Authors: Christopher T. Kenny, Cory McCartan, Shiro Kuriwaki, Tyler Simko,
Kosuke Imai
- Abstract summary: The U.S. Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information.
We conduct the first independent evaluation of bias and noise induced by the Bureau's two main disclosure avoidance systems.
TopDown's post-processing dramatically reduces the NMF noise and produces data whose accuracy is similar to that of swapping.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The United States Census Bureau faces a difficult trade-off between the
accuracy of Census statistics and the protection of individual information. We
conduct the first independent evaluation of bias and noise induced by the
Bureau's two main disclosure avoidance systems: the TopDown algorithm employed
for the 2020 Census and the swapping algorithm implemented for the three
previous Censuses. Our evaluation leverages the Noisy Measure File (NMF) as
well as two independent runs of the TopDown algorithm applied to the 2010
decennial Census. We find that the NMF contains too much noise to be directly
useful, especially for Hispanic and multiracial populations. TopDown's
post-processing dramatically reduces the NMF noise and produces data whose
accuracy is similar to that of swapping. While the estimated errors for both
TopDown and swapping algorithms are generally no greater than other sources of
Census error, they can be relatively substantial for geographies with small
total populations.
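To make the noise-versus-bias trade-off concrete, here is a minimal sketch of adding privacy noise to a few hypothetical block counts and then post-processing them. The Laplace noise and the clip-rescale-round step are illustrative assumptions, not the Bureau's TopDown algorithm, which instead solves constrained optimization problems over the full geographic hierarchy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical block-level counts for one small geography (illustrative only).
true_counts = np.array([12, 3, 0, 45, 7], dtype=float)

# "Noisy measurements": add zero-mean noise calibrated to a privacy budget.
# The 2020 DAS uses discrete Gaussian noise; Laplace noise is used here only
# to keep the sketch short.
epsilon = 0.5
sensitivity = 1.0  # adding or removing one person changes each count by at most 1
noisy = true_counts + rng.laplace(scale=sensitivity / epsilon, size=true_counts.shape)

# Toy stand-in for TopDown-style post-processing: clip to non-negative values,
# rescale to preserve the noisy total, and round to integers.
post = np.clip(noisy, 0, None)
post = np.round(post * noisy.sum() / max(post.sum(), 1e-9))

print("mean abs error, raw noisy counts     :", np.abs(noisy - true_counts).mean())
print("mean abs error, post-processed counts:", np.abs(post - true_counts).mean())
```

Even this crude post-processing shrinks the raw measurement noise, and the non-negativity step illustrates how post-processing can trade noise for bias in small counts.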
Related papers
- The 2020 United States Decennial Census Is More Private Than You (Might) Think [25.32778927275117]
We show that between 8.50% and 13.76% of the privacy budget for the 2020 U.S. Census remains unused for each of the eight geographical levels.
We reduce noise variances by 15.08% to 24.82% while maintaining the same privacy budget for each geographical level.
arXiv Detail & Related papers (2024-10-11T23:06:15Z)
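The variance reductions quoted above follow the usual zero-concentrated DP accounting, in which Gaussian noise variance scales inversely with the budget assigned to a query. A minimal sketch with made-up budget numbers (the function name and values are assumptions, not the paper's):

```python
# Under zero-concentrated DP (the accounting used for the 2020 DAS), a Gaussian
# mechanism with sensitivity Delta and noise sigma spends rho = Delta**2 / (2 * sigma**2),
# so the per-query noise variance is sigma**2 = Delta**2 / (2 * rho).

def gaussian_variance(delta: float, rho: float) -> float:
    """Noise variance needed for a sensitivity-`delta` query at zCDP budget `rho`."""
    return delta**2 / (2 * rho)

delta = 1.0           # one person changes a count by at most 1
rho_allocated = 0.10  # hypothetical budget assigned to this query
rho_extra = 0.01      # hypothetical unused budget reallocated to it

before = gaussian_variance(delta, rho_allocated)
after = gaussian_variance(delta, rho_allocated + rho_extra)
print(f"variance before: {before:.3f}, after: {after:.3f}, reduction: {1 - after / before:.1%}")
```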
- Differentially Private Data Release on Graphs: Inefficiencies and Unfairness [48.96399034594329]
This paper characterizes the impact of Differential Privacy on bias and unfairness in the context of releasing information about networks.
We consider a network release problem where the network structure is known to all, but the weights on edges must be released privately.
Our work provides theoretical foundations and empirical evidence into the bias and unfairness arising due to privacy in these networked decision problems.
arXiv Detail & Related papers (2024-08-08T08:37:37Z)
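A minimal sketch of the release setting described above, assuming a baseline Laplace mechanism on edge weights. The graph, weights, and sensitivity bound are made up; the cited paper analyzes the bias and unfairness of such releases rather than this release step itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setting matching the problem statement: the graph structure is
# public, but edge weights are sensitive and must be released privately.
edges = {("a", "b"): 4.0, ("b", "c"): 1.5, ("a", "c"): 7.0}  # made-up weights

epsilon = 1.0
weight_sensitivity = 1.0  # assumed bound, assuming each record affects a single edge weight

# Baseline Laplace mechanism: independent noise on every edge weight.
private_weights = {
    e: w + rng.laplace(scale=weight_sensitivity / epsilon)
    for e, w in edges.items()
}
print(private_weights)
```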
- Benchmarking Private Population Data Release Mechanisms: Synthetic Data vs. TopDown [50.40020716418472]
This study conducts a comparison between the TopDown algorithm and private synthetic data generation to determine how accuracy is affected by query complexity.
Our results show that for in-distribution queries, the TopDown algorithm achieves significantly better privacy-fidelity tradeoffs than any of the synthetic data methods we evaluated.
arXiv Detail & Related papers (2024-01-31T17:38:34Z)
- Noisy Measurements Are Important, the Design of Census Products Is Much More Important [1.52292571922932]
McCartan et al. (2023) call for "making differential privacy work for census data users."
This commentary explains why the 2020 Census Noisy Measurement Files (NMFs) are not the best focus for that plea.
arXiv Detail & Related papers (2023-12-20T15:43:04Z)
- Making Differential Privacy Work for Census Data Users [0.0]
The U.S. Census Bureau collects and publishes detailed demographic data about Americans which are heavily used by researchers and policymakers.
A key output of this privacy protection system is the Noisy Measurement File (NMF), which is produced by adding random noise to tabulated statistics.
We describe the process we use to transform the NMF into a usable format, and provide recommendations to the Bureau for how to release future versions of the NMF.
arXiv Detail & Related papers (2023-05-12T02:48:11Z)
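One step in making NMF-style output usable is propagating the known noise variances through any aggregation. A minimal sketch with made-up cell values and variances, not the authors' actual processing pipeline:

```python
import numpy as np

# Hypothetical noisy measurements for four detailed cells in one block, each
# released with known, independent noise variance (NMF-style).
noisy_cells = np.array([3.7, -1.2, 10.4, 0.9])  # made-up values; raw cells can be negative
cell_variance = np.array([4.0, 4.0, 4.0, 4.0])  # made-up per-cell noise variances

# A usable tabulation often needs an aggregate (e.g., the block total). Because
# the noise is independent and zero-mean, the aggregate is unbiased and its
# variance is the sum of the cell variances.
estimate = noisy_cells.sum()
std_error = np.sqrt(cell_variance.sum())
print(f"estimated total: {estimate:.1f}  (std. error {std_error:.1f})")
```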
- Census TopDown: The Impacts of Differential Privacy on Redistricting [0.3746889836344765]
We consider several key applications of Census data in redistricting.
We find reassuring evidence that TopDown will not threaten the ability to produce districts with tolerable population balance.
arXiv Detail & Related papers (2022-03-09T23:28:53Z)
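Population balance can be checked directly from block counts, which is how privacy noise feeds into this question. A minimal sketch with synthetic blocks and generic Laplace noise standing in for the TopDown output:

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up block populations and a made-up assignment of blocks to 3 districts.
block_pop = rng.integers(0, 200, size=30).astype(float)
district_of_block = rng.integers(0, 3, size=30)

def max_deviation(pop, assignment, n_districts=3):
    """Largest relative deviation of any district from the ideal (equal) population."""
    totals = np.bincount(assignment, weights=pop, minlength=n_districts)
    ideal = totals.sum() / n_districts
    return np.abs(totals - ideal).max() / ideal

noisy_pop = np.clip(block_pop + rng.laplace(scale=2.0, size=block_pop.shape), 0, None)
print("deviation with true counts :", round(max_deviation(block_pop, district_of_block), 4))
print("deviation with noisy counts:", round(max_deviation(noisy_pop, district_of_block), 4))
```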
- Smoothed Differential Privacy [55.415581832037084]
Differential privacy (DP) is a widely-accepted and widely-applied notion of privacy based on worst-case analysis.
In this paper, we propose a natural extension of DP following the worst average-case idea behind the celebrated smoothed analysis.
We prove that any discrete mechanism with sampling procedures is more private than what DP predicts, while many continuous mechanisms with sampling procedures are still non-private under smoothed DP.
arXiv Detail & Related papers (2021-07-04T06:55:45Z)
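For reference, the worst-case guarantee being relaxed is the textbook (epsilon, delta)-DP inequality below; the paper's smoothed definition is not reproduced here.

```latex
% Textbook (epsilon, delta)-differential privacy: a worst case over all pairs of
% neighboring datasets D, D' and all output events S. Per the summary above,
% smoothed DP replaces part of this worst case with an average over datasets,
% in the spirit of smoothed analysis.
\[
  \Pr\bigl[\mathcal{M}(D) \in S\bigr]
  \;\le\;
  e^{\varepsilon}\,\Pr\bigl[\mathcal{M}(D') \in S\bigr] + \delta
  \qquad \text{for all neighboring } D, D' \text{ and all } S .
\]
```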
- The Impact of the U.S. Census Disclosure Avoidance System on Redistricting and Voting Rights Analysis [0.0]
The US Census Bureau plans to protect the privacy of 2020 Census respondents through its Disclosure Avoidance System (DAS).
We find that the protected data are not of sufficient quality for redistricting purposes.
Our analysis finds that the DAS-protected data are biased against certain areas, depending on voter turnout and partisan and racial composition.
arXiv Detail & Related papers (2021-05-29T03:32:36Z)
- Graph-Homomorphic Perturbations for Private Decentralized Learning [64.26238893241322]
The local exchange of estimates allows the inference of the private data on which they are based.
Perturbations chosen independently at every agent result in a significant performance loss.
We propose an alternative scheme that constructs perturbations according to a particular nullspace condition, allowing their effect to cancel so that learning performance is preserved.
arXiv Detail & Related papers (2020-10-23T10:35:35Z)
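A minimal sketch of one nullspace-style construction (illustrative only, not necessarily the paper's exact scheme): noise that sums to zero across agents leaves the network average of the estimates unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical local estimates held by 5 agents in a decentralized network.
estimates = rng.normal(size=(5, 2))

# Independent noise at each agent protects privacy but shifts the network
# average, which contributes to the performance loss mentioned above.
independent_noise = rng.normal(scale=1.0, size=estimates.shape)

# Project the noise onto the subspace of perturbations that sum to zero across
# agents, so the network-wide average of the estimates is left unchanged.
zero_sum_noise = independent_noise - independent_noise.mean(axis=0, keepdims=True)

print("shift of network average, independent noise:", independent_noise.mean(axis=0))
print("shift of network average, zero-sum noise   :", zero_sum_noise.mean(axis=0))
```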
- Differential Privacy of Hierarchical Census Data: An Optimization Approach [53.29035917495491]
Census Bureaus are interested in releasing aggregate socio-economic data about a large population without revealing sensitive information about any individual.
Recent events have identified some of the privacy challenges faced by these organizations.
This paper presents a novel differential-privacy mechanism for releasing hierarchical counts of individuals.
arXiv Detail & Related papers (2020-06-28T18:19:55Z)
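Hierarchical consistency is the kind of constraint such a mechanism must respect. A minimal sketch of a generic least-squares repair, with made-up counts and not necessarily the paper's optimization:

```python
import numpy as np

# Hypothetical noisy counts: one parent geography and its three children,
# each measured independently with noise (values are made up).
noisy_parent = 103.0
noisy_children = np.array([40.0, 35.0, 20.0])  # sums to 95, inconsistent with parent

# Shift each child equally so the children again sum to the parent. This is the
# minimum-L2 adjustment subject to the hierarchical consistency constraint.
gap = noisy_parent - noisy_children.sum()
consistent_children = noisy_children + gap / len(noisy_children)

print(consistent_children, consistent_children.sum())
```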
- Magnify Your Population: Statistical Downscaling to Augment the Spatial Resolution of Socioeconomic Census Data [48.7576911714538]
We present a new statistical downscaling approach to derive fine-scale estimates of key socioeconomic attributes.
For each selected socioeconomic variable, a Random Forest model is trained on the source Census units and then used to generate fine-scale gridded predictions.
As a case study, we apply this method to Census data in the United States, downscaling the selected socioeconomic variables available at the block group level to a grid of 300 spatial resolution.
arXiv Detail & Related papers (2020-06-23T16:52:18Z)
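A minimal sketch of that downscaling step with synthetic data; the feature count, sizes, and random covariates are assumptions, whereas the real pipeline trains on gridded covariates for coarse Census units and predicts on fine grid cells.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)

# Hypothetical training data: coarse Census units described by gridded
# covariates, with a socioeconomic target observed only at the coarse level.
n_units, n_features = 500, 6
X_units = rng.normal(size=(n_units, n_features))
y_units = X_units @ rng.normal(size=n_features) + rng.normal(scale=0.1, size=n_units)

# Train on the source Census units, then predict on fine-scale grid cells that
# carry the same covariates -- the downscaling step described above.
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_units, y_units)
X_grid = rng.normal(size=(10_000, n_features))  # stand-in for fine grid cells
fine_scale_predictions = model.predict(X_grid)
print(fine_scale_predictions[:5])
```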
This list is automatically generated from the titles and abstracts of the papers on this site.