The 2010 Census Confidentiality Protections Failed, Here's How and Why
- URL: http://arxiv.org/abs/2312.11283v1
- Date: Mon, 18 Dec 2023 15:23:12 GMT
- Title: The 2010 Census Confidentiality Protections Failed, Here's How and Why
- Authors: John M. Abowd, Tamara Adams, Robert Ashmead, David Darais, Sourya Dey, Simson L. Garfinkel, Nathan Goldschlag, Daniel Kifer, Philip Leclerc, Ethan Lew, Scott Moore, Rolando A. Rodríguez, Ramy N. Tadros, Lars Vilhuber
- Abstract summary: We reconstruct five variables (census block, sex, age, race, and ethnicity) in the confidential 2010 Census person records.
Using only published data, an attacker can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed.
We show that alternatives to the 2020 Census Disclosure Avoidance System with similar accuracy (enhanced swapping) also fail to protect confidentiality.
- Score: 6.982581904789855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using only 34 published tables, we reconstruct five variables (census block, sex, age, race, and ethnicity) in the confidential 2010 Census person records. Using the 38-bin age variable tabulated at the census block level, at most 20.1% of reconstructed records can differ from their confidential source on even a single value for these five variables. Using only published data, an attacker can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. The tabular publications in Summary File 1 thus have prohibited disclosure risk similar to the unreleased confidential microdata. Reidentification studies confirm that an attacker can, within blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with nonmodal characteristics) with 95% accuracy, the same precision as the confidential data achieve and far greater than statistical baselines. The flaw in the 2010 Census framework was the assumption that aggregation prevented accurate microdata reconstruction, justifying weaker disclosure limitation methods than were applied to 2010 Census public microdata. The framework used for 2020 Census publications defends against attacks that are based on reconstruction, as we also demonstrate here. Finally, we show that alternatives to the 2020 Census Disclosure Avoidance System with similar accuracy (enhanced swapping) also fail to protect confidentiality, and those that partially defend against reconstruction attacks (incomplete suppression implementations) destroy the primary statutory use case: data for redistricting all legislatures in the country in compliance with the 1965 Voting Rights Act.
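The attack described in the abstract treats the published tables as a system of constraints on the unknown block-level records: any assignment of (sex, age, race, ethnicity) values that reproduces every published count is a candidate reconstruction, and a block with exactly one candidate is perfectly reconstructed, verifiably so from public data alone. The sketch below illustrates that logic by brute force on an invented block with invented tables; it is not the authors' solver, which operates on the actual Summary File 1 tabulations at national scale.

```python
from itertools import combinations_with_replacement, product
from collections import Counter

# Toy attribute domains (hypothetical; the real attack works with census block,
# sex, 38 age bins, race, and ethnicity from Summary File 1).
SEX = ["M", "F"]
AGE = ["0-17", "18-64", "65+"]
RACE = ["A", "B"]

# Hypothetical published tables for one census block of 4 people.
PUBLISHED = {
    "total": 4,
    "sex_by_age": Counter({("M", "18-64"): 2, ("F", "0-17"): 1, ("F", "65+"): 1}),
    "race": Counter({"A": 3, "B": 1}),
}

def consistent(records):
    """True if a candidate multiset of person records reproduces every published table."""
    sex_by_age = Counter((s, a) for s, a, r in records)
    race = Counter(r for s, a, r in records)
    return sex_by_age == PUBLISHED["sex_by_age"] and race == PUBLISHED["race"]

# Enumerate every multiset of records of the right size and keep the consistent ones.
# Exactly one survivor would mean the block is perfectly reconstructed; several
# survivors mean additional published tables are needed to pin the block down.
domain = list(product(SEX, AGE, RACE))
solutions = [recs for recs in combinations_with_replacement(domain, PUBLISHED["total"])
             if consistent(recs)]

print(f"{len(solutions)} consistent reconstruction(s) of this block:")
for recs in solutions:
    print(recs)
```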
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z)
- The 2020 United States Decennial Census Is More Private Than You (Might) Think [25.32778927275117]
We show that the 2020 U.S. Census provides significantly stronger privacy protections than its nominal guarantees suggest.
We show that noise variances could be reduced by 15.08% to 24.82% while maintaining nearly the same level of privacy protection for each geographical level.
arXiv Detail & Related papers (2024-10-11T23:06:15Z)
- Understanding and Mitigating the Impacts of Differentially Private Census Data on State Level Redistricting [4.589972411795548]
Data users were shaken by the adoption of differential privacy in the 2020 DAS.
We consider two redistricting settings in which a data user might be concerned about the impacts of privacy-preserving noise.
We observe that an analyst may come to incorrect conclusions if they do not account for noise.
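A toy illustration of that risk (all numbers invented, and plain Gaussian noise standing in for the DAS's actual mechanism): a district whose true minority share is just above 50% can be misclassified by an analyst who compares the noisy share to the threshold as if it were exact.

```python
import random

random.seed(0)

# Invented district: 10,000 voting-age people, 5,040 of them in the minority
# group, so the true minority share is 50.4%.
TOTAL, MINORITY = 10_000, 5_040
SIGMA = 50  # stand-in noise scale; plain Gaussian noise, not the DAS mechanism

trials, flips = 10_000, 0
for _ in range(trials):
    noisy_minority = MINORITY + random.gauss(0, SIGMA)
    noisy_total = TOTAL + random.gauss(0, SIGMA)
    # Naive analysis: compare the noisy share to 50% as if it were exact.
    if noisy_minority / noisy_total <= 0.5:
        flips += 1

print(f"Naive reading misclassifies the majority-minority district "
      f"in {flips / trials:.1%} of trials")
```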
arXiv Detail & Related papers (2024-09-10T18:11:54Z)
- Noisy Measurements Are Important, the Design of Census Products Is Much More Important [1.52292571922932]
McCartan et al. (2023) call for "making differential privacy work for census data users".
This commentary explains why the 2020 Census Noisy Measurement Files (NMFs) are not the best focus for that plea.
arXiv Detail & Related papers (2023-12-20T15:43:04Z)
- An Examination of the Alleged Privacy Threats of Confidence-Ranked Reconstruction of Census Microdata [3.2156268397508314]
We show that the proposed reconstruction is neither effective as a reconstruction method nor conducive to disclosure as claimed by its authors.
We report empirical results showing the proposed ranking cannot guide reidentification or attribute disclosure attacks.
arXiv Detail & Related papers (2023-11-06T15:04:03Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
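A minimal sketch of the density-ratio idea behind such an attack, not the DOMIAS implementation itself: score each candidate record by the ratio of a density estimate fit to the synthetic data over one fit to a reference population sample, so that local overfitting of the generator shows up as an inflated ratio. The data, estimators, and threshold below are placeholders.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Placeholder 1-D data: output of a generative model trained on private records,
# and an independent reference sample from the underlying population.
synthetic = rng.normal(loc=0.3, scale=0.9, size=2_000)
reference = rng.normal(loc=0.0, scale=1.0, size=2_000)
candidates = np.array([0.3, 2.5, -1.0])  # records whose membership we test

# Density-ratio membership score: p_synthetic(x) / p_reference(x).
# Local overfitting of the generator inflates the ratio near training points.
p_syn = gaussian_kde(synthetic)(candidates)
p_ref = gaussian_kde(reference)(candidates)
scores = p_syn / p_ref

THRESHOLD = 1.0  # placeholder decision threshold
for x, s in zip(candidates, scores):
    print(f"x={x:+.1f}  score={s:.2f}  inferred member: {s > THRESHOLD}")
```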
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- Confidence-Ranked Reconstruction of Census Microdata from Published Statistics [45.39928315344449]
A reconstruction attack on a private dataset takes as input some publicly accessible information about the dataset.
We show that our attacks can not only reconstruct full rows from the aggregate query statistics $Q(D) \in \mathbb{R}^m$, but can do so in a way that reliably ranks reconstructed rows by their odds.
Our attacks significantly outperform those that are based only on access to a public distribution or population from which the private dataset $D$ was sampled.
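A hedged sketch of the confidence-ranking idea on a toy dataset, as a simplified brute-force stand-in for the paper's approach (domains, counts, and the scoring rule are invented for illustration): enumerate the reconstructions consistent with the published statistics and rank each candidate row by the fraction of reconstructions that contain it.

```python
from itertools import combinations_with_replacement
from collections import Counter

# Invented private-dataset domain and published aggregate statistics Q(D):
# 3 people, with counts published by age bracket and by income bracket,
# but not their joint distribution.
DOMAIN = [(age, inc) for age in ("young", "old") for inc in ("low", "high")]
PUBLISHED = {
    "age": Counter({"young": 2, "old": 1}),
    "income": Counter({"low": 1, "high": 2}),
}

def consistent(rows):
    """True if a candidate multiset of rows reproduces both published tables."""
    return (Counter(a for a, _ in rows) == PUBLISHED["age"]
            and Counter(i for _, i in rows) == PUBLISHED["income"])

solutions = [rows for rows in combinations_with_replacement(DOMAIN, 3) if consistent(rows)]

# Confidence ranking: score each candidate row by the fraction of consistent
# reconstructions that contain it; rows scored 1.0 are certainly in the data.
score = Counter(row for rows in solutions for row in set(rows))
for row in sorted(DOMAIN, key=lambda r: score[r], reverse=True):
    print(row, f"{score[row] / len(solutions):.2f}")
```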
arXiv Detail & Related papers (2022-11-06T14:08:43Z)
- No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy" [75.98836424725437]
New methods designed to preserve data privacy require careful scrutiny.
Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked.
arXiv Detail & Related papers (2022-09-29T17:50:23Z)
- Releasing survey microdata with exact cluster locations and additional privacy safeguards [77.34726150561087]
We propose an alternative microdata dissemination strategy that leverages the utility of the original microdata with additional privacy safeguards.
Our strategy reduces the respondents' re-identification risk for any number of disclosed attributes by 60-80% even under re-identification attempts.
arXiv Detail & Related papers (2022-05-24T19:37:11Z)
- The Impact of the U.S. Census Disclosure Avoidance System on Redistricting and Voting Rights Analysis [0.0]
The US Census Bureau plans to protect the privacy of 2020 Census respondents through its Disclosure Avoidance System (DAS).
We find that the protected data are not of sufficient quality for redistricting purposes.
Our analysis finds that the DAS-protected data are biased against certain areas, depending on voter turnout and partisan and racial composition.
arXiv Detail & Related papers (2021-05-29T03:32:36Z)
- Magnify Your Population: Statistical Downscaling to Augment the Spatial Resolution of Socioeconomic Census Data [48.7576911714538]
We present a new statistical downscaling approach to derive fine-scale estimates of key socioeconomic attributes.
For each selected socioeconomic variable, a Random Forest model is trained on the source Census units and then used to generate fine-scale gridded predictions.
As a case study, we apply this method to Census data in the United States, downscaling the selected socioeconomic variables available at the block group level to a grid of 300 m spatial resolution.
arXiv Detail & Related papers (2020-06-23T16:52:18Z)
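The downscaling approach in the last entry amounts to training one regression model per socioeconomic variable on the coarse source units and predicting onto a finer grid whose covariates are observed at the target resolution. Below is a minimal scikit-learn sketch of that workflow; the features, sizes, and Random Forest settings are stand-ins, not the paper's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Stand-in data: each coarse Census unit (e.g., block group) has a few gridded
# covariates aggregated to its footprint and one observed socioeconomic value.
n_units, n_cells = 500, 10_000
unit_features = rng.normal(size=(n_units, 3))  # e.g., land cover, nightlights, roads
unit_target = unit_features @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=n_units)

# Train one model per socioeconomic variable on the coarse source units...
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(unit_features, unit_target)

# ...then predict onto the fine grid cells, whose covariates are observed
# at the target resolution.
grid_features = rng.normal(size=(n_cells, 3))
grid_predictions = model.predict(grid_features)

print(grid_predictions[:5])
```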