Disclosure Avoidance for the 2020 Census Demographic and Housing Characteristics File
- URL: http://arxiv.org/abs/2312.10863v1
- Date: Mon, 18 Dec 2023 00:54:04 GMT
- Title: Disclosure Avoidance for the 2020 Census Demographic and Housing Characteristics File
- Authors: Ryan Cumings-Menon, Robert Ashmead, Daniel Kifer, Philip Leclerc, Matthew Spence, Pavel Zhuravlev, John M. Abowd,
- Abstract summary: We describe the concepts and methods used by the Disclosure Avoidance System (DAS) to produce formally private output in support of the 2020 Census data product releases.
We describe the updates to the DAS that were required to release the Demographic and Housing Characteristics (DHC) File.
We also describe subsequent experimental data products to facilitate development of tools that provide confidence intervals for confidential 2020 Census tabulations.
- Score: 7.664548801662584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In "The 2020 Census Disclosure Avoidance System TopDown Algorithm," Abowd et al. (2022) describe the concepts and methods used by the Disclosure Avoidance System (DAS) to produce formally private output in support of the 2020 Census data product releases, with a particular focus on the DAS implementation that was used to create the 2020 Census Redistricting Data (P.L. 94-171) Summary File. In this paper we describe the updates to the DAS that were required to release the Demographic and Housing Characteristics (DHC) File, which provides more granular tables than other data products, such as the Redistricting Data Summary File. We also describe the final configuration parameters used for the production DHC DAS implementation, as well as subsequent experimental data products to facilitate development of tools that provide confidence intervals for confidential 2020 Census tabulations.
Related papers
- Formal Privacy Guarantees with Invariant Statistics [8.133739801185271]
Motivated by the 2020 US Census products, this paper extends differential privacy (DP) to address the joint release of DP outputs and nonprivate statistics.
Our framework, Semi-DP, redefines adjacency by focusing on datasets that conform to the given invariant.
We provide a privacy analysis of the 2020 US Decennial Census using the Semi-DP framework, revealing that the effective privacy guarantees are weaker than advertised.
arXiv Detail & Related papers (2024-10-22T22:50:17Z) - Full-Information Estimation For Hierarchical Data [0.43512163406552007]
The U.S. Census Bureau's 2020 Disclosure Avoidance System (DAS) bases its output on noisy measurements.
These noisy measurements are observed in a set of hierarchical geographic units, e.g., the U.S. as a whole, states, counties, census tracts, and census blocks.
This paper describes a method to leverage the hierarchical structure within these noisy measurements to compute confidence intervals for arbitrary tabulations.
arXiv Detail & Related papers (2024-04-19T20:18:16Z) - Noisy Measurements Are Important, the Design of Census Products Is Much More Important [1.52292571922932]
McCartan et al. (2023) call for "making differential privacy work for census data users"
This commentary explains why the 2020 Census Noisy Measurement Files (NMFs) are not the best focus for that plea.
arXiv Detail & Related papers (2023-12-20T15:43:04Z) - A Geometric Notion of Causal Probing [91.14470073637236]
In a language model's representation space, all information about a concept such as verbal number is encoded in a linear subspace.
We give a set of intrinsic criteria which characterize an ideal linear concept subspace.
We find that LEACE returns a one-dimensional subspace containing roughly half of total concept information.
arXiv Detail & Related papers (2023-07-27T17:57:57Z) - Census TopDown: The Impacts of Differential Privacy on Redistricting [0.3746889836344765]
We consider several key applications of Census data in redistricting.
We find reassuring evidence that TopDown will not threaten the ability to produce districts with tolerable population balance.
arXiv Detail & Related papers (2022-03-09T23:28:53Z) - Post-processing of Differentially Private Data: A Fairness Perspective [53.29035917495491]
This paper shows that post-processing causes disparate impacts on individuals or groups.
It analyzes two critical settings: the release of differentially private datasets and the use of such private datasets for downstream decisions.
It proposes a novel post-processing mechanism that is (approximately) optimal under different fairness metrics.
arXiv Detail & Related papers (2022-01-24T02:45:03Z) - The Impact of the U.S. Census Disclosure Avoidance System on
Redistricting and Voting Rights Analysis [0.0]
The US Census Bureau plans to protect the privacy of 2020 Census respondents through its Disclosure Avoidance System (DAS)
We find that the protected data are not of sufficient quality for redistricting purposes.
Our analysis finds that the DAS-protected data are biased against certain areas, depending on voter turnout and partisan and racial composition.
arXiv Detail & Related papers (2021-05-29T03:32:36Z) - BREEDS: Benchmarks for Subpopulation Shift [98.90314444545204]
We develop a methodology for assessing the robustness of models to subpopulation shift.
We leverage the class structure underlying existing datasets to control the data subpopulations that comprise the training and test distributions.
Applying this methodology to the ImageNet dataset, we create a suite of subpopulation shift benchmarks of varying granularity.
arXiv Detail & Related papers (2020-08-11T17:04:47Z) - Differential Privacy of Hierarchical Census Data: An Optimization
Approach [53.29035917495491]
Census Bureaus are interested in releasing aggregate socio-economic data about a large population without revealing sensitive information about any individual.
Recent events have identified some of the privacy challenges faced by these organizations.
This paper presents a novel differential-privacy mechanism for releasing hierarchical counts of individuals.
arXiv Detail & Related papers (2020-06-28T18:19:55Z) - Magnify Your Population: Statistical Downscaling to Augment the Spatial
Resolution of Socioeconomic Census Data [48.7576911714538]
We present a new statistical downscaling approach to derive fine-scale estimates of key socioeconomic attributes.
For each selected socioeconomic variable, a Random Forest model is trained on the source Census units and then used to generate fine-scale gridded predictions.
As a case study, we apply this method to Census data in the United States, downscaling the selected socioeconomic variables available at the block group level, to a grid of 300 spatial resolution.
arXiv Detail & Related papers (2020-06-23T16:52:18Z) - Real-Time Regression with Dividing Local Gaussian Processes [62.01822866877782]
Local Gaussian processes are a novel, computationally efficient modeling approach based on Gaussian process regression.
Due to an iterative, data-driven division of the input space, they achieve a sublinear computational complexity in the total number of training points in practice.
A numerical evaluation on real-world data sets shows their advantages over other state-of-the-art methods in terms of accuracy as well as prediction and update speed.
arXiv Detail & Related papers (2020-06-16T18:43:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.