SafeTab-H: Disclosure Avoidance for the 2020 Census Detailed Demographic and Housing Characteristics File B (Detailed DHC-B)
- URL: http://arxiv.org/abs/2505.03072v1
- Date: Fri, 02 May 2025 13:15:14 GMT
- Title: SafeTab-H: Disclosure Avoidance for the 2020 Census Detailed Demographic and Housing Characteristics File B (Detailed DHC-B)
- Authors: William Sexton, Skye Berghel, Bayard Carlson, Sam Haney, Luke Hartman, Michael Hay, Ashwin Machanavajjhala, Gerome Miklau, Amritha Pai, Simran Rajpal, David Pujol, Ruchit Shrestha, Daniel Simmons-Marengo,
- Abstract summary: We describe SafeTab-H, a disclosure avoidance algorithm applied to the release of the U.S. Census Bureau's Detailed Demographic and Housing Characteristics File B.<n>We show that the algorithm satisfies a well-studied variant of differential privacy, called zero-concentrated differential privacy.
- Score: 7.7544849165583525
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This article describes SafeTab-H, a disclosure avoidance algorithm applied to the release of the U.S. Census Bureau's Detailed Demographic and Housing Characteristics File B (Detailed DHC-B) as part of the 2020 Census. The tabulations contain household statistics about household type and tenure iterated by the householder's detailed race, ethnicity, or American Indian and Alaska Native tribe and village at varying levels of geography. We describe the algorithmic strategy which is based on adding noise from a discrete Gaussian distribution and show that the algorithm satisfies a well-studied variant of differential privacy, called zero-concentrated differential privacy. We discuss how the implementation of the SafeTab-H codebase relies on the Tumult Analytics privacy library. We also describe the theoretical expected error properties of the algorithm and explore various aspects of its parameter tuning.
Related papers
- Benchmarking Fraud Detectors on Private Graph Data [70.4654745317714]
Currently, many types of fraud are managed in part by automated detection algorithms that operate over graphs.<n>We consider the scenario where a data holder wishes to outsource development of fraud detectors to third parties.<n>Third parties submit their fraud detectors to the data holder, who evaluates these algorithms on a private dataset and then publicly communicates the results.<n>We propose a realistic privacy attack on this system that allows an adversary to de-anonymize individuals' data based only on the evaluation results.
arXiv Detail & Related papers (2025-07-30T03:20:15Z) - PHSafe: Disclosure Avoidance for the 2020 Census Supplemental Demographic and Housing Characteristics File (S-DHC) [7.7544849165583525]
The article describes the PHSafe algorithm, which is based on adding noise drawn from a discrete Gaussian distribution to the statistics of interest.<n>We prove that the algorithm satisfies a well-studied variant of differential privacy, called zero-concentrated differential privacy.
arXiv Detail & Related papers (2025-05-02T13:20:32Z) - SafeTab-P: Disclosure Avoidance for the 2020 Census Detailed Demographic and Housing Characteristics File A (Detailed DHC-A) [7.787555954397617]
The article describes the disclosure avoidance algorithm that the U.S. Census Bureau used to protect the Detailed Demographic and Housing Characteristics File A (DHC-A) of the 2020 Census.<n>The SafeTab-P algorithm is based on adding noise drawn to statistics of interest from a discrete Gaussian distribution.<n>We prove that the algorithm satisfies a well-studied variant of differential privacy, called zero-concentrated differential privacy (zCDP)
arXiv Detail & Related papers (2025-05-02T13:08:28Z) - The 2020 United States Decennial Census Is More Private Than You (Might) Think [25.32778927275117]
We show that the 2020 U.S. Census provides significantly stronger privacy protections than its nominal guarantees suggest.<n>We show that noise variances could be reduced by $15.08%$ to $24.82%$ while maintaining nearly the same level of privacy protection for each geographical level.
arXiv Detail & Related papers (2024-10-11T23:06:15Z) - Full-Information Estimation For Hierarchical Data [0.43512163406552007]
The U.S. Census Bureau's 2020 Disclosure Avoidance System (DAS) bases its output on noisy measurements.
These noisy measurements are observed in a set of hierarchical geographic units, e.g., the U.S. as a whole, states, counties, census tracts, and census blocks.
This paper describes a method to leverage the hierarchical structure within these noisy measurements to compute confidence intervals for arbitrary tabulations.
arXiv Detail & Related papers (2024-04-19T20:18:16Z) - Disclosure Avoidance for the 2020 Census Demographic and Housing Characteristics File [7.664548801662584]
We describe the concepts and methods used by the Disclosure Avoidance System (DAS) to produce formally private output in support of the 2020 Census data product releases.
We describe the updates to the DAS that were required to release the Demographic and Housing Characteristics (DHC) File.
We also describe subsequent experimental data products to facilitate development of tools that provide confidence intervals for confidential 2020 Census tabulations.
arXiv Detail & Related papers (2023-12-18T00:54:04Z) - Differentially Private Stochastic Gradient Descent with Low-Noise [49.981789906200035]
Modern machine learning algorithms aim to extract fine-grained information from data to provide accurate predictions, which often conflicts with the goal of privacy protection.
This paper addresses the practical and theoretical importance of developing privacy-preserving machine learning algorithms that ensure good performance while preserving privacy.
arXiv Detail & Related papers (2022-09-09T08:54:13Z) - Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z) - The Impact of the U.S. Census Disclosure Avoidance System on
Redistricting and Voting Rights Analysis [0.0]
The US Census Bureau plans to protect the privacy of 2020 Census respondents through its Disclosure Avoidance System (DAS)
We find that the protected data are not of sufficient quality for redistricting purposes.
Our analysis finds that the DAS-protected data are biased against certain areas, depending on voter turnout and partisan and racial composition.
arXiv Detail & Related papers (2021-05-29T03:32:36Z) - Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for
Private Learning [74.73901662374921]
A differentially private model degrades the utility drastically when the model comprises a large number of trainable parameters.
We propose an algorithm emphGradient Embedding Perturbation (GEP) towards training differentially private deep models with decent accuracy.
arXiv Detail & Related papers (2021-02-25T04:29:58Z) - Magnify Your Population: Statistical Downscaling to Augment the Spatial
Resolution of Socioeconomic Census Data [48.7576911714538]
We present a new statistical downscaling approach to derive fine-scale estimates of key socioeconomic attributes.
For each selected socioeconomic variable, a Random Forest model is trained on the source Census units and then used to generate fine-scale gridded predictions.
As a case study, we apply this method to Census data in the United States, downscaling the selected socioeconomic variables available at the block group level, to a grid of 300 spatial resolution.
arXiv Detail & Related papers (2020-06-23T16:52:18Z) - A One-Pass Private Sketch for Most Machine Learning Tasks [48.17461258268463]
Differential privacy (DP) is a compelling privacy definition that explains the privacy-utility tradeoff via formal, provable guarantees.
We propose a private sketch that supports a multitude of machine learning tasks including regression, classification, density estimation, and more.
Our sketch consists of randomized contingency tables that are indexed with locality-sensitive hashing and constructed with an efficient one-pass algorithm.
arXiv Detail & Related papers (2020-06-16T17:47:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.