Related papers: SafeTab-P: Disclosure Avoidance for the 2020 Census Detailed Demographic and Housing Characteristics File A (Detailed DHC-A)

URL: http://arxiv.org/abs/2505.01472v1
Date: Fri, 02 May 2025 13:08:28 GMT
Title: SafeTab-P: Disclosure Avoidance for the 2020 Census Detailed Demographic and Housing Characteristics File A (Detailed DHC-A)
Authors: Sam Haney, Skye Berghel, Bayard Carlson, Ryan Cumings-Menon, Luke Hartman, Michael Hay, Ashwin Machanavajjhala, Gerome Miklau, Amritha Pai, Simran Rajpal, David Pujol, William Sexton, Ruchit Shrestha, Daniel Simmons-Marengo
Abstract summary: The article describes the disclosure avoidance algorithm that the U.S. Census Bureau used to protect the Detailed Demographic and Housing Characteristics File A (Detailed DHC-A) of the 2020 Census. The SafeTab-P algorithm is based on adding noise drawn from a discrete Gaussian distribution to the statistics of interest. We prove that the algorithm satisfies a well-studied variant of differential privacy, called zero-concentrated differential privacy (zCDP).
Score: 7.787555954397617
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This article describes the disclosure avoidance algorithm that the U.S. Census Bureau used to protect the Detailed Demographic and Housing Characteristics File A (Detailed DHC-A) of the 2020 Census. The tabulations contain statistics (counts) of demographic characteristics of the entire population of the United States, crossed with detailed races and ethnicities at varying levels of geography. The article describes the SafeTab-P algorithm, which is based on adding noise drawn from a discrete Gaussian distribution to the statistics of interest. A key innovation in SafeTab-P is the ability to adaptively choose how many statistics to release, and at what granularity, depending on the size of a population group. We prove that the algorithm satisfies a well-studied variant of differential privacy, called zero-concentrated differential privacy (zCDP). We then describe how the algorithm was implemented on Tumult Analytics and briefly outline the parameterization and tuning of the algorithm.
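The noise-addition step mentioned in the abstract can be illustrated with a minimal sketch. This is not the SafeTab-P implementation and does not use Tumult Analytics; the function names (`discrete_gaussian_pmf`, `sigma_for_zcdp`, `noisy_counts`) and the truncation parameter `tail` are hypothetical. It relies on the standard result that adding discrete Gaussian noise with parameter sigma^2 to an integer-valued query of L2 sensitivity Delta satisfies rho-zCDP with rho = Delta^2 / (2 sigma^2). The sampler below truncates the support and uses floating-point weights, so it only approximates the distribution; a production system would need an exact sampler.

```python
import math
import random


def discrete_gaussian_pmf(sigma, tail=12):
    """Unnormalized weights of a discrete Gaussian on the integers.

    Assumption for this sketch: the support is truncated at
    +/- ceil(tail * sigma), which makes the sampler approximate.
    """
    bound = max(1, math.ceil(tail * sigma))
    support = list(range(-bound, bound + 1))
    # Weight of integer k is proportional to exp(-k^2 / (2 sigma^2)).
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in support]
    return support, weights


def sigma_for_zcdp(rho, sensitivity=1.0):
    """Noise scale sigma such that Delta^2 / (2 sigma^2) equals rho."""
    return sensitivity / math.sqrt(2.0 * rho)


def noisy_counts(counts, rho, sensitivity=1.0, rng=None):
    """Add independent (approximate) discrete Gaussian noise to each count.

    `sensitivity` is the L2 sensitivity of the whole vector of counts,
    which the caller is assumed to have worked out separately.
    """
    rng = rng or random.Random()
    sigma = sigma_for_zcdp(rho, sensitivity)
    support, weights = discrete_gaussian_pmf(sigma)
    noise = rng.choices(support, weights=weights, k=len(counts))
    return [c + z for c, z in zip(counts, noise)]


if __name__ == "__main__":
    # Illustrative counts only; these are not Census figures.
    print(noisy_counts([120, 45, 3], rho=0.1, rng=random.Random(7)))
```

The outputs remain integers, which is one practical reason to prefer discrete over continuous Gaussian noise for count tabulations. The adaptive choice of how many statistics to release per population group, described in the abstract, is not modeled here.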

Related papers

Benchmarking Fraud Detectors on Private Graph Data [70.4654745317714]
Currently, many types of fraud are managed in part by automated detection algorithms that operate over graphs. We consider the scenario where a data holder wishes to outsource development of fraud detectors to third parties. Third parties submit their fraud detectors to the data holder, who evaluates these algorithms on a private dataset and then publicly communicates the results. We propose a realistic privacy attack on this system that allows an adversary to de-anonymize individuals' data based only on the evaluation results.
arXiv Detail & Related papers (2025-07-30T03:20:15Z)
PHSafe: Disclosure Avoidance for the 2020 Census Supplemental Demographic and Housing Characteristics File (S-DHC) [7.7544849165583525]
The article describes the PHSafe algorithm, which is based on adding noise drawn from a discrete Gaussian distribution to the statistics of interest. We prove that the algorithm satisfies a well-studied variant of differential privacy, called zero-concentrated differential privacy.
arXiv Detail & Related papers (2025-05-02T13:20:32Z)
SafeTab-H: Disclosure Avoidance for the 2020 Census Detailed Demographic and Housing Characteristics File B (Detailed DHC-B) [7.7544849165583525]
We describe SafeTab-H, a disclosure avoidance algorithm applied to the release of the U.S. Census Bureau's Detailed Demographic and Housing Characteristics File B. We show that the algorithm satisfies a well-studied variant of differential privacy, called zero-concentrated differential privacy.
arXiv Detail & Related papers (2025-05-02T13:15:14Z)
Full-Information Estimation For Hierarchical Data [0.43512163406552007]
The U.S. Census Bureau's 2020 Disclosure Avoidance System (DAS) bases its output on noisy measurements. These noisy measurements are observed in a set of hierarchical geographic units, e.g., the U.S. as a whole, states, counties, census tracts, and census blocks. This paper describes a method to leverage the hierarchical structure within these noisy measurements to compute confidence intervals for arbitrary tabulations.
arXiv Detail & Related papers (2024-04-19T20:18:16Z)
Evaluating Bias and Noise Induced by the U.S. Census Bureau's Privacy Protection Methods [0.0]
The U.S. Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information. We conduct the first independent evaluation of bias and noise induced by the Bureau's two main disclosure avoidance systems. TopDown's post-processing dramatically reduces the NMF noise and produces data whose accuracy is similar to that of swapping.
arXiv Detail & Related papers (2023-06-13T03:30:19Z)
Retiring $\Delta$DP: New Distribution-Level Metrics for Demographic Parity [47.78843764957511]
The fairness metric $\Delta$DP cannot precisely measure the violation of demographic parity. We propose two new fairness metrics, Area Between Probability density function Curves (ABPC) and Area Between Cumulative density function Curves (ABCC). Our proposed metrics have two properties: i) zero-value ABCC/ABPC guarantees zero violation of demographic parity; ii) ABCC/ABPC guarantees demographic parity while the classification thresholds are adjusted.
arXiv Detail & Related papers (2023-01-31T06:43:55Z)
The Impact of the U.S. Census Disclosure Avoidance System on Redistricting and Voting Rights Analysis [0.0]
The US Census Bureau plans to protect the privacy of 2020 Census respondents through its Disclosure Avoidance System (DAS). We find that the protected data are not of sufficient quality for redistricting purposes. Our analysis finds that the DAS-protected data are biased against certain areas, depending on voter turnout and partisan and racial composition.
arXiv Detail & Related papers (2021-05-29T03:32:36Z)
Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models. We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups. We find that relying on a single score threshold to differentiate between genuine and imposter sample pairs leads to suboptimal results. We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
arXiv Detail & Related papers (2021-03-16T15:05:49Z)
Differential Privacy of Hierarchical Census Data: An Optimization Approach [53.29035917495491]
Census Bureaus are interested in releasing aggregate socio-economic data about a large population without revealing sensitive information about any individual. Recent events have identified some of the privacy challenges faced by these organizations. This paper presents a novel differential-privacy mechanism for releasing hierarchical counts of individuals.
arXiv Detail & Related papers (2020-06-28T18:19:55Z)
Magnify Your Population: Statistical Downscaling to Augment the Spatial Resolution of Socioeconomic Census Data [48.7576911714538]
We present a new statistical downscaling approach to derive fine-scale estimates of key socioeconomic attributes. For each selected socioeconomic variable, a Random Forest model is trained on the source Census units and then used to generate fine-scale gridded predictions. As a case study, we apply this method to Census data in the United States, downscaling the selected socioeconomic variables available at the block group level, to a grid of 300 spatial resolution.
arXiv Detail & Related papers (2020-06-23T16:52:18Z)
A One-Pass Private Sketch for Most Machine Learning Tasks [48.17461258268463]
Differential privacy (DP) is a compelling privacy definition that explains the privacy-utility tradeoff via formal, provable guarantees. We propose a private sketch that supports a multitude of machine learning tasks including regression, classification, density estimation, and more. Our sketch consists of randomized contingency tables that are indexed with locality-sensitive hashing and constructed with an efficient one-pass algorithm.
arXiv Detail & Related papers (2020-06-16T17:47:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.