PHSafe: Disclosure Avoidance for the 2020 Census Supplemental Demographic and Housing Characteristics File (S-DHC)
- URL: http://arxiv.org/abs/2505.01254v1
- Date: Fri, 02 May 2025 13:20:32 GMT
- Title: PHSafe: Disclosure Avoidance for the 2020 Census Supplemental Demographic and Housing Characteristics File (S-DHC)
- Authors: William Sexton, Skye Berghel, Bayard Carlson, Sam Haney, Luke Hartman, Michael Hay, Ashwin Machanavajjhala, Gerome Miklau, Amritha Pai, Simran Rajpal, David Pujol, Ruchit Shrestha, Daniel Simmons-Marengo
- Abstract summary: The article describes the PHSafe algorithm, which is based on adding noise drawn from a discrete Gaussian distribution to the statistics of interest. We prove that the algorithm satisfies a well-studied variant of differential privacy, called zero-concentrated differential privacy.
- Score: 7.7544849165583525
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This article describes the disclosure avoidance algorithm that the U.S. Census Bureau used to protect the 2020 Census Supplemental Demographic and Housing Characteristics File (S-DHC). The tabulations contain statistics of counts of U.S. persons living in certain types of households, as well as averages. The article describes the PHSafe algorithm, which is based on adding noise drawn from a discrete Gaussian distribution to the statistics of interest. We prove that the algorithm satisfies a well-studied variant of differential privacy, called zero-concentrated differential privacy. We then describe how the algorithm was implemented on Tumult Analytics and briefly outline the parameterization and tuning of the algorithm.
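The mechanism the abstract describes can be sketched as follows. This is a minimal illustration, not the Bureau's production code: the noise scale, sensitivity, and count are made-up values, and production systems use an exact discrete Gaussian sampler (e.g. Canonne-Kapralov-Steinke) rather than the truncated one here.

```python
import math
import random

def sample_discrete_gaussian(sigma, bound_mult=10, rng=random):
    """Sample a (truncated) discrete Gaussian with scale sigma.

    Truncating the support at +/- bound_mult * sigma is a close
    approximation for illustration, since the tail mass beyond
    10 standard deviations is negligible.
    """
    bound = int(math.ceil(bound_mult * sigma))
    support = list(range(-bound, bound + 1))
    weights = [math.exp(-(z * z) / (2.0 * sigma * sigma)) for z in support]
    return rng.choices(support, weights=weights, k=1)[0]

def zcdp_rho(sensitivity, sigma):
    # Releasing a statistic with L2 sensitivity Delta plus discrete Gaussian
    # noise of scale sigma satisfies rho-zCDP with rho = Delta^2 / (2 sigma^2).
    return sensitivity ** 2 / (2.0 * sigma ** 2)

true_count = 1234    # hypothetical household count
sigma = 25.0         # illustrative noise scale, not a production parameter
noisy_count = true_count + sample_discrete_gaussian(sigma)
rho = zcdp_rho(1.0, sigma)
```

A key property of zCDP is that releasing k such statistics composes additively: the total privacy loss is the sum of the per-release rho values.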
Related papers
- Benchmarking Fraud Detectors on Private Graph Data [70.4654745317714]
Currently, many types of fraud are managed in part by automated detection algorithms that operate over graphs. We consider the scenario where a data holder wishes to outsource development of fraud detectors to third parties. Third parties submit their fraud detectors to the data holder, who evaluates these algorithms on a private dataset and then publicly communicates the results. We propose a realistic privacy attack on this system that allows an adversary to de-anonymize individuals' data based only on the evaluation results.
arXiv Detail & Related papers (2025-07-30T03:20:15Z) - SafeTab-H: Disclosure Avoidance for the 2020 Census Detailed Demographic and Housing Characteristics File B (Detailed DHC-B) [7.7544849165583525]
We describe SafeTab-H, a disclosure avoidance algorithm applied to the release of the U.S. Census Bureau's Detailed Demographic and Housing Characteristics File B. We show that the algorithm satisfies a well-studied variant of differential privacy, called zero-concentrated differential privacy.
arXiv Detail & Related papers (2025-05-02T13:15:14Z) - SafeTab-P: Disclosure Avoidance for the 2020 Census Detailed Demographic and Housing Characteristics File A (Detailed DHC-A) [7.787555954397617]
The article describes the disclosure avoidance algorithm that the U.S. Census Bureau used to protect the Detailed Demographic and Housing Characteristics File A (DHC-A) of the 2020 Census. The SafeTab-P algorithm is based on adding noise drawn from a discrete Gaussian distribution to the statistics of interest. We prove that the algorithm satisfies a well-studied variant of differential privacy, called zero-concentrated differential privacy (zCDP).
arXiv Detail & Related papers (2025-05-02T13:08:28Z) - Linear-Time User-Level DP-SCO via Robust Statistics [55.350093142673316]
User-level differentially private convex optimization (DP-SCO) has garnered significant attention due to the importance of safeguarding user privacy in machine learning applications. Current methods, such as those based on differentially private gradient descent (DP-SGD), often struggle with high noise accumulation and suboptimal utility. We introduce a novel linear-time algorithm that leverages robust statistics, specifically the median and trimmed mean, to overcome these challenges.
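The robust aggregators this summary names, the median and the trimmed mean, can be sketched as follows; the data and trimming fraction are illustrative choices, not taken from the paper:

```python
import statistics

def trimmed_mean(values, trim_frac=0.1):
    """Average after discarding the lowest and highest trim_frac of points.

    Bounding the influence of any single extreme contribution is the
    intuition behind using robust statistics for user-level privacy.
    """
    xs = sorted(values)
    k = int(len(xs) * trim_frac)
    kept = xs[k:len(xs) - k] if k > 0 else xs
    return sum(kept) / len(kept)

data = [2.0, 2.1, 1.9, 2.05, 50.0]   # one user contributes an outlier
median = statistics.median(data)      # 2.05: unaffected by the outlier
robust = trimmed_mean(data, 0.2)      # 2.05: outlier trimmed away
naive = sum(data) / len(data)         # ~11.61: plain mean dragged upward
```

Because a single user can shift the plain mean arbitrarily, its sensitivity (and hence the DP noise required) is unbounded; the median and trimmed mean keep it small.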
arXiv Detail & Related papers (2025-02-13T02:05:45Z) - Evaluating the Impacts of Swapping on the US Decennial Census [7.020785266789317]
We describe and implement a parameterized swapping algorithm based on Census publications, court documents, and informal interviews with Census employees. We provide intuition for the types of shifts induced by swapping and compare against those introduced by TopDown.
arXiv Detail & Related papers (2025-02-03T12:51:16Z) - Data value estimation on private gradients [84.966853523107]
For gradient-based machine learning (ML) methods, the de facto differential privacy technique is perturbing the gradients with random noise. Data valuation attributes the ML performance to the training data and is widely used in privacy-aware applications that require enforcing DP. We show that the default approach of injecting i.i.d. random noise into the gradients fails, because the estimation uncertainty of the data value estimation paradoxically scales linearly with more estimation budget. We propose to instead inject carefully correlated noise to provably remove the linear scaling of estimation uncertainty w.r.t. the budget.
arXiv Detail & Related papers (2024-12-22T13:15:51Z) - The 2020 United States Decennial Census Is More Private Than You (Might) Think [25.32778927275117]
We show that the 2020 U.S. Census provides significantly stronger privacy protections than its nominal guarantees suggest. We show that noise variances could be reduced by 15.08% to 24.82% while maintaining nearly the same level of privacy protection for each geographical level.
arXiv Detail & Related papers (2024-10-11T23:06:15Z) - Synthetic Census Data Generation via Multidimensional Multiset Sum [7.900694093691988]
We provide tools to generate synthetic microdata solely from published Census statistics.
We show that our methods work well in practice, and we offer theoretical arguments to explain our performance.
arXiv Detail & Related papers (2024-04-15T19:06:37Z) - Comment: The Essential Role of Policy Evaluation for the 2020 Census Disclosure Avoidance System [0.0]
boyd and Sarathy, "Differential Perspectives: Epistemic Disconnects Surrounding the US Census Bureau's Use of Differential Privacy"
We argue that empirical evaluations of the Census Disclosure Avoidance System failed to recognize that the benchmark data are never a ground truth of population counts.
We argue that policy makers must confront a key trade-off between data utility and privacy protection.
arXiv Detail & Related papers (2022-10-15T21:41:54Z) - Differentially Private Stochastic Gradient Descent with Low-Noise [49.981789906200035]
Modern machine learning algorithms aim to extract fine-grained information from data to provide accurate predictions, which often conflicts with the goal of privacy protection.
This paper addresses the practical and theoretical importance of developing privacy-preserving machine learning algorithms that ensure good performance while preserving privacy.
arXiv Detail & Related papers (2022-09-09T08:54:13Z) - Partial sensitivity analysis in differential privacy [58.730520380312676]
We investigate the impact of each input feature on the individual's privacy loss.
We experimentally evaluate our approach on queries over private databases.
We also explore our findings in the context of neural network training on synthetic data.
arXiv Detail & Related papers (2021-09-22T08:29:16Z) - Smoothed Differential Privacy [55.415581832037084]
Differential privacy (DP) is a widely-accepted and widely-applied notion of privacy based on worst-case analysis.
In this paper, we propose a natural extension of DP following the worst average-case idea behind the celebrated smoothed analysis.
We prove that any discrete mechanism with sampling procedures is more private than what DP predicts, while many continuous mechanisms with sampling procedures are still non-private under smoothed DP.
arXiv Detail & Related papers (2021-07-04T06:55:45Z) - The Impact of the U.S. Census Disclosure Avoidance System on Redistricting and Voting Rights Analysis [0.0]
The US Census Bureau plans to protect the privacy of 2020 Census respondents through its Disclosure Avoidance System (DAS).
We find that the protected data are not of sufficient quality for redistricting purposes.
Our analysis finds that the DAS-protected data are biased against certain areas, depending on voter turnout and partisan and racial composition.
arXiv Detail & Related papers (2021-05-29T03:32:36Z) - Magnify Your Population: Statistical Downscaling to Augment the Spatial Resolution of Socioeconomic Census Data [48.7576911714538]
We present a new statistical downscaling approach to derive fine-scale estimates of key socioeconomic attributes.
For each selected socioeconomic variable, a Random Forest model is trained on the source Census units and then used to generate fine-scale gridded predictions.
As a case study, we apply this method to Census data in the United States, downscaling the selected socioeconomic variables available at the block group level to a grid of 300 m spatial resolution.
arXiv Detail & Related papers (2020-06-23T16:52:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.