Evaluating the Impacts of Swapping on the US Decennial Census
- URL: http://arxiv.org/abs/2502.01320v2
- Date: Mon, 10 Feb 2025 15:59:49 GMT
- Title: Evaluating the Impacts of Swapping on the US Decennial Census
- Authors: Maria Ballesteros, Cynthia Dwork, Gary King, Conlan Olson, Manish Raghavan,
- Abstract summary: We describe and implement a parameterized swapping algorithm based on Census publications, court documents, and informal interviews with Census employees.
We provide intuition for the types of shifts induced by swapping and compare against those introduced by TopDown.
- Score: 7.020785266789317
- Abstract: To meet its dual burdens of providing useful statistics and ensuring privacy of individual respondents, the US Census Bureau has for decades introduced some form of "noise" into published statistics. Initially, it used a method known as "swapping" (1990-2010). In 2020, it switched to an algorithm called TopDown that ensures a form of Differential Privacy. While the TopDown algorithm has been made public, no implementation of swapping has been released, and many details of the deployed swapping methodology have been kept secret. Further, the Bureau has not published an "original" dataset (even a synthetic one) together with its swapped version. It is therefore difficult to evaluate the effects of swapping and to compare these effects to those of other privacy technologies. To address these difficulties, we describe and implement a parameterized swapping algorithm based on Census publications, court documents, and informal interviews with Census employees. With this implementation, we characterize the impacts of swapping on a range of statistical quantities of interest. We provide intuition for the types of shifts induced by swapping and compare them against those introduced by TopDown. We find that even when swapping and TopDown introduce errors of similar magnitude, the direction in which statistics are biased need not be the same across the two techniques. More broadly, our implementation provides researchers with the tools to analyze and potentially correct for the impacts of disclosure avoidance systems on the quantities they study.
Related papers
- Differentially Private Data Release on Graphs: Inefficiencies and Unfairness [48.96399034594329]
This paper characterizes the impact of Differential Privacy on bias and unfairness in the context of releasing information about networks.
We consider a network release problem where the network structure is known to all, but the weights on edges must be released privately.
Our work provides theoretical foundations for, and empirical evidence of, the bias and unfairness that arise due to privacy in these networked decision problems.
arXiv Detail & Related papers (2024-08-08T08:37:37Z)
- Synthetic Census Data Generation via Multidimensional Multiset Sum [7.900694093691988]
We provide tools to generate synthetic microdata solely from published Census statistics.
We show that our methods work well in practice, and we offer theoretical arguments to explain our performance.
arXiv Detail & Related papers (2024-04-15T19:06:37Z)
- Benchmarking Private Population Data Release Mechanisms: Synthetic Data vs. TopDown [50.40020716418472]
This study conducts a comparison between the TopDown algorithm and private synthetic data generation to determine how accuracy is affected by query complexity.
Our results show that for in-distribution queries, the TopDown algorithm achieves significantly better privacy-fidelity tradeoffs than any of the synthetic data methods we evaluated.
arXiv Detail & Related papers (2024-01-31T17:38:34Z)
- The Impact of Differential Feature Under-reporting on Algorithmic Fairness [86.275300739926]
We present an analytically tractable model of differential feature under-reporting.
We then use this model to characterize the impact of this kind of data bias on algorithmic fairness.
Our results show that, in real world data settings, under-reporting typically leads to increasing disparities.
arXiv Detail & Related papers (2024-01-16T19:16:22Z)
- Evaluating Bias and Noise Induced by the U.S. Census Bureau's Privacy Protection Methods [0.0]
The U.S. Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information.
We conduct the first independent evaluation of bias and noise induced by the Bureau's two main disclosure avoidance systems.
TopDown's post-processing dramatically reduces the NMF noise and produces data whose accuracy is similar to that of swapping.
arXiv Detail & Related papers (2023-06-13T03:30:19Z)
- Enabling Trade-offs in Privacy and Utility in Genomic Data Beacons and Summary Statistics [26.99521354120141]
We introduce optimization-based approaches to explicitly trade off the utility of summary data or Beacon responses and privacy.
In the first, an attacker applies a likelihood-ratio test to make membership-inference claims.
In the second, an attacker uses a threshold that accounts for the effect of the data release on the separation in scores between individuals.
arXiv Detail & Related papers (2023-01-11T19:16:13Z)
- Post-processing of Differentially Private Data: A Fairness Perspective [53.29035917495491]
This paper shows that post-processing causes disparate impacts on individuals or groups.
It analyzes two critical settings: the release of differentially private datasets and the use of such private datasets for downstream decisions.
It proposes a novel post-processing mechanism that is (approximately) optimal under different fairness metrics.
arXiv Detail & Related papers (2022-01-24T02:45:03Z)
- Distribution-Invariant Differential Privacy [4.700764053354502]
We develop a distribution-invariant privatization (DIP) method to reconcile high statistical accuracy and strict differential privacy.
Under the same strictness of privacy protection, DIP achieves superior statistical accuracy in two simulations and on three real-world benchmarks.
arXiv Detail & Related papers (2021-11-08T22:26:50Z)
- Decision Making with Differential Privacy under a Fairness Lens [65.16089054531395]
The U.S. Census Bureau releases data sets and statistics about groups of individuals that are used as input to a number of critical decision processes.
To conform to privacy and confidentiality requirements, these agencies are often required to release privacy-preserving versions of the data.
This paper studies the release of differentially private data sets and analyzes their impact on some critical resource allocation tasks under a fairness perspective.
arXiv Detail & Related papers (2021-05-16T21:04:19Z)
- Differential Privacy of Hierarchical Census Data: An Optimization Approach [53.29035917495491]
Census Bureaus are interested in releasing aggregate socio-economic data about a large population without revealing sensitive information about any individual.
Recent events have identified some of the privacy challenges faced by these organizations.
This paper presents a novel differential-privacy mechanism for releasing hierarchical counts of individuals.
arXiv Detail & Related papers (2020-06-28T18:19:55Z)