Addressing Census data problems in race imputation via fully Bayesian
Improved Surname Geocoding and name supplements
- URL: http://arxiv.org/abs/2205.06129v1
- Date: Thu, 12 May 2022 14:41:45 GMT
- Title: Addressing Census data problems in race imputation via fully Bayesian
Improved Surname Geocoding and name supplements
- Authors: Kosuke Imai and Santiago Olivella and Evan T. R. Rosenman
- Abstract summary: We introduce a fully Bayesian Improved Surname Geocoding (fBISG) methodology that accounts for potential measurement error in Census counts.
We supplement the Census surname data with additional data on last, first, and middle names taken from the voter files of six Southern states where self-reported race is available.
Our empirical validation shows that the fBISG methodology and name supplements significantly improve the accuracy of race imputation across all racial groups, and especially for Asians.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Prediction of an individual's race and ethnicity plays an important role in
social science and public health research. Examples include studies of racial
disparity in health and voting. Recently, Bayesian Improved Surname Geocoding
(BISG), which uses Bayes' rule to combine information from Census surname files
with the geocoding of an individual's residence, has emerged as a leading
methodology for this prediction task. Unfortunately, BISG suffers from two
Census data problems that contribute to unsatisfactory predictive performance
for minorities. First, the decennial Census often contains zero counts for
minority racial groups in the Census blocks where some members of those groups
reside. Second, because the Census surname files only include frequent names,
many surnames -- especially those of minorities -- are missing from the list.
To address the zero counts problem, we introduce a fully Bayesian Improved
Surname Geocoding (fBISG) methodology that accounts for potential measurement
error in Census counts by extending the naïve Bayesian inference of the BISG
methodology to full posterior inference. To address the missing surname
problem, we supplement the Census surname data with additional data on last,
first, and middle names taken from the voter files of six Southern states where
self-reported race is available. Our empirical validation shows that the fBISG
methodology and name supplements significantly improve the accuracy of race
imputation across all racial groups, and especially for Asians. The proposed
methodology, together with additional name data, is available via the
open-source software package wru.
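The Bayes' rule combination described in the abstract, and the zero-count problem it runs into, can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `bisg_posterior`, the race categories, and every number below are made up for the example and do not come from the paper or from the wru package. The `pseudo_count` argument is a crude smoothing stand-in used to show why a hard zero is harmful; it is not the fBISG posterior inference the authors propose.

```python
from typing import Dict, List

# Hypothetical race categories, for illustration only.
RACES: List[str] = ["white", "black", "hispanic", "asian", "other"]


def bisg_posterior(
    p_race_given_surname: Dict[str, float],
    block_counts: Dict[str, float],
    national_counts: Dict[str, float],
    pseudo_count: float = 0.0,
) -> Dict[str, float]:
    """Compute P(race | surname, block) with Bayes' rule.

    Assumes surname and location are conditionally independent given race:
        P(race | surname, block) is proportional to
        P(race | surname) * P(block | race),
    where P(block | race) is estimated as block count / national count.
    `pseudo_count` is an illustrative smoothing constant, not fBISG.
    """
    unnormalized = {}
    for race in RACES:
        p_block_given_race = (block_counts[race] + pseudo_count) / national_counts[race]
        unnormalized[race] = p_race_given_surname[race] * p_block_given_race

    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("All unnormalized probabilities are zero.")
    return {race: value / total for race, value in unnormalized.items()}


# Invented inputs: a surname that is strongly associated with one group,
# living in a block where the Census count for that group is zero.
surname_probs = {"white": 0.05, "black": 0.01, "hispanic": 0.02, "asian": 0.90, "other": 0.02}
block = {"white": 40, "black": 5, "hispanic": 5, "asian": 0, "other": 0}
nation = {
    "white": 200_000_000,
    "black": 40_000_000,
    "hispanic": 60_000_000,
    "asian": 20_000_000,
    "other": 10_000_000,
}

# Plain BISG: the zero block count forces the posterior for that group to zero,
# no matter how informative the surname is.
naive = bisg_posterior(surname_probs, block, nation)

# With a small pseudo-count the surname evidence is no longer wiped out.
smoothed = bisg_posterior(surname_probs, block, nation, pseudo_count=1.0)

print("naive:   ", naive)
print("smoothed:", smoothed)
```

Comparing the two results shows the motivation for treating Census counts as noisy measurements rather than exact values: under the naive calculation a single zero count overrides all surname information.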
Related papers
- Evaluating Bias and Noise Induced by the U.S. Census Bureau's Privacy
Protection Methods [0.0]
The U.S. Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information.
We conduct the first independent evaluation of bias and noise induced by the Bureau's two main disclosure avoidance systems.
TopDown's post-processing dramatically reduces the NMF noise and produces data whose accuracy is similar to that of swapping.
arXiv Detail & Related papers (2023-06-13T03:30:19Z)
- Estimating Racial Disparities When Race is Not Observed [3.0931877196387196]
We introduce a new class of models that produce racial disparity estimates by using surnames as an instrumental variable for race.
A validation study based on the North Carolina voter file shows that BISG+BIRDiE reduces error by up to 84% when estimating racial differences in party registration.
We apply the proposed methodology to estimate racial differences in who benefits from the home mortgage interest deduction using individual-level tax data from the U.S. Internal Revenue Service.
arXiv Detail & Related papers (2023-03-05T04:46:16Z)
- Rethinking Data Heterogeneity in Federated Learning: Introducing a New
Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem and can in fact be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z)
- Race and ethnicity data for first, middle, and last names [0.0]
We provide the largest compiled publicly available dictionaries of first, middle, and last names for imputing race and ethnicity.
The dictionaries are based on the voter files of six Southern states that collect self-reported racial data upon voter registration.
arXiv Detail & Related papers (2022-08-26T05:27:50Z)
- Avoiding bias when inferring race using name-based approaches [0.8543368663496084]
We use information from the U.S. Census and mortgage applications to infer the race of U.S.-affiliated authors in the Web of Science.
Our results demonstrate that the validity of name-based inference varies by race/ethnicity and that threshold approaches underestimate Black authors and overestimate White authors.
arXiv Detail & Related papers (2021-04-14T08:36:22Z)
- Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and imposter sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
arXiv Detail & Related papers (2021-03-16T15:05:49Z)
- Improving Semi-supervised Federated Learning by Reducing the Gradient
Diversity of Models [67.66144604972052]
Federated learning (FL) is a promising way to use the computing power of mobile devices while maintaining privacy of users.
We show that a critical issue that affects the test accuracy is the large gradient diversity of the models from different users.
We propose a novel grouping-based model averaging method to replace the FedAvg averaging method.
arXiv Detail & Related papers (2020-08-26T03:36:07Z)
- Differential Privacy of Hierarchical Census Data: An Optimization
Approach [53.29035917495491]
Census Bureaus are interested in releasing aggregate socio-economic data about a large population without revealing sensitive information about any individual.
Recent events have identified some of the privacy challenges faced by these organizations.
This paper presents a novel differential-privacy mechanism for releasing hierarchical counts of individuals.
arXiv Detail & Related papers (2020-06-28T18:19:55Z)
- Magnify Your Population: Statistical Downscaling to Augment the Spatial
Resolution of Socioeconomic Census Data [48.7576911714538]
We present a new statistical downscaling approach to derive fine-scale estimates of key socioeconomic attributes.
For each selected socioeconomic variable, a Random Forest model is trained on the source Census units and then used to generate fine-scale gridded predictions.
As a case study, we apply this method to Census data in the United States, downscaling the selected socioeconomic variables available at the block group level, to a grid of 300 spatial resolution.
arXiv Detail & Related papers (2020-06-23T16:52:18Z)
- CNN-based Density Estimation and Crowd Counting: A Survey [65.06491415951193]
This paper comprehensively studies the crowd counting models, mainly CNN-based density map estimation methods.
According to the evaluation metrics, we select the top three performers on their crowd counting datasets.
We expect to make reasonable inference and prediction for the future development of crowd counting.
arXiv Detail & Related papers (2020-03-28T13:17:30Z)
- Predicting Race and Ethnicity From the Sequence of Characters in a Name [0.0]
We model the relationship between characters in a name and race and ethnicity using various techniques.
A model using Long Short-Term Memory works best with out-of-sample accuracy of .85.
The best-performing last-name model achieves out-of-sample accuracy of .81.
arXiv Detail & Related papers (2018-05-05T20:04:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.