Estimating Racial Disparities When Race is Not Observed
- URL: http://arxiv.org/abs/2303.02580v2
- Date: Wed, 17 Apr 2024 00:09:11 GMT
- Title: Estimating Racial Disparities When Race is Not Observed
- Authors: Cory McCartan, Robin Fisher, Jacob Goldin, Daniel E. Ho, Kosuke Imai
- Abstract summary: We introduce a new class of models that produce racial disparity estimates by using surnames as an instrumental variable for race.
A validation study based on the North Carolina voter file shows that BISG+BIRDiE reduces error by up to 84% when estimating racial differences in party registration.
We apply the proposed methodology to estimate racial differences in who benefits from the home mortgage interest deduction using individual-level tax data from the U.S. Internal Revenue Service.
- Score: 3.0931877196387196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The estimation of racial disparities in various fields is often hampered by the lack of individual-level racial information. In many cases, the law prohibits the collection of such information to prevent direct racial discrimination. As a result, analysts have frequently adopted Bayesian Improved Surname Geocoding (BISG) and its variants, which combine individual names and addresses with Census data to predict race. Unfortunately, the residuals of BISG are often correlated with the outcomes of interest, generally attenuating estimates of racial disparities. To correct this bias, we propose an alternative identification strategy under the assumption that surname is conditionally independent of the outcome given (unobserved) race, residence location, and other observed characteristics. We introduce a new class of models, Bayesian Instrumental Regression for Disparity Estimation (BIRDiE), that take BISG probabilities as inputs and produce racial disparity estimates by using surnames as an instrumental variable for race. Our estimation method is scalable, making it possible to analyze large-scale administrative data. We also show how to address potential violations of the key identification assumptions. A validation study based on the North Carolina voter file shows that BISG+BIRDiE reduces error by up to 84% when estimating racial differences in party registration. Finally, we apply the proposed methodology to estimate racial differences in who benefits from the home mortgage interest deduction using individual-level tax data from the U.S. Internal Revenue Service. Open-source software is available which implements the proposed methodology.
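To make the pipeline concrete, the BISG step is a direct application of Bayes' rule: P(race | surname, location) is proportional to P(surname | race) x P(race | location). Below is a minimal Python sketch of this step, using made-up surname and tract-level probabilities in place of the real Census tables:

```python
import numpy as np

RACES = ["white", "black", "hispanic", "asian", "other"]

# Hypothetical Census-style inputs (illustrative numbers, not real data):
# P(surname | race) for one surname
p_surname_given_race = np.array([0.0002, 0.0045, 0.0001, 0.00005, 0.0003])
# P(race | census tract) for the individual's tract of residence
p_race_given_tract = np.array([0.55, 0.30, 0.08, 0.04, 0.03])

# Bayes' rule: P(race | surname, tract) is proportional to
# P(surname | race) * P(race | tract), assuming surname and location
# are independent conditional on race.
unnormalized = p_surname_given_race * p_race_given_tract
bisg_probs = unnormalized / unnormalized.sum()

for race, p in zip(RACES, bisg_probs):
    print(f"{race:>8}: {p:.3f}")
```

BIRDiE then takes these posterior probabilities as inputs and, rather than plugging them into the disparity estimate directly (which is what induces the attenuation bias described above), uses the surname as an instrument for race under the stated conditional-independence assumption.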
Related papers
- The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models [78.69526166193236]
Pre-trained Language models (PLMs) have been acknowledged to contain harmful information, such as social biases.
We propose Social Bias Neurons to accurately pinpoint units (i.e., neurons) in a language model that can be attributed to undesirable behavior, such as social bias.
As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability with low cost.
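As a rough illustration of neuron-level attribution, the sketch below scores hidden units of a toy network by gradient-times-activation against a bias score. This is a generic attribution heuristic on a stand-in model, not the paper's actual technique:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a language-model layer (not the paper's model).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

x = torch.randn(8, 16)  # a batch of toy inputs
activations = {}

def hook(module, inp, out):
    out.retain_grad()           # keep the gradient on this hidden tensor
    activations["hidden"] = out

handle = model[1].register_forward_hook(hook)

logits = model(x)
# "Bias score": gap between two demographic-associated outputs (illustrative).
bias_score = (logits[:, 0] - logits[:, 1]).mean()
bias_score.backward()

# Attribution per hidden neuron: |activation * gradient|, averaged over batch.
hidden = activations["hidden"]
attr = (hidden * hidden.grad).abs().mean(dim=0)
print("candidate 'bias neurons':", torch.topk(attr, k=5).indices.tolist())
handle.remove()
```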
arXiv Detail & Related papers (2024-06-14T15:41:06Z)
- Addressing Discretization-Induced Bias in Demographic Prediction [18.427077352120254]
We show how argmax labeling results in a substantial under-count of African-American voters, by 28.2 percentage points in North Carolina.
This bias can have substantial implications in downstream tasks that use such labels.
We introduce a joint optimization approach -- and a tractable data-driven thresholding method -- that can eliminate this bias.
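The under-count arises because argmax labeling assigns each person wholly to their single most probable group, discarding the probability mass of smaller groups. A toy sketch contrasting argmax counts with summed-probability ("soft") counts on synthetic calibrated posteriors (the paper's joint-optimization and thresholding methods are more refined than this):

```python
import numpy as np

rng = np.random.default_rng(0)
GROUPS = ["white", "black", "other"]

# Synthetic BISG-style posteriors for 10,000 people, skewed so the
# second group is often second-most-likely but rarely the argmax.
probs = rng.dirichlet(np.array([6.0, 3.0, 1.0]), size=10_000)

# Draw true groups from each posterior, so the posteriors are calibrated.
true_group = np.array([rng.choice(3, p=p) for p in probs])
true_counts = np.bincount(true_group, minlength=3)

# Argmax ("hard") labeling vs. summed-probability ("soft") counting.
argmax_counts = np.bincount(probs.argmax(axis=1), minlength=3)
expected_counts = probs.sum(axis=0)

for g, t, a, e in zip(GROUPS, true_counts, argmax_counts, expected_counts):
    print(f"{g:>6}: true={t:5d}  argmax={a:5d}  soft={e:7.1f}")
```

With calibrated posteriors, the argmax counts inflate the modal group and understate the others, while the summed probabilities match the true counts in expectation.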
arXiv Detail & Related papers (2024-05-27T02:22:43Z)
- Evaluating the Fairness of Discriminative Foundation Models in Computer Vision [51.176061115977774]
We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pretraining (CLIP).
We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy.
Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning.
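One building block of such an evaluation is a per-subgroup accuracy comparison for zero-shot classification. A schematic sketch with placeholder predictions standing in for actual CLIP outputs:

```python
import numpy as np

def subgroup_accuracy_gap(y_true, y_pred, groups):
    """Accuracy per demographic subgroup and the max pairwise gap."""
    accs = {}
    for g in np.unique(groups):
        mask = groups == g
        accs[g] = (y_true[mask] == y_pred[mask]).mean()
    return accs, max(accs.values()) - min(accs.values())

# Placeholder labels/predictions standing in for zero-shot CLIP outputs.
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 1000)
groups = rng.choice(["group_a", "group_b"], 1000)
# Simulate a classifier that errs more often on group_b.
flip = np.where(groups == "group_a",
                rng.random(1000) < 0.10, rng.random(1000) < 0.25)
y_pred = np.where(flip, 1 - y_true, y_true)

accs, gap = subgroup_accuracy_gap(y_true, y_pred, groups)
print(accs, f"gap={gap:.3f}")
```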
arXiv Detail & Related papers (2023-10-18T10:32:39Z)
- Comment: The Essential Role of Policy Evaluation for the 2020 Census Disclosure Avoidance System [0.0]
This is a comment on boyd and Sarathy, "Differential Perspectives: Epistemic Disconnects Surrounding the US Census Bureau's Use of Differential Privacy".
We argue that empirical evaluations of the Census Disclosure Avoidance System failed to recognize that the benchmark data are never a ground truth of population counts.
We argue that policy makers must confront a key trade-off between data utility and privacy protection.
arXiv Detail & Related papers (2022-10-15T21:41:54Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, such as weakening or deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
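The edge-deletion step can be illustrated with a linear structural equation model: refit the child variable on its remaining parents and resimulate with matched residual noise. A simplified sketch (the tool itself is interactive and its simulation method differs):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 5_000

# Synthetic data with a biased edge: group -> score (alongside skill -> score).
group = rng.integers(0, 2, n)              # protected attribute
skill = rng.normal(0, 1, n)
score = 1.5 * skill - 0.8 * group + rng.normal(0, 0.5, n)

# "Delete" the group -> score edge: refit score on its remaining parent
# (skill) and resimulate with the residual noise scale.
reg = LinearRegression().fit(skill.reshape(-1, 1), score)
resid_sd = np.std(score - reg.predict(skill.reshape(-1, 1)))
score_debiased = reg.predict(skill.reshape(-1, 1)) + rng.normal(0, resid_sd, n)

for name, s in [("original", score), ("debiased", score_debiased)]:
    gap = s[group == 1].mean() - s[group == 0].mean()
    print(f"{name}: group gap = {gap:+.3f}")
```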
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
- Addressing Census data problems in race imputation via fully Bayesian Improved Surname Geocoding and name supplements [0.0]
We introduce a fully Bayesian Improved Surname Geocoding (fBISG) methodology that accounts for potential measurement error in Census counts.
We supplement the Census surname data with additional data on last, first, and middle names taken from the voter files of six Southern states where self-reported race is available.
Our empirical validation shows that the fBISG methodology and name supplements significantly improve the accuracy of race imputation across all racial groups, and especially for Asians.
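The name-supplement idea can be read as a naive-Bayes extension of BISG: last, first, and middle names are treated as conditionally independent given race and location, so their likelihoods multiply into the geographic prior. A sketch with made-up name dictionaries (fBISG additionally models measurement error in the Census counts, which this sketch omits):

```python
import numpy as np

RACES = ["white", "black", "hispanic", "asian", "other"]

# Hypothetical P(name | race) tables (illustrative, not real voter-file data).
p_last   = np.array([0.0002, 0.0045, 0.0001, 0.0001, 0.0003])  # last name
p_first  = np.array([0.0030, 0.0050, 0.0010, 0.0005, 0.0020])  # first name
p_middle = np.array([0.0040, 0.0040, 0.0030, 0.0020, 0.0030])  # middle name
p_race_given_geo = np.array([0.55, 0.30, 0.08, 0.04, 0.03])

# Naive-Bayes combination: names assumed independent given (race, location).
post = p_last * p_first * p_middle * p_race_given_geo
post /= post.sum()

for race, p in zip(RACES, post):
    print(f"{race:>8}: {p:.3f}")
```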
arXiv Detail & Related papers (2022-05-12T14:41:45Z)
- Statistical discrimination in learning agents [64.78141757063142]
Statistical discrimination emerges in agent policies as a function of both the bias in the training population and the agent architecture.
We show that less discrimination emerges with agents that use recurrent neural networks, and when their training environment has less bias.
arXiv Detail & Related papers (2021-10-21T18:28:57Z)
- Avoiding bias when inferring race using name-based approaches [0.8543368663496084]
We use information from the U.S. Census and mortgage applications to infer the race of U.S. affiliated authors in the Web of Science.
Our results demonstrate that the validity of name-based inference varies by race/ethnicity and that threshold approaches underestimate Black authors and overestimate White authors.
arXiv Detail & Related papers (2021-04-14T08:36:22Z)
- Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and imposter sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
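To see why a single threshold is suboptimal, one can calibrate a global threshold at a target false match rate on pooled imposter scores and compare it with per-subgroup thresholds. A synthetic sketch (simulated similarity scores, not outputs of a real FR model):

```python
import numpy as np

rng = np.random.default_rng(0)

def scores(n, mu):  # cosine-similarity-like scores
    return rng.normal(mu, 0.1, n)

# Simulated genuine/imposter distributions for two subgroups; subgroup B's
# imposter scores sit higher (a common FR bias pattern).
genuine = {"A": scores(5000, 0.70), "B": scores(5000, 0.65)}
imposter = {"A": scores(5000, 0.30), "B": scores(5000, 0.40)}

target_fmr = 0.001

# Global threshold from pooled imposter scores vs. per-subgroup thresholds.
pooled = np.concatenate(list(imposter.values()))
t_global = np.quantile(pooled, 1 - target_fmr)

for g in ("A", "B"):
    t_local = np.quantile(imposter[g], 1 - target_fmr)
    for name, t in [("global", t_global), ("per-group", t_local)]:
        fmr = (imposter[g] > t).mean()     # false match rate
        fnmr = (genuine[g] <= t).mean()    # false non-match rate
        print(f"group {g} ({name}): FMR={fmr:.4f}  FNMR={fnmr:.4f}")
```

The global threshold hits the target false match rate only on average: one subgroup sees an inflated FMR, the other an inflated FNMR, which per-group calibration removes.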
arXiv Detail & Related papers (2021-03-16T15:05:49Z)
- Magnify Your Population: Statistical Downscaling to Augment the Spatial Resolution of Socioeconomic Census Data [48.7576911714538]
We present a new statistical downscaling approach to derive fine-scale estimates of key socioeconomic attributes.
For each selected socioeconomic variable, a Random Forest model is trained on the source Census units and then used to generate fine-scale gridded predictions.
As a case study, we apply this method to Census data in the United States, downscaling the selected socioeconomic variables available at the block group level to a grid of 300 m spatial resolution.
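A minimal version of that workflow: fit a Random Forest on the coarse source units with ancillary covariates, then predict on the fine grid cells. The synthetic features below stand in for the covariates actually used in the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Coarse source units (e.g., block groups): ancillary covariates X and an
# observed socioeconomic variable y. All values are synthetic.
n_units, n_cells = 500, 20_000
X_units = rng.normal(size=(n_units, 4))   # e.g., land cover, nightlights, ...
y_units = 3 * X_units[:, 0] - 2 * X_units[:, 1] + rng.normal(0, 0.5, n_units)

# Fine grid cells (e.g., 300 m) with the same covariates at high resolution.
X_cells = rng.normal(size=(n_cells, 4))

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X_units, y_units)
y_cells = rf.predict(X_cells)             # fine-scale gridded predictions

print("predicted grid values:", y_cells[:5].round(2))
```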
arXiv Detail & Related papers (2020-06-23T16:52:18Z)