NY Real Estate Racial Equity Analysis via Applied Machine Learning
- URL: http://arxiv.org/abs/2505.16946v3
- Date: Thu, 26 Jun 2025 17:32:06 GMT
- Title: NY Real Estate Racial Equity Analysis via Applied Machine Learning
- Authors: Sanjana Chalavadi, Andrei Pastor, Terry Leitch,
- Abstract summary: This study analyzes tract-level real estate ownership patterns in New York State (NYS) and New York City (NYC) to uncover racial disparities.<n>We use an advanced race/ethnicity imputation model to compare the predicted racial composition of property owners to the resident population from census data.<n>The results reveal significant inequities: White individuals hold a disproportionate share of properties and property value relative to their population, while Black, Hispanic, and Asian communities are underrepresented as property owners.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study analyzes tract-level real estate ownership patterns in New York State (NYS) and New York City (NYC) to uncover racial disparities. We use an advanced race/ethnicity imputation model (LSTM+Geo with XGBoost filtering, validated at 89.2% accuracy) to compare the predicted racial composition of property owners to the resident population from census data. We examine both a Full Model (statewide) and a Name-Only LSTM Model (NYC) to assess how incorporating geospatial context affects our predictions and disparity estimates. The results reveal significant inequities: White individuals hold a disproportionate share of properties and property value relative to their population, while Black, Hispanic, and Asian communities are underrepresented as property owners. These disparities are most pronounced in minority-majority neighborhoods, where ownership is predominantly White despite a predominantly non-White population. Corporate ownership (LLCs, trusts, etc.) exacerbates these gaps by reducing owner-occupied opportunities in urban minority communities. We provide a breakdown of ownership vs. population by race for majority-White, -Black, -Hispanic, and -Asian tracts, identify those with extreme ownership disparities, and compare patterns in urban, suburban, and rural contexts. The findings underscore persistent racial inequity in property ownership, reflecting broader historical and socio-economic forces, and highlight the importance of data-driven approaches to address these issues.
Related papers
- Fairness in LLM-Generated Surveys [0.5720786928479238]
Large Language Models (LLMs) excel in text generation and understanding, especially simulating socio-political and economic patterns.<n>This study examines how LLMs perform across diverse populations by analyzing public surveys from Chile and the United States.<n>Political identity and race significantly influence prediction accuracy, while in Chile, gender, education, and religious affiliation play more pronounced roles.
arXiv Detail & Related papers (2025-01-25T23:42:20Z) - Identifying and Mitigating Social Bias Knowledge in Language Models [52.52955281662332]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases.<n>FAST surpasses state-of-the-art baselines with superior debiasing performance.<n>This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z) - The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention [61.80236015147771]
We quantify the trade-off between using diversity interventions and preserving demographic factuality in T2I models.
Experiments on DoFaiR reveal that diversity-oriented instructions increase the number of different gender and racial groups.
We propose Fact-Augmented Intervention (FAI) to reflect on verbalized or retrieved factual information about gender and racial compositions of generation subjects in history.
arXiv Detail & Related papers (2024-06-29T09:09:42Z) - Understanding Intrinsic Socioeconomic Biases in Large Language Models [4.276697874428501]
We introduce a novel dataset of one million English sentences to quantify socioeconomic biases.
Our findings reveal pervasive socioeconomic biases in both established models like GPT-2 and state-of-the-art models like Llama 2 and Falcon.
arXiv Detail & Related papers (2024-05-28T23:54:44Z) - Large Language Models are Geographically Biased [47.88767211956144]
We study what Large Language Models (LLMs) know about the world we live in through the lens of geography.
We show various problematic geographic biases, which we define as systemic errors in geospatial predictions.
arXiv Detail & Related papers (2024-02-05T02:32:09Z) - Addressing Racial Bias in Facial Emotion Recognition [1.4896509623302834]
This study focuses on analyzing racial bias by sub-sampling training sets with varied racial distributions.
Our findings indicate that smaller datasets with posed faces improve on both fairness and performance metrics as the simulations approach racial balance.
In larger datasets with greater facial variation, fairness metrics generally remain constant, suggesting that racial balance by itself is insufficient to achieve parity in test performance across different racial groups.
arXiv Detail & Related papers (2023-08-09T03:03:35Z) - Estimating Racial Disparities When Race is Not Observed [3.0931877196387196]
We introduce a new class of models that produce racial disparity estimates by using surnames as an instrumental variable for race.
A validation study based on the North Carolina voter file shows that BISG+BIRDiE reduces error by up to 84% when estimating racial differences in party registration.
We apply the proposed methodology to estimate racial differences in who benefits from the home mortgage interest deduction using individual-level tax data from the U.S. Internal Revenue Service.
arXiv Detail & Related papers (2023-03-05T04:46:16Z) - On Disentangled and Locally Fair Representations [95.6635227371479]
We study the problem of performing classification in a manner that is fair for sensitive groups, such as race and gender.
We learn a locally fair representation, such that, under the learned representation, the neighborhood of each sample is balanced in terms of the sensitive attribute.
arXiv Detail & Related papers (2022-05-05T14:26:50Z) - Measuring and Modeling Neighborhoods [0.0]
We develop an open-source survey instrument that allows respondents to draw their neighborhoods on a map.
We propose a statistical model to analyze how the characteristics of respondents and local areas determine subjective neighborhoods.
arXiv Detail & Related papers (2021-10-26T20:43:41Z) - Statistical discrimination in learning agents [64.78141757063142]
Statistical discrimination emerges in agent policies as a function of both the bias in the training population and of agent architecture.
We show that less discrimination emerges with agents that use recurrent neural networks, and when their training environment has less bias.
arXiv Detail & Related papers (2021-10-21T18:28:57Z) - MugRep: A Multi-Task Hierarchical Graph Representation Learning
Framework for Real Estate Appraisal [57.28018917017665]
We propose a Multi-Task Hierarchical Graph Representation Learning (MugRep) framework for accurate real estate appraisal.
By acquiring and integrating multi-trivial urban data, we first construct a rich feature set to comprehensively profile real estate from multiple perspectives.
An evolving real estate transaction graph and a corresponding event graph convolution module are proposed to incorporate asynchronouslytemporal dependencies among real estate transactions.
arXiv Detail & Related papers (2021-07-12T03:51:44Z) - Magnify Your Population: Statistical Downscaling to Augment the Spatial
Resolution of Socioeconomic Census Data [48.7576911714538]
We present a new statistical downscaling approach to derive fine-scale estimates of key socioeconomic attributes.
For each selected socioeconomic variable, a Random Forest model is trained on the source Census units and then used to generate fine-scale gridded predictions.
As a case study, we apply this method to Census data in the United States, downscaling the selected socioeconomic variables available at the block group level, to a grid of 300 spatial resolution.
arXiv Detail & Related papers (2020-06-23T16:52:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.