Analyzing Geospatial and Socioeconomic Disparities in Breast Cancer Screening Among Populations in the United States: Machine Learning Approach
- URL: http://arxiv.org/abs/2502.06800v1
- Date: Thu, 30 Jan 2025 21:07:34 GMT
- Title: Analyzing Geospatial and Socioeconomic Disparities in Breast Cancer Screening Among Populations in the United States: Machine Learning Approach
- Authors: Soheil Hashtarkhani, Yiwang Zhou, Fekede Asefa Kumsa, Shelley White-Means, David L Schwartz, Arash Shaban-Nejad
- Abstract summary: This study aims to assess breast cancer screening rates nationwide in the United States.
Data on mammography screening at the census tract level for 2018 and 2020 were collected.
We developed a large dataset of social determinants of health, comprising 13 variables for 72337 census tracts.
- Score: 0.3958317527488535
- Abstract: Breast cancer screening plays a pivotal role in early detection and subsequent effective management of the disease, impacting patient outcomes and survival rates. This study aims to assess breast cancer screening rates nationwide in the United States and investigate the impact of social determinants of health on these screening rates. Data on mammography screening at the census tract level for 2018 and 2020 were collected from the Behavioral Risk Factor Surveillance System. We developed a large dataset of social determinants of health, comprising 13 variables for 72337 census tracts. Spatial analysis employing Getis-Ord Gi statistics was used to identify clusters of high and low breast cancer screening rates. To evaluate the influence of these social determinants, we implemented a random forest model and compared its performance with linear regression and support vector machine models. The models were evaluated using R2 and root mean squared error metrics. Shapley Additive Explanations values were subsequently used to assess the importance of each variable and the direction of its influence. Geospatial analysis revealed elevated screening rates in the eastern and northern United States, while central and midwestern regions exhibited lower rates. The random forest model demonstrated superior performance, with an R2 of 64.53 and a root mean squared error of 2.06, compared with the linear regression and support vector machine models. Shapley Additive Explanations values indicated that the percentage of the Black population, the number of mammography facilities within a 10-mile radius, and the percentage of the population with at least a bachelor's degree were the most influential variables, all positively associated with mammography screening rates.
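The abstract names Getis-Ord Gi statistics for hot-spot detection but does not give the formula. Below is a minimal NumPy sketch of the commonly used Gi* z-score, in which each location's own value is included through a nonzero self-weight. The choice of the Gi* variant, the binary contiguity weights, the function name `getis_ord_gi_star`, and the toy rates are all illustrative assumptions, not the authors' code or data.

```python
import numpy as np


def getis_ord_gi_star(values, weights):
    """Getis-Ord Gi* z-score for each location (illustrative sketch).

    values  : length-n array of attribute values (e.g. tract screening rates).
    weights : n x n spatial weight matrix; Gi* includes each location's own
              value, so the diagonal w_ii is nonzero (here assumed to be 1).
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)  # population standard deviation
    w_sum = w.sum(axis=1)                       # sum_j w_ij for each location
    w_sq_sum = (w ** 2).sum(axis=1)             # sum_j w_ij^2 for each location
    numerator = w @ x - x_bar * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator


# Hypothetical toy example: six tracts on a line, each neighbouring itself
# and the tracts immediately to its left and right (binary contiguity).
rates = np.array([10.0, 9.0, 10.0, 1.0, 2.0, 1.0])
n = rates.size
w = np.eye(n)
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0

z = getis_ord_gi_star(rates, w)
# Positive z marks a cluster of high values (hot spot); negative z a cold spot.
print(np.round(z, 2))
```

In this toy setup the three high-rate tracts on the left come out with positive z-scores and the low-rate tracts on the right with negative ones. This mirrors how high- and low-screening clusters would be flagged. A real analysis would use a proper spatial-weights library and census-tract geometries rather than a hand-built line graph.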
Related papers
- Evaluating Spoken Language as a Biomarker for Automated Screening of Cognitive Impairment [37.40606157690235]
Alterations in speech and language can be early predictors of Alzheimer's disease and related dementias.
We evaluated machine learning techniques for ADRD screening and severity prediction from spoken language.
Risk stratification and linguistic feature importance analysis enhanced the interpretability and clinical utility of predictions.
Date: 2025-01-30T20:17:17Z
- Predicting Breast Cancer Survival: A Survival Analysis Approach Using Log Odds and Clinical Variables [0.0]
This study employs survival analysis techniques, including Cox proportional hazards and parametric survival models, to enhance the prediction of the log odds of survival in breast cancer patients.
Data from 1557 breast cancer patients were obtained from a publicly available dataset provided by the University College Hospital, Ibadan, Nigeria.
Date: 2024-10-17T10:01:22Z
- Region-specific Risk Quantification for Interpretable Prognosis of COVID-19 [36.731054010197035]
The COVID-19 pandemic has strained global public health, necessitating accurate diagnosis and intervention to control disease spread and reduce mortality rates.
This paper introduces an interpretable deep survival prediction model designed specifically for improved understanding and trust in COVID-19 prognosis using chest X-ray (CXR) images.
Date: 2024-05-05T05:08:38Z
- Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of individuals of non-European descent.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
Date: 2024-04-26T16:39:50Z
- An AI-Guided Data Centric Strategy to Detect and Mitigate Biases in Healthcare Datasets [32.25265709333831]
We develop a data-centric, model-agnostic, task-agnostic approach to evaluate dataset bias by investigating the relationship between how easily different groups are learned at small sample sizes (AEquity).
We then apply a systematic analysis of AEq values across subpopulations to identify manifestations of racial bias in two known cases in healthcare.
AEq is a novel and broadly applicable metric that can be applied to advance equity by diagnosing and remediating bias in healthcare datasets.
Date: 2023-11-06T17:08:41Z
- Agent-Based Model: Simulating a Virus Expansion Based on the Acceptance of Containment Measures [65.62256987706128]
Compartmental epidemiological models categorize individuals based on their disease status.
We propose an ABM architecture that combines an adapted SEIRD model with a decision-making model for citizens.
We illustrate the designed model by examining the progression of SARS-CoV-2 infections in A Coruna, Spain.
Date: 2023-07-28T08:01:05Z
- Supervised Machine Learning for Breast Cancer Risk Factors Analysis and Survival Prediction [0.5249805590164902]
The choice of the most effective treatment may eventually be influenced by breast cancer survival prediction.
In this study, 1904 patient records were used to predict 5-year breast cancer survival using a machine learning approach.
Date: 2023-04-13T12:32:14Z
- Penalized Deep Partially Linear Cox Models with Application to CT Scans of Lung Cancer Patients [42.09584755334577]
Lung cancer is a leading cause of cancer mortality globally, highlighting the importance of understanding its mortality risks to design effective therapies.
The National Lung Screening Trial (NLST) employed computed tomography texture analysis to quantify the mortality risks of lung cancer patients.
We propose a novel Penalized Deep Partially Linear Cox Model (Penalized DPLC), which incorporates the SCAD penalty to select important texture features and employs a deep neural network to estimate the nonparametric component of the model.
Date: 2023-03-09T15:38:16Z
- Applying Machine Learning and AI Explanations to Analyze Vaccine Hesitancy [0.0]
The paper quantifies the impact of race, poverty, politics, and age on vaccination rates in U.S. counties.
The influence of these factors is not uniform across different geographies.
Date: 2022-01-07T22:50:17Z
- Comparative Analysis of Machine Learning Approaches to Analyze and Predict the Covid-19 Outbreak [10.307715136465056]
We present a comparative analysis of various machine learning (ML) approaches in predicting the COVID-19 outbreak in the epidemiological domain.
The results reveal the advantages of ML algorithms for supporting decision-making on evolving short-term policies.
Date: 2021-02-11T11:57:33Z
- Learning from Suspected Target: Bootstrapping Performance for Breast Cancer Detection in Mammography [6.323318523772466]
We introduce a novel top likelihood loss together with a new sampling procedure to select and train the suspected target regions.
We first test our proposed method on a private dense mammogram dataset.
Results show that our proposed method greatly reduces the false positive rate and increases specificity by 0.25 when detecting mass-type cancer.
Date: 2020-03-01T09:04:24Z