A Micro-Macro Machine Learning Framework for Predicting Childhood Obesity Risk Using NHANES and Environmental Determinants
- URL: http://arxiv.org/abs/2512.22758v1
- Date: Sun, 28 Dec 2025 03:20:04 GMT
- Title: A Micro-Macro Machine Learning Framework for Predicting Childhood Obesity Risk Using NHANES and Environmental Determinants
- Authors: Eswarasanthosh Kumar Mamillapalli, Nishtha Sharma
- Abstract summary: Childhood obesity remains a major public health challenge in the United States. We introduce a micro-macro machine learning framework that integrates individual-level anthropometric and socioeconomic data. Four machine learning models were trained to predict obesity using NHANES microdata.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Childhood obesity remains a major public health challenge in the United States, strongly influenced by a combination of individual-level, household-level, and environmental-level risk factors. Traditional epidemiological studies typically analyze these levels independently, limiting insights into how structural environmental conditions interact with individual-level characteristics to influence health outcomes. In this study, we introduce a micro-macro machine learning framework that integrates (1) individual-level anthropometric and socioeconomic data from NHANES and (2) macro-level structural environment features, including food access, air quality, and socioeconomic vulnerability, extracted from USDA and EPA datasets. Four machine learning models (Logistic Regression, Random Forest, XGBoost, and LightGBM) were trained to predict obesity using NHANES microdata. XGBoost achieved the strongest performance. A composite environmental vulnerability index (EnvScore) was constructed at the state level using normalized indicators from USDA and EPA. Multi-level comparison revealed strong geographic similarity between states with high environmental burden and the nationally predicted micro-level obesity risk distribution. This demonstrates the feasibility of integrating multi-scale datasets to identify environment-driven disparities in obesity risk. This work contributes a scalable, data-driven, multi-level modeling pipeline suitable for public health informatics, with strong potential for expansion into causal modeling, intervention planning, and real-time analytics.
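The abstract says EnvScore is built from normalized state-level indicators but does not give the normalization or the weighting. The sketch below is one plausible reading, not the authors' method: it assumes min-max scaling of each indicator and an equal-weight average. The function names `min_max` and `env_score`, the indicator names, and the toy values are all hypothetical.

```python
def min_max(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def env_score(indicators):
    """Equal-weight mean of min-max normalized indicators per state.

    indicators: dict mapping indicator name -> dict mapping state -> raw value.
    Assumption: every indicator is oriented so that higher means more burden.
    """
    states = sorted(next(iter(indicators.values())))
    normalized = {}
    for name, by_state in indicators.items():
        scaled = min_max([by_state[s] for s in states])
        normalized[name] = dict(zip(states, scaled))
    return {
        s: sum(normalized[name][s] for name in indicators) / len(indicators)
        for s in states
    }


# Toy values for three placeholder states (not real USDA or EPA data).
toy = {
    "low_food_access_pct": {"A": 10.0, "B": 30.0, "C": 20.0},
    "pm25_annual_mean": {"A": 6.0, "B": 10.0, "C": 8.0},
}
scores = env_score(toy)
```

With these toy inputs, state B scores highest and state A lowest on both indicators, so the ranking of `scores` follows directly. A real index would also need a decision on indicator direction and weighting, which the abstract leaves open.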
Related papers
- Interpretable Graph-Language Modeling for Detecting Youth Illicit Drug Use [51.696842592898804]
Illicit drug use among teenagers and young adults (TYAs) remains a pressing public health concern. To detect illicit drug use among TYAs, researchers analyze large-scale surveys such as the Youth Risk Behavior Survey (YRBS) and the National Survey on Drug Use and Health (NSDUH). Existing modeling methods treat survey variables independently, overlooking latent and interconnected structures among them. We propose LAMI, a novel joint graph-language modeling framework for detecting illicit drug use and interpreting behavioral risk factors among TYAs.
arXiv Detail & Related papers (2025-10-11T17:29:50Z) - Adaptable Cardiovascular Disease Risk Prediction from Heterogeneous Data using Large Language Models [70.64969663547703]
AdaCVD is an adaptable CVD risk prediction framework built on large language models extensively fine-tuned on over half a million participants from the UK Biobank. It addresses key clinical challenges across three dimensions: it flexibly incorporates comprehensive yet variable patient information; it seamlessly integrates both structured data and unstructured text; and it rapidly adapts to new patient populations using minimal additional data.
arXiv Detail & Related papers (2025-05-30T14:42:02Z) - Improving Diseases Predictions Utilizing External Bio-Banks [1.9336815376402723]
We demonstrate how machine learning can be leveraged to enhance explainability and uncover biologically meaningful associations. We train LightGBM models from scratch on our dataset (10K) to impute metabolomics features. The imputed metabolomics features are then used in survival analysis to assess their impact on disease-related risk factors.
arXiv Detail & Related papers (2025-03-30T13:05:20Z) - How Your Location Relates to Health: Variable Importance and Interpretable Machine Learning for Environmental and Sociodemographic Data [15.463748602675695]
Health outcomes depend on complex environmental and sociodemographic factors whose effects change over location and time. We use fine-grained spatial and temporal data to study these effects, namely the MEDSAT dataset of English health, environmental, and sociodemographic information. We then develop an interpretable machine learning framework based on Generalized Additive Models (GAMs) and Multiscale Geographically Weighted Regression (MGWR). Our findings identify NO2 as a global predictor for asthma, hypertension, and anxiety, alongside other outcome-specific predictors related to occupation, marriage, and vegetation.
arXiv Detail & Related papers (2025-01-03T21:34:35Z) - Impact on Public Health Decision Making by Utilizing Big Data Without Domain Knowledge [17.73578632982445]
New data sources and artificial intelligence (AI) methods are becoming plentiful and relevant to decision making in many societal applications.
This work illustrates important issues of robustness and model specification for informing effective allocation of interventions using new data sources.
arXiv Detail & Related papers (2024-02-08T21:03:34Z) - Agent-Based Model: Simulating a Virus Expansion Based on the Acceptance of Containment Measures [65.62256987706128]
Compartmental epidemiological models categorize individuals based on their disease status.
We propose an ABM architecture that combines an adapted SEIRD model with a decision-making model for citizens.
We illustrate the designed model by examining the progression of SARS-CoV-2 infections in A Coruna, Spain.
arXiv Detail & Related papers (2023-07-28T08:01:05Z) - A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact [62.997667081978825]
This study seeks to address the demands of high-performance machine learning models with environmental sustainability.
Traditional machine learning algorithms, such as Decision Trees and Random Forests, demonstrate robust efficiency and performance.
However, superior outcomes were obtained with optimised configurations, albeit with a commensurate increase in resource consumption.
arXiv Detail & Related papers (2023-07-01T15:18:00Z) - Label scarcity in biomedicine: Data-rich latent factor discovery enhances phenotype prediction [102.23901690661916]
Low-dimensional embedding spaces can be derived from the UK Biobank population dataset to enhance data-scarce prediction of health indicators, lifestyle and demographic characteristics.
Performance gains from semi-supervised approaches will probably become an important ingredient for various medical data science applications.
arXiv Detail & Related papers (2021-10-12T16:25:50Z) - Health Status Prediction with Local-Global Heterogeneous Behavior Graph [69.99431339130105]
Estimation of health status can be achieved with various kinds of data streams continuously collected from wearable sensors.
We propose to model the behavior-related multi-source data streams with a local-global graph.
We conduct experiments on the StudentLife dataset, and extensive results demonstrate the effectiveness of our proposed model.
arXiv Detail & Related papers (2021-03-23T11:10:04Z) - Inferring the Spatial Distribution of Physical Activity in Children Population from Characteristics of the Environment [5.16880858963093]
We propose a novel analysis approach for modeling the expected population behavior as a function of the local environment.
We experimentally evaluate this approach in predicting the expected physical activity level in small geographic regions.
Specifically, we train models that predict the physical activity level in a region, achieving 81% leave-one-out accuracy.
arXiv Detail & Related papers (2020-05-08T11:07:35Z) - BigO: A public health decision support system for measuring obesogenic behaviors of children in relation to their local environment [3.1617908029688913]
BigO is a system designed to collect objective behavioral data from children and adolescent populations as well as their environment.
We present an overview of the data acquisition, indicator extraction, data exploration and analysis components of the BigO system.
arXiv Detail & Related papers (2020-05-06T16:06:54Z)