Related papers: Data Augmentation for Classification of Negative Pregnancy Outcomes in Imbalanced Data

URL: http://arxiv.org/abs/2512.22732v1
Date: Sun, 28 Dec 2025 00:22:13 GMT
Title: Data Augmentation for Classification of Negative Pregnancy Outcomes in Imbalanced Data
Authors: Md Badsha Biswas
Abstract summary: This paper introduces a novel approach that uses publicly available social media data, especially from platforms like Twitter, to enhance current datasets for studying negative pregnancy outcomes through observational research. By constructing a natural language processing (NLP) pipeline, we aim to automatically identify women sharing their pregnancy experiences, categorizing them based on reported outcomes. This study offers potential applications in assessing the causal impact of specific interventions, treatments, or prenatal exposures on maternal and fetal health outcomes.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Infant mortality remains a significant public health concern in the United States, with birth defects identified as a leading cause. Despite ongoing efforts to understand the causes of negative pregnancy outcomes like miscarriage, stillbirths, birth defects, and premature birth, there is still a need for more comprehensive research and strategies for intervention. This paper introduces a novel approach that uses publicly available social media data, especially from platforms like Twitter, to enhance current datasets for studying negative pregnancy outcomes through observational research. The inherent challenges in utilizing social media data, including imbalance, noise, and lack of structure, necessitate robust preprocessing techniques and data augmentation strategies. By constructing a natural language processing (NLP) pipeline, we aim to automatically identify women sharing their pregnancy experiences, categorizing them based on reported outcomes. Women reporting full gestation and normal birth weight will be classified as positive cases, while those reporting negative pregnancy outcomes will be identified as negative cases. Furthermore, this study offers potential applications in assessing the causal impact of specific interventions, treatments, or prenatal exposures on maternal and fetal health outcomes. Additionally, it provides a framework for future health studies involving pregnant cohorts and comparator groups. In a broader context, our research showcases the viability of social media data as an adjunctive resource in epidemiological investigations about pregnancy outcomes.

Related papers

Assessing the robustness of heterogeneous treatment effects in survival analysis under informative censoring [50.164756034797136]
Dropout is common in clinical studies, with up to half of patients leaving early due to side effects or other reasons. When dropout is informative, it introduces censoring bias, which in turn biases treatment effect estimates. We propose an assumption-lean framework to assess the robustness of conditional average treatment effect estimates in survival analysis when facing censoring bias.
arXiv Detail & Related papers (2025-10-15T10:51:17Z)
Do-PFN: In-Context Learning for Causal Effect Estimation [75.62771416172109]
We show that prior-data fitted networks (PFNs) can be pre-trained on synthetic data to predict outcomes. Our approach allows for the accurate estimation of causal effects without knowledge of the underlying causal graph.
arXiv Detail & Related papers (2025-06-06T12:43:57Z)
Data Fusion for Partial Identification of Causal Effects [62.56890808004615]
We propose a novel partial identification framework that enables researchers to answer key questions: Is the causal effect positive or negative? How severe must assumption violations be to overturn this conclusion? We apply our framework to the Project STAR study, which investigates the effect of classroom size on students' third-grade standardized test performance.
arXiv Detail & Related papers (2025-05-30T07:13:01Z)
Predicting Fetal Birthweight from High Dimensional Data using Advanced Machine Learning [1.489994236178479]
Birth weight serves as a fundamental indicator of neonatal health, closely linked to medical interventions and long-term developmental risks. Traditional predictive models, often constrained by limited feature selection and incomplete datasets, tend to overlook complex maternal and fetal interactions. This research explores machine learning to address these limitations, utilizing a structured methodology that integrates advanced imputation strategies.
arXiv Detail & Related papers (2025-02-20T05:17:39Z)
I-SIRch: AI-Powered Concept Annotation Tool For Equitable Extraction And Analysis Of Safety Insights From Maternity Investigations [0.8609957371651683]
Most current tools for analysing healthcare data focus only on biomedical concepts, overlooking the importance of human factors. We developed I-SIRch, using artificial intelligence to automatically identify and label human factors concepts. I-SIRch was trained using real data and tested on both real and simulated data to evaluate its performance in identifying human factors concepts.
arXiv Detail & Related papers (2024-06-08T16:05:31Z)
Impact of Physical Activity on Quality of Life During Pregnancy: A Causal ML Approach [1.7765306045990206]
The concept of Quality of Life (QoL) refers to a holistic measurement of an individual's well-being, incorporating psychological and social aspects. Pregnant women, especially those with obesity and stress, often experience lower QoL. Physical activity (PA) has shown the potential to enhance QoL. However, pregnant women who are overweight and obese rarely meet the recommended level of PA.
arXiv Detail & Related papers (2024-02-25T12:07:32Z)
Unveiling the Unborn: Advancing Fetal Health Classification through Machine Learning [0.0]
This research paper presents a novel machine-learning approach for fetal health classification. The proposed model achieves an impressive accuracy of 98.31% on a test set. By incorporating multiple data points, our model offers a more holistic and reliable evaluation.
arXiv Detail & Related papers (2023-09-30T22:02:51Z)
Predicting Adverse Neonatal Outcomes for Preterm Neonates with Multi-Task Learning [51.487856868285995]
We first analyze the correlations between three adverse neonatal outcomes and then formulate the diagnosis of multiple neonatal outcomes as a multi-task learning (MTL) problem. In particular, the MTL framework contains shared hidden layers and multiple task-specific branches.
arXiv Detail & Related papers (2023-03-28T00:44:06Z)
Data-Centric Epidemic Forecasting: A Survey [56.99209141838794]
This survey delves into various data-driven methodological and practical advancements. We enumerate the large number of epidemiological datasets and novel data streams that are relevant to epidemic forecasting. We also discuss experiences and challenges that arise in real-world deployment of these forecasting systems.
arXiv Detail & Related papers (2022-07-19T16:15:11Z)
The Influences of Pre-birth Factors in Early Assessment of Child Mortality using Machine Learning Techniques [0.4817429789586127]
This study aims to incorporate pre-birth factors, such as birth history, maternal history, reproduction history, and socioeconomic condition, for classifying child mortality. Four machine learning algorithms are evaluated for this task. Results show that the proposed approach achieved an AUC score of 0.947 in classifying child mortality, which outperformed the clinical standards.
arXiv Detail & Related papers (2020-11-18T20:37:55Z)
Enabling Counterfactual Survival Analysis with Balanced Representations [64.17342727357618]
Survival data are frequently encountered across diverse medical applications, e.g., drug development, risk profiling, and clinical trials. We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes.
arXiv Detail & Related papers (2020-06-14T01:15:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.