Enhancing Clinical Predictive Modeling through Model Complexity-Driven
Class Proportion Tuning for Class Imbalanced Data: An Empirical Study on
Opioid Overdose Prediction
- URL: http://arxiv.org/abs/2305.05722v1
- Date: Tue, 9 May 2023 19:14:01 GMT
- Title: Enhancing Clinical Predictive Modeling through Model Complexity-Driven
Class Proportion Tuning for Class Imbalanced Data: An Empirical Study on
Opioid Overdose Prediction
- Authors: Yinan Liu, Xinyu Dong, Weimin Lyu, Richard N. Rosenthal, Rachel Wong,
Tengfei Ma and Fusheng Wang
- Abstract summary: Class imbalance problems widely exist in the medical field and heavily deteriorates performance of clinical predictive models.
Most techniques to alleviate the problem rebalance class proportions and they predominantly assume the rebalanced proportions should be a function of the original data and oblivious to the model one uses.
This work challenges this prevailing assumption and proposes that links the optimal class proportions to the model complexity, thereby tuning the class proportions per model.
- Score: 7.13083572359416
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Class imbalance problems widely exist in the medical field and heavily
deteriorates performance of clinical predictive models. Most techniques to
alleviate the problem rebalance class proportions and they predominantly assume
the rebalanced proportions should be a function of the original data and
oblivious to the model one uses. This work challenges this prevailing
assumption and proposes that links the optimal class proportions to the model
complexity, thereby tuning the class proportions per model. Our experiments on
the opioid overdose prediction problem highlight the performance gain of tuning
class proportions. Rigorous regression analysis also confirms the advantages of
the theoretical framework proposed and the statistically significant
correlation between the hyperparameters controlling the model complexity and
the optimal class proportions.
Related papers
- An Experimental Study on the Rashomon Effect of Balancing Methods in Imbalanced Classification [0.0]
This paper examines the impact of balancing methods on predictive multiplicity using the Rashomon effect.
It is crucial because the blind model selection in data-centric AI is risky from a set of approximately equally accurate models.
arXiv Detail & Related papers (2024-03-22T13:08:22Z) - MCRAGE: Synthetic Healthcare Data for Fairness [3.0089659534785853]
We propose Minority Class Rebalancing through Augmentation by Generative modeling (MCRAGE) to augment imbalanced datasets.
MCRAGE involves training a Denoising Diffusion Probabilistic Model (CDDPM) capable of generating high-quality synthetic EHR samples from underrepresented classes.
We use this synthetic data to augment the existing imbalanced dataset, resulting in a more balanced distribution across all classes.
arXiv Detail & Related papers (2023-10-27T19:02:22Z) - Incoporating Weighted Board Learning System for Accurate Occupational
Pneumoconiosis Staging [2.6279558928497218]
Occupational pneumoconiosis (OP) staging is a vital task concerning the lung healthy of a subject.
The staging result of a patient is depended on the staging standard and his chest X-ray.
The distribution of OP data is commonly imbalanced, which largely reduces the effect of classification models.
arXiv Detail & Related papers (2022-08-13T09:31:25Z) - Density-Aware Personalized Training for Risk Prediction in Imbalanced
Medical Data [89.79617468457393]
Training models with imbalance rate (class density discrepancy) may lead to suboptimal prediction.
We propose a framework for training models for this imbalance issue.
We demonstrate our model's improved performance in real-world medical datasets.
arXiv Detail & Related papers (2022-07-23T00:39:53Z) - On how to avoid exacerbating spurious correlations when models are
overparameterized [33.315813572333745]
We show that VS-loss learns a model that is fair towards minorities even when spurious features are strong.
Compared to previous works, our bounds hold for more general models, they are non-asymptotic, and, they apply even at scenarios of extreme imbalance.
arXiv Detail & Related papers (2022-06-25T21:53:44Z) - A Relational Model for One-Shot Classification [80.77724423309184]
We show that a deep learning model with built-in inductive bias can bring benefits to sample-efficient learning, without relying on extensive data augmentation.
The proposed one-shot classification model performs relational matching of a pair of inputs in the form of local and pairwise attention.
arXiv Detail & Related papers (2021-11-08T07:53:12Z) - A Twin Neural Model for Uplift [59.38563723706796]
Uplift is a particular case of conditional treatment effect modeling.
We propose a new loss function defined by leveraging a connection with the Bayesian interpretation of the relative risk.
We show our proposed method is competitive with the state-of-the-art in simulation setting and on real data from large scale randomized experiments.
arXiv Detail & Related papers (2021-05-11T16:02:39Z) - A non-asymptotic penalization criterion for model selection in mixture
of experts models [1.491109220586182]
We consider the Gaussian-gated localized MoE (GLoME) regression model for modeling heterogeneous data.
This model poses challenging questions with respect to the statistical estimation and model selection problems.
We study the problem of estimating the number of components of the GLoME model, in a penalized maximum likelihood estimation framework.
arXiv Detail & Related papers (2021-04-06T16:24:55Z) - Bayesian prognostic covariate adjustment [59.75318183140857]
Historical data about disease outcomes can be integrated into the analysis of clinical trials in many ways.
We build on existing literature that uses prognostic scores from a predictive model to increase the efficiency of treatment effect estimates.
arXiv Detail & Related papers (2020-12-24T05:19:03Z) - Increasing the efficiency of randomized trial estimates via linear
adjustment for a prognostic score [59.75318183140857]
Estimating causal effects from randomized experiments is central to clinical research.
Most methods for historical borrowing achieve reductions in variance by sacrificing strict type-I error rate control.
arXiv Detail & Related papers (2020-12-17T21:10:10Z) - Double Robust Representation Learning for Counterfactual Prediction [68.78210173955001]
We propose a novel scalable method to learn double-robust representations for counterfactual predictions.
We make robust and efficient counterfactual predictions for both individual and average treatment effects.
The algorithm shows competitive performance with the state-of-the-art on real world and synthetic data.
arXiv Detail & Related papers (2020-10-15T16:39:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.