Semiparametric count data regression for self-reported mental health
- URL: http://arxiv.org/abs/2106.09114v1
- Date: Wed, 16 Jun 2021 20:38:13 GMT
- Title: Semiparametric count data regression for self-reported mental health
- Authors: Daniel R. Kowal and Bohan Wu
- Abstract summary: We design a semiparametric estimation and inference framework for count data regression.
The data-generating process is defined by simultaneously transforming and rounding (STAR) a latent Gaussian regression model.
STAR is deployed to study the factors associated with self-reported mental health and demonstrates substantial improvements in goodness-of-fit compared to existing count data regression models.
- Score: 0.3553493344868413
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: "For how many days during the past 30 days was your mental health not good?"
The responses to this question measure self-reported mental health and can be
linked to important covariates in the National Health and Nutrition Examination
Survey (NHANES). However, these count variables present major distributional
challenges: the data are overdispersed, zero-inflated, bounded by 30, and
heaped in five- and seven-day increments. To meet these challenges, we design a
semiparametric estimation and inference framework for count data regression.
The data-generating process is defined by simultaneously transforming and
rounding (STAR) a latent Gaussian regression model. The transformation is
estimated nonparametrically and the rounding operator ensures the correct
support for the discrete and bounded data. Maximum likelihood estimators are
computed using an EM algorithm that is compatible with any continuous data
model estimable by least squares. STAR regression includes asymptotic
hypothesis testing and confidence intervals, variable selection via information
criteria, and customized diagnostics. Simulation studies validate the utility
of this framework. STAR is deployed to study the factors associated with
self-reported mental health and demonstrates substantial improvements in
goodness-of-fit compared to existing count data regression models.
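As a rough illustration of the "transform and round" idea in the abstract, the sketch below computes the probability mass function implied by rounding a transformed latent Gaussian variable onto the counts 0 to 30, and draws one count from the same process. It is a minimal sketch under stated assumptions, not the authors' implementation: the log transformation is a fixed stand-in (the paper estimates the transformation nonparametrically), the floor-and-clip rounding rule is one simple choice, and the names `normal_cdf`, `star_pmf`, and `simulate_star` are hypothetical.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def star_pmf(mu, sigma, y_max=30, g=math.log):
    """P(y = j) for j = 0..y_max when z ~ N(mu, sigma^2) and y rounds g^{-1}(z).

    Assumption: y = j exactly when z lies in [g(j), g(j + 1)), with the first
    interval extended to -inf (all mass below g(1) maps to 0) and the last
    extended to +inf (all mass above g(y_max) maps to y_max).
    """
    cuts = [-math.inf] + [g(j) for j in range(1, y_max + 1)] + [math.inf]
    cdf = [normal_cdf((c - mu) / sigma) for c in cuts]
    return [cdf[j + 1] - cdf[j] for j in range(y_max + 1)]


def simulate_star(mu, sigma, y_max=30, g_inv=math.exp, rng=random):
    """Draw one count: latent Gaussian, inverse transform, floor, then clip."""
    z = rng.gauss(mu, sigma)
    return min(max(math.floor(g_inv(z)), 0), y_max)
```

In a regression setting, `mu` would be replaced by a covariate-dependent mean such as a linear predictor. Because the end intervals absorb all latent mass outside the observed range, the probabilities sum to one on the bounded support, and both the zero count and the upper bound of 30 can carry extra mass.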
Related papers
- Interval Estimation of Coefficients in Penalized Regression Models of Insurance Data [3.5637073151604093]
Tweedie exponential dispersion family is a popular choice among many to model insurance losses.
It is often important to obtain credibility (inference) of the most important features that describe the endogenous variables.
arXiv Detail & Related papers (2024-10-01T18:57:18Z) - Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We provide sharp asymptotics for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We further extend our analysis to the case where the test point has non-trivial correlations with the training set, a setting often encountered in time series forecasting.
We validate our theory across a variety of high dimensional data.
arXiv Detail & Related papers (2024-08-08T17:27:29Z) - Bayesian Federated Inference for regression models based on non-shared multicenter data sets from heterogeneous populations [0.0]
In a regression model, the sample size must be large enough relative to the number of possible predictors.
Pooling data from different data sets collected in different (medical) centers would alleviate this problem, but is often not feasible due to privacy regulation or logistic problems.
An alternative route would be to analyze the local data in the centers separately and combine the statistical inference results with the Bayesian Federated Inference (BFI) methodology.
The aim of this approach is to compute, from the inference results in the separate centers, what would have been found if the statistical analysis had been performed on the combined data.
arXiv Detail & Related papers (2024-02-05T11:10:27Z) - Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z) - Performance Evaluation of Regression Models in Predicting the Cost of Medical Insurance [0.0]
Three regression models in machine learning, namely linear regression, gradient boosting, and support vector machine, were used.
Performance is evaluated using RMSE (root mean square error), R² (R-squared), and K-fold cross-validation.
arXiv Detail & Related papers (2023-04-25T06:33:49Z) - Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations [84.42837346400151]
Estimating counterfactual outcomes over time has the potential to unlock personalized healthcare.
Existing causal inference approaches consider regular, discrete-time intervals between observations and treatment decisions.
We propose a controllable simulation environment based on a model of tumor growth for a range of scenarios.
arXiv Detail & Related papers (2022-06-16T17:15:15Z) - Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z) - Statistical Inference for High-Dimensional Linear Regression with Blockwise Missing Data [13.48481978963297]
Blockwise missing data occurs when we integrate multisource or multimodality data where different sources or modalities contain complementary information.
We propose a computationally efficient estimator for the regression coefficient vector based on carefully constructed unbiased estimating equations.
Numerical studies and application analysis of the Alzheimer's Disease Neuroimaging Initiative data show that the proposed method performs better and benefits more from unsupervised samples than existing methods.
arXiv Detail & Related papers (2021-06-07T05:12:42Z) - Flexible Model Aggregation for Quantile Regression [92.63075261170302]
Quantile regression is a fundamental problem in statistical learning motivated by a need to quantify uncertainty in predictions.
We investigate methods for aggregating any number of conditional quantile models.
All of the models we consider in this paper can be fit using modern deep learning toolkits.
arXiv Detail & Related papers (2021-02-26T23:21:16Z) - Increasing the efficiency of randomized trial estimates via linear adjustment for a prognostic score [59.75318183140857]
Estimating causal effects from randomized experiments is central to clinical research.
Most methods for historical borrowing achieve reductions in variance by sacrificing strict type-I error rate control.
arXiv Detail & Related papers (2020-12-17T21:10:10Z) - Evaluating Model Robustness and Stability to Dataset Shift [7.369475193451259]
We propose a framework for analyzing stability of machine learning models.
We use the original evaluation data to determine distributions under which the algorithm performs poorly.
We estimate the algorithm's performance on the "worst-case" distribution.
arXiv Detail & Related papers (2020-10-28T17:35:39Z)