Adapting tree-based multiple imputation methods for multi-level data? A
simulation study
- URL: http://arxiv.org/abs/2401.14161v1
- Date: Thu, 25 Jan 2024 13:12:50 GMT
- Title: Adapting tree-based multiple imputation methods for multi-level data? A
simulation study
- Authors: Ketevan Gurtskaia, Jakob Schwerter and Philipp Doebler
- Abstract summary: This simulation study evaluates the effectiveness of multiple imputation techniques for multilevel data.
It compares the performance of traditional Multiple Imputation by Chained Equations (MICE) with tree-based methods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This simulation study evaluates the effectiveness of multiple imputation (MI)
techniques for multilevel data. It compares the performance of traditional
Multiple Imputation by Chained Equations (MICE) with tree-based methods such as
Chained Random Forests with Predictive Mean Matching and Extreme Gradient
Boosting. Adapted versions that include dummy variables for cluster membership
are also included for the tree-based methods. Methods are evaluated for
coefficient estimation bias, statistical power, and type I error rates on
simulated hierarchical data with different cluster sizes (25 and 50) and levels
of missingness (10% and 50%). Coefficients are estimated using random
intercept and random slope models. The results show that while MICE is
preferred for accurate rejection rates, Extreme Gradient Boosting is
advantageous for reducing bias. Furthermore, the study finds that bias levels
are similar across different cluster sizes, but rejection rates tend to be less
favorable with fewer clusters (lower power, higher type I error). In addition,
the inclusion of cluster dummies in tree-based methods improves estimation for
Level 1 variables, but is less effective for Level 2 variables. When data
become too complex and MICE is too slow, Extreme Gradient Boosting is a good
alternative for hierarchical data.
Keywords: Multiple imputation; multi-level data; MICE; missRanger; mixgb
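The adapted tree-based approach described in the abstract (chained tree-based imputation plus dummy variables for cluster membership) can be sketched in Python with scikit-learn's IterativeImputer. This is only an illustrative analogue, not the R pipeline (mice, missRanger, mixgb) the study actually evaluates; the simulated data, variable names, and parameters below are assumptions, and a single imputation is shown where the paper performs multiple imputation.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)

# Simulate simple hierarchical data: 25 clusters with random intercepts.
n_clusters, n_per = 25, 40
cluster = np.repeat(np.arange(n_clusters), n_per)
u = rng.normal(0.0, 1.0, n_clusters)[cluster]      # cluster effects
x = rng.normal(size=cluster.size)                  # Level 1 predictor
y = 0.5 * x + u + rng.normal(size=cluster.size)    # outcome

df = pd.DataFrame({"x": x, "y": y})
df.loc[rng.random(len(df)) < 0.10, "x"] = np.nan   # ~10% missingness in x

# "Adapted" version: append cluster-membership dummies so the trees can
# exploit cluster structure when imputing Level 1 variables.
dummies = pd.get_dummies(cluster, prefix="cl", dtype=float)
X = pd.concat([df, dummies], axis=1)

# One imputation; multiple imputation would repeat this with varied seeds.
imputer = IterativeImputer(
    estimator=ExtraTreesRegressor(n_estimators=20, random_state=0),
    max_iter=3, random_state=0,
)
X_imp = pd.DataFrame(imputer.fit_transform(X), columns=X.columns)
```

Dropping the `dummies` block recovers the unadapted tree-based variant, which is the comparison the study draws for Level 1 versus Level 2 variables.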
Related papers
- Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search [59.75749613951193]
We propose Data Influence-oriented Tree Search (DITS) to guide both tree search and data selection.
By leveraging influence scores, we effectively identify the most impactful data for system improvement.
We derive influence score estimation methods tailored for non-differentiable metrics.
arXiv Detail & Related papers (2025-02-02T23:20:16Z)
- Mitigating covariate shift in non-colocated data with learned parameter priors [0.0]
We present Fragmentation-induced co-shift remediation (FIcsR), which minimizes an $f$-divergence between a fragment's covariate distribution and that of the standard cross-validation baseline.
We run extensive classification experiments on multiple data classes, over $40$ datasets, and with data batched over multiple sequence lengths.
The results are promising under all these conditions, with accuracy improved over the batch and fold state of the art by more than 5% and 10%, respectively.
arXiv Detail & Related papers (2024-11-10T15:48:29Z)
- Unmasking Trees for Tabular Data [0.0]
We present UnmaskingTrees, a simple method for tabular imputation (and generation) employing gradient-boosted decision trees.
To solve the conditional generation subproblem, we propose BaltoBot, which fits a balanced tree of boosted tree classifiers.
Unlike older methods, it requires no parametric assumption on the conditional distribution, accommodating features with multimodal distributions.
We finally consider our two approaches as meta-algorithms, demonstrating in-context learning-based generative modeling with TabPFN.
arXiv Detail & Related papers (2024-07-08T04:15:43Z)
- BooleanOCT: Optimal Classification Trees based on multivariate Boolean Rules [14.788278997556606]
We introduce a new mixed-integer programming (MIP) formulation to derive the optimal classification tree.
Our methodology integrates both linear metrics, including accuracy, balanced accuracy, and cost-sensitive cost, as well as nonlinear metrics such as the F1-score.
The proposed models demonstrate practical solvability on real-world datasets, effectively handling sizes in the tens of thousands.
arXiv Detail & Related papers (2024-01-29T12:58:44Z)
- Evaluating tree-based imputation methods as an alternative to MICE PMM for drawing inference in empirical studies [0.5892638927736115]
Dealing with missing data is an important problem in statistical analysis that is often addressed with imputation procedures.
The prevailing method of Multiple Imputation by Chained Equations with Predictive Mean Matching (PMM) is considered standard in the social science literature.
In recent years, tree-based imputation methods have emerged as very competitive alternatives.
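Predictive Mean Matching, mentioned above as the standard MICE variant, can be sketched as follows: fit a model on the observed cases, then impute each missing value by donating an observed value from a case with a similar model prediction. This is a generic illustration with a linear predictor and assumed names, not the mice implementation.

```python
import numpy as np

def pmm_impute(y, X, k=5, rng=None):
    """Impute NaNs in y via PMM: fit a linear predictor on observed cases,
    then for each missing case donate an observed value from one of the k
    cases whose predicted values are closest to its own prediction."""
    if rng is None:
        rng = np.random.default_rng(0)
    obs = ~np.isnan(y)
    Xb = np.column_stack([np.ones(len(y)), X])     # add intercept
    beta, *_ = np.linalg.lstsq(Xb[obs], y[obs], rcond=None)
    pred = Xb @ beta
    y_imp = y.copy()
    for i in np.where(~obs)[0]:
        # k observed donors with the closest predicted values ...
        donors = np.argsort(np.abs(pred[obs] - pred[i]))[:k]
        # ... donate one of their *observed* values at random.
        y_imp[i] = rng.choice(y[obs][donors])
    return y_imp
```

Because the donated values are actually observed ones, PMM preserves the marginal distribution of the variable better than plugging in model predictions directly.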
arXiv Detail & Related papers (2024-01-17T21:28:00Z)
- Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data; a minimal number of available labeled data points are then assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
- The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare.
Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z)
- DCID: Deep Canonical Information Decomposition [84.59396326810085]
We consider the problem of identifying the signal shared between two one-dimensional target variables.
We propose ICM, an evaluation metric which can be used in the presence of ground-truth labels.
We also propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables.
arXiv Detail & Related papers (2023-06-27T16:59:06Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Distributional Adaptive Soft Regression Trees [0.0]
This article proposes a new type of distributional regression tree using a multivariate soft split rule.
One great advantage of the soft split is that smooth high-dimensional functions can be estimated with only one tree.
We show by means of extensive simulation studies that the algorithm has excellent properties and outperforms various benchmark methods.
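A multivariate soft split of the kind described above can be illustrated as a sigmoid gate over a linear combination of all features, so each observation is routed left with a probability rather than by a hard threshold. The weights and leaf values below are assumptions for illustration, not fitted quantities from the paper.

```python
import numpy as np

def soft_split(X, w, b):
    """Return P(go left) for each row of X under a sigmoid gate on w.x + b."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def soft_tree_predict(X, w, b, leaf_left, leaf_right):
    """Smooth prediction of a depth-1 soft tree: a probability-weighted
    mix of the two leaf values, hence differentiable in w and b."""
    p = soft_split(X, w, b)
    return p * leaf_left + (1 - p) * leaf_right
```

Because the routing probability varies smoothly with the features, a single tree of such splits can represent smooth high-dimensional functions, which is the advantage the summary highlights.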
arXiv Detail & Related papers (2022-10-19T08:59:02Z)
- Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV).
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled ones.
We show that NPC-LV outperforms supervised methods on image classification on all three datasets in the low-data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z)
- On multivariate randomized classification trees: $l_0$-based sparsity, VC dimension and decomposition methods [0.9346127431927981]
We investigate the nonlinear continuous optimization formulation proposed in Blanquero et al.
We first consider alternative methods to sparsify such trees based on concave approximations of the $l_0$ "norm".
We propose a general decomposition scheme and an efficient version of it. Experiments on larger datasets show that the proposed decomposition method is able to significantly reduce the training times without compromising the accuracy.
arXiv Detail & Related papers (2021-12-09T22:49:08Z)
- A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds [9.546094657606178]
We study the generalization performance of decision trees with respect to different generative regression models.
This allows us to elicit their inductive bias, that is, the assumptions the algorithms make (or do not make) to generalize to new data.
We prove a sharp squared error generalization lower bound for a large class of decision tree algorithms fitted to sparse additive models.
arXiv Detail & Related papers (2021-10-18T21:22:40Z)
- Gated recurrent units and temporal convolutional network for multilabel classification [122.84638446560663]
This work proposes a new ensemble method for managing multilabel classification.
The core of the proposed approach combines a set of gated recurrent units and temporal convolutional neural networks trained with variants of the Adam gradients optimization approach.
arXiv Detail & Related papers (2021-10-09T00:00:16Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
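The per-sample weighting idea summarized above can be sketched as a softmax over mini-batch losses, so harder (higher-loss) samples receive larger weights in the gradient. This is a generic illustration of loss-based importance weighting for a linear model with squared loss, not the authors' exact ABSGD update; the function name, learning rate, and temperature are assumptions.

```python
import numpy as np

def weighted_grad_step(w, X, y, lr=0.1, lam=1.0):
    """One importance-weighted gradient step: weight each sample in the
    batch by a softmax of its loss (temperature lam), then descend."""
    resid = X @ w - y
    losses = 0.5 * resid ** 2
    a = np.exp((losses - losses.max()) / lam)   # numerically stable softmax
    weights = a / a.sum()                       # per-sample importance
    grad = X.T @ (weights * resid)              # weighted gradient
    return w - lr * grad
```

Large `lam` recovers (normalized) uniform weighting, while small `lam` concentrates the update on the hardest samples, which is the lever such methods use against imbalance.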
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- Handling missing data in model-based clustering [0.0]
We propose two methods to fit Gaussian mixtures in the presence of missing data.
Both methods use a variant of the Monte Carlo Expectation-Maximisation algorithm for data augmentation.
We show that the proposed methods outperform the multiple imputation approach, both in terms of cluster identification and density estimation.
arXiv Detail & Related papers (2020-06-04T15:36:31Z)
- Carathéodory Sampling for Stochastic Gradient Descent [79.55586575988292]
We present an approach that is inspired by classical results of Tchakaloff and Carathéodory about measure reduction.
We adaptively select the descent steps where the measure reduction is carried out.
We combine this with Block Coordinate Descent so that measure reduction can be done very cheaply.
arXiv Detail & Related papers (2020-06-02T17:52:59Z)
- Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation [77.7420231319632]
We adapt contextual generation of categorical sequences to a policy gradient estimator, which evaluates a set of correlated Monte Carlo (MC) rollouts for variance control.
We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large vocabulary scenarios.
arXiv Detail & Related papers (2019-12-31T03:01:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.