Model-based recursive partitioning for discrete event times
- URL: http://arxiv.org/abs/2209.06592v1
- Date: Wed, 14 Sep 2022 12:17:56 GMT
- Title: Model-based recursive partitioning for discrete event times
- Authors: Cynthia Huber, Matthias Schmid, Tim Friede
- Abstract summary: We propose MOB for discrete Survival data (MOB-dS), which controls the type I error rate of the test used for data splitting.
We find that the type I error rate of the test is well controlled for MOB-dS, but observe considerable inflation of the error rate for standard MOB.
- Score: 3.222802562733787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based recursive partitioning (MOB) is a semi-parametric statistical
approach allowing the identification of subgroups that can be combined with a
broad range of outcome measures including continuous time-to-event outcomes.
When time is measured on a discrete scale, methods and models need to account
for this discreteness, as otherwise subgroups might be spurious and effects
biased. The test underlying the splitting criterion of MOB, the M-fluctuation
test, assumes independent observations. However, fitting discrete
time-to-event models requires modifying the data matrix, and the resulting
augmented data matrix violates the independence assumption. We propose MOB for
discrete Survival data (MOB-dS), which controls the type I error rate of the
test used for data splitting and therefore the rate of identifying subgroups
when none are present. MOB-dS uses a permutation approach that accounts for
dependencies in the augmented time-to-event data to obtain the distribution
under the null hypothesis of no subgroups being present. Through simulations we
investigate the type I error rate of the new MOB-dS and the standard MOB for
different patterns of survival curves and event rates. We find that the type I
error rate of the test is well controlled for MOB-dS, but observe
considerable inflation of the error rate for standard MOB. To illustrate the proposed
methods, MOB-dS is applied to data on unemployment duration.
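The augmented data matrix mentioned in the abstract can be illustrated with a small sketch. It assumes the common person-period layout, in which a subject observed up to period t contributes t binary rows, and it assumes that a censored subject contributes a row for its last observed period with response 0 (conventions differ on this point). The second function shows one plausible way to permute while respecting the dependence between rows of the same subject: shuffle the candidate partitioning covariate at the subject level, then expand. The abstract does not spell out the permutation scheme of MOB-dS, so this is an assumption and not the authors' procedure. All names (`to_person_period`, `permute_by_subject`, the keys `id`, `time`, `event`, `z`) are hypothetical.

```python
import random


def to_person_period(subjects):
    """Expand one row per subject into one row per subject and period at risk.

    Each subject is a dict with hypothetical keys: id, time (last observed
    period, 1-based), event (1 = event, 0 = censored), z (candidate
    partitioning covariate).
    """
    rows = []
    for s in subjects:
        for t in range(1, s["time"] + 1):
            # The response is 1 only in the final period, and only if the
            # event was observed; censored subjects have all-zero responses.
            y = 1 if (t == s["time"] and s["event"] == 1) else 0
            rows.append({"id": s["id"], "period": t, "y": y, "z": s["z"]})
    return rows


def permute_by_subject(subjects, rng):
    """Shuffle z across subjects so that all rows of a subject share one z.

    Assumption: permuting at the subject level (before expansion) is one way
    to keep the within-subject dependence intact; it is not taken from the
    paper.
    """
    z_values = [s["z"] for s in subjects]
    rng.shuffle(z_values)
    return [dict(s, z=z) for s, z in zip(subjects, z_values)]


subjects = [
    {"id": 1, "time": 3, "event": 1, "z": 0.2},
    {"id": 2, "time": 2, "event": 0, "z": 1.5},
    {"id": 3, "time": 1, "event": 1, "z": -0.7},
]

augmented = to_person_period(subjects)
permuted = to_person_period(permute_by_subject(subjects, random.Random(0)))
```

Here three subjects become six rows, and the rows belonging to one subject are clearly not independent of each other, which is the property that the standard M-fluctuation test does not account for.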
Related papers
- MMM: Clustering Multivariate Longitudinal Mixed-type Data [0.2578242050187029]
We introduce the Mixture of Mixed-Matrices (MMM) model.
The model is able to handle continuous, ordinal, binary, nominal and count data.
A real-world application on financial data is presented.
arXiv Detail & Related papers (2025-09-15T17:30:31Z) - MissDDIM: Deterministic and Efficient Conditional Diffusion for Tabular Data Imputation [2.124791625488617]
We present MissDDIM, a conditional diffusion framework that adapts Denoising Diffusion Implicit Models (DDIM) for tabular imputation.
While sampling enables diverse completions, it also introduces output variability that complicates downstream processing.
arXiv Detail & Related papers (2025-08-05T04:55:26Z) - A Sample Efficient Conditional Independence Test in the Presence of Discretization [54.047334792855345]
Applying Conditional Independence (CI) tests directly to discretized data can lead to incorrect conclusions.
Recent advancements have sought to infer the correct CI relationship between the latent variables through binarizing observed data.
Motivated by this, this paper introduces a sample-efficient CI test that does not rely on the binarization process.
arXiv Detail & Related papers (2025-06-10T12:41:26Z) - Data-driven Bayesian State Estimation with Compressed Measurement of Model-free Process using Semi-supervised Learning [57.04370580292727]
The research topic is: data-driven Bayesian state estimation with compressed measurement (BSCM) of model-free process.
The dimension of the temporal measurement vector is lower than the dimension of the temporal state vector to be estimated.
Two existing unsupervised learning-based data-driven methods fail to address the BSCM problem for model-free process.
We develop a semi-supervised learning-based DANSE method, referred to as SemiDANSE.
arXiv Detail & Related papers (2024-07-10T05:03:48Z) - Bayesian temporal biclustering with applications to multi-subject neuroscience studies [6.515516311120015]
We propose a Bayesian model for temporal biclustering featuring nested partitions, where a time-invariant partition of subjects induces a time-varying partition of measurements.
Our approach allows for data-driven determination of the number of subject and measurement clusters as well as estimation of the number and location of changepoints in measurement partitions.
arXiv Detail & Related papers (2024-06-24T20:41:37Z) - Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data [55.54827581105283]
We show that the concrete score in absorbing diffusion can be expressed as conditional probabilities of clean data.
We propose a dedicated diffusion model without time-condition that characterizes the time-independent conditional probabilities.
Our models achieve SOTA performance among diffusion models on 5 zero-shot language modeling benchmarks.
arXiv Detail & Related papers (2024-06-06T04:22:11Z) - On Calibrating Diffusion Probabilistic Models [78.75538484265292]
Diffusion probabilistic models (DPMs) have achieved promising results in diverse generative tasks.
We propose a simple way for calibrating an arbitrary pretrained DPM, with which the score matching loss can be reduced and the lower bounds of model likelihood can be increased.
Our calibration method is performed only once and the resulting models can be used repeatedly for sampling.
arXiv Detail & Related papers (2023-02-21T14:14:40Z) - Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z) - Robust Continual Test-time Adaptation: Instance-aware BN and Prediction-balanced Memory [58.72445309519892]
We present a new test-time adaptation scheme that is robust against non-i.i.d. test data streams.
Our novelty is mainly two-fold: (a) Instance-Aware Batch Normalization (IABN) that corrects normalization for out-of-distribution samples, and (b) Prediction-balanced Reservoir Sampling (PBRS) that simulates i.i.d. data stream from non-i.i.d. stream in a class-balanced manner.
arXiv Detail & Related papers (2022-08-10T03:05:46Z) - Sharing pattern submodels for prediction with missing values [12.981974894538668]
Missing values are unavoidable in many applications of machine learning and present challenges both during training and at test time.
We propose an alternative approach, called sharing pattern submodels, which i) makes predictions robust to missing values at test time, ii) maintains or improves the predictive power of pattern submodels, and iii) has a short description, enabling improved interpretability.
arXiv Detail & Related papers (2022-06-22T15:09:40Z) - Sampling in Dirichlet Process Mixture Models for Clustering Streaming Data [5.660207256468972]
The Dirichlet Process Mixture Model (DPMM) seems a natural choice for the streaming-data case.
Existing methods for online DPMM inference are too slow to handle rapid data streams.
We propose adapting both the DPMM and a known DPMM sampling-based non-streaming inference method for streaming-data clustering.
arXiv Detail & Related papers (2022-02-27T08:51:50Z) - Model-based Clustering with Missing Not At Random Data [0.8777702580252754]
We propose model-based clustering algorithms designed to handle very general types of missing data, including MNAR data.
Several MNAR models are discussed, for which the cause of the missingness can depend on both the values of the missing variable themselves and on the class membership.
We focus on a specific MNAR model, called MNARz, for which the missingness only depends on the class membership.
arXiv Detail & Related papers (2021-12-20T09:52:12Z) - Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z) - The UU-test for Statistical Modeling of Unimodal Data [0.20305676256390928]
We propose a technique called UU-test (Unimodal Uniform test) to decide on the unimodality of a one-dimensional dataset.
A unique feature of this approach is that in the case of unimodality, it also provides a statistical model of the data in the form of a Uniform Mixture Model.
arXiv Detail & Related papers (2020-08-28T08:34:28Z)