Hypothesis testing for matched pairs with missing data by maximum mean
discrepancy: An application to continuous glucose monitoring
- URL: http://arxiv.org/abs/2206.01590v1
- Date: Fri, 3 Jun 2022 14:20:11 GMT
- Authors: Marcos Matabuena, Paulo Félix, Marc Ditzhaus, Juan Vidal and
Francisco Gude
- Abstract summary: This paper proposes new estimators of the maximum mean discrepancy (MMD) to handle complex matched pairs with missing data.
These estimators can detect differences in data distributions under different missingness mechanisms.
Data from continuous glucose monitoring in a longitudinal population-based diabetes study are used to illustrate the application of this approach.
- Score: 0.8399688944263843
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: A frequent problem in statistical science is how to properly handle missing
data in matched paired observations. There is a large body of literature coping
with the univariate case. Yet, the ongoing technological progress in measuring
biological systems raises the need for addressing more complex data, e.g.,
graphs, strings and probability distributions, among others. In order to fill
this gap, this paper proposes new estimators of the maximum mean discrepancy
(MMD) to handle complex matched pairs with missing data. These estimators can
detect differences in data distributions under different missingness
mechanisms. The validity of this approach is proven and further studied in an
extensive simulation study, and results of statistical consistency are
provided. Data from continuous glucose monitoring in a longitudinal
population-based diabetes study are used to illustrate the application of this
approach. By employing the new distributional representations together with
cluster analysis, new clinical criteria on how glucose changes vary at the
distributional level over five years can be explored.
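For orientation, the following is a minimal sketch of the standard biased (V-statistic) estimate of the squared MMD with a Gaussian kernel, applied to matched pairs after discarding incomplete pairs. It is not the estimator proposed in the paper: the complete-case filter is only a naive baseline, and the function names, the bandwidth value, and the simulated data are assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


def complete_case_pairs(x, y):
    """Keep only the pairs (x_i, y_i) in which neither member contains NaN.

    Naive baseline (an assumption of this sketch), not the paper's method.
    """
    observed = ~(np.isnan(x).any(axis=1) | np.isnan(y).any(axis=1))
    return x[observed], y[observed]


# Simulated matched pairs: the second measurement is shifted, and roughly
# 20% of the second measurements are missing (hypothetical data).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 1))
y = x + rng.normal(0.5, 1.0, size=(200, 1))
y[rng.random(200) < 0.2] = np.nan

x_cc, y_cc = complete_case_pairs(x, y)
print("pairs kept:", len(x_cc))
print("squared MMD (complete cases):", mmd2_biased(x_cc, y_cc))
```

Discarding incomplete pairs is generally only defensible when the data are missing completely at random, which is one reason estimators designed for other missingness mechanisms, such as those the abstract describes, are of interest.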
Related papers
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via
Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Estimating Unknown Population Sizes Using the Hypergeometric Distribution [1.03590082373586]
We tackle the challenge of estimating discrete distributions when both the total population size and the sizes of its constituent categories are unknown.
We develop our approach to account for a data generating process where the ground-truth is a mixture of distributions conditional on a continuous latent variable.
Empirical data simulation demonstrates that our method outperforms other likelihood functions used to model count data.
arXiv Detail & Related papers (2024-02-22T01:53:56Z)
- Bayesian Federated Inference for regression models based on non-shared multicenter data sets from heterogeneous populations [0.0]
In a regression model, the sample size must be large enough relative to the number of possible predictors.
Pooling data from different data sets collected in different (medical) centers would alleviate this problem, but is often not feasible due to privacy regulation or logistic problems.
An alternative route would be to analyze the local data in the centers separately and combine the statistical inference results with the Bayesian Federated Inference (BFI) methodology.
The aim of this approach is to compute, from the inference results in the separate centers, what would have been found had the statistical analysis been performed on the combined data.
arXiv Detail & Related papers (2024-02-05T11:10:27Z)
- Interpretable Causal Inference for Analyzing Wearable, Sensor, and Distributional Data [62.56890808004615]
We develop an interpretable method for distributional data analysis that ensures trustworthy and robust decision-making.
We demonstrate ADD MALTS' utility by studying the effectiveness of continuous glucose monitors in mitigating diabetes risks.
arXiv Detail & Related papers (2023-12-17T00:42:42Z)
- Class-Balancing Diffusion Models [57.38599989220613]
Class-Balancing Diffusion Models (CBDM) are trained with a distribution adjustment regularizer as a solution.
Our method is benchmarked on the CIFAR100/CIFAR100LT datasets and shows outstanding performance on the downstream recognition task.
arXiv Detail & Related papers (2023-04-30T20:00:14Z)
- Gaussian Latent Dirichlet Allocation for Discrete Human State Discovery [1.057079240576682]
We propose and validate an unsupervised probabilistic model, Gaussian Latent Dirichlet Allocation (GLDA), for the problem of discrete state discovery.
GLDA borrows the individual-specific mixture structure from a popular topic model Latent Dirichlet Allocation (LDA) in Natural Language Processing.
We found that in both datasets the GLDA-learned class weights achieved significantly higher correlations with clinically assessed depression, anxiety, and stress scores than those produced by the baseline GMM.
arXiv Detail & Related papers (2022-06-28T18:33:46Z)
- Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators.
They are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions.
We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
arXiv Detail & Related papers (2021-02-22T07:02:37Z)
- Glucose values prediction five years ahead with a new framework of
missing responses in reproducing kernel Hilbert spaces, and the use of
continuous glucose monitoring technology [0.0]
The AEGIS study possesses unique information on longitudinal changes in circulating glucose obtained through continuous glucose monitoring (CGM) technology.
As usual in longitudinal medical studies, there is a significant amount of missing data in the outcome variables.
This article proposes a new data analysis framework based on learning in reproducing kernel Hilbert spaces (RKHS) with missing responses.
arXiv Detail & Related papers (2020-12-11T18:51:44Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.