Nonparametric independence tests in high-dimensional settings, with applications to the genetics of complex disease
- URL: http://arxiv.org/abs/2407.19624v1
- Date: Mon, 29 Jul 2024 01:00:53 GMT
- Title: Nonparametric independence tests in high-dimensional settings, with applications to the genetics of complex disease
- Authors: Fernando Castro-Prado
- Abstract summary: We show how defining adequate premetric structures on the support spaces of the genetic data allows for novel approaches to such testing.
For each problem, we provide mathematical results, simulations and the application to real data.
- Score: 55.2480439325792
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: [PhD thesis of FCP.] Nowadays, genetics studies large amounts of very diverse variables. Mathematical statistics has evolved in parallel to its applications, with much recent interest in high-dimensional settings. In the genetics of human common disease, a number of relevant problems can be formulated as tests of independence. We show how defining adequate premetric structures on the support spaces of the genetic data allows for novel approaches to such testing. This yields a solid theoretical framework, which reflects the underlying biology and allows for computationally efficient implementations. For each problem, we provide mathematical results, simulations and the application to real data.
Related papers
- Robust Multi-view Co-expression Network Inference [8.697303234009528]
Inferring gene co-expression networks from transcriptome data presents many challenges.
We introduce a robust method for high-dimensional graph inference from multiple independent studies.
arXiv Detail & Related papers (2024-09-30T06:30:09Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Precise Error Rates for Computationally Efficient Testing [75.63895690909241]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity.
An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z)
- PhyloGFN: Phylogenetic inference with generative flow networks [57.104166650526416]
We introduce the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference.
Because GFlowNets are well-suited for sampling complex structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies.
We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets.
arXiv Detail & Related papers (2023-10-12T23:46:08Z)
- Fast and Functional Structured Data Generators Rooted in Out-of-Equilibrium Physics [62.997667081978825]
We address the challenge of using energy-based models to produce high-quality, label-specific data in structured datasets.
Traditional training methods encounter difficulties due to inefficient Markov chain Monte Carlo mixing.
We use a novel training algorithm that exploits non-equilibrium effects.
arXiv Detail & Related papers (2023-07-13T15:08:44Z)
- Mixed Effects Neural ODE: A Variational Approximation for Analyzing the Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data.
We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem.
We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z)
- Composite Goodness-of-fit Tests with Kernels [19.744607024807188]
We propose kernel-based hypothesis tests for the challenging composite testing problem.
Our tests make use of minimum distance estimators based on the maximum mean discrepancy and the kernel Stein discrepancy.
As our main result, we show that we are able to estimate the parameter and conduct our test on the same data, while maintaining a correct test level.
arXiv Detail & Related papers (2021-11-19T15:25:06Z)
- Testing Directed Acyclic Graph via Structural, Supervised and Generative Adversarial Learning [7.623002328386318]
We propose a new hypothesis testing method for directed acyclic graphs (DAGs).
We build the test based on some highly flexible neural network learners.
We demonstrate the efficacy of the test through simulations and a brain connectivity network analysis.
arXiv Detail & Related papers (2021-06-02T21:18:59Z)
- High Dimensional Data Enrichment: Interpretable, Fast, and Data-Efficient [38.40316295019222]
We introduce an estimator for the problem of multiple connected linear regressions known as Data Enrichment/Sharing.
We show that the recovery of the common parameter benefits from all of the pooled samples.
Overall, we present a first thorough statistical and computational analysis of inference in the data-sharing model.
arXiv Detail & Related papers (2018-06-11T15:15:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.