A U-Statistic-based random forest approach for genetic interaction study
- URL: http://arxiv.org/abs/2508.14924v1
- Date: Tue, 19 Aug 2025 06:22:20 GMT
- Title: A U-Statistic-based random forest approach for genetic interaction study
- Authors: Ming Li, Ruo-Sin Peng, Changshuai Wei, Qing Lu,
- Abstract summary: We propose a U-Statistic-based random forest approach, referred to as Forest U-Test, for genetic association studies with quantitative traits.<n>Through simulation studies, we showed that the Forest U-Test outperformed existing methods.
- Score: 5.418369213731247
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Variations in complex traits are influenced by multiple genetic variants, environmental risk factors, and their interactions. Though substantial progress has been made in identifying single genetic variants associated with complex traits, detecting the gene-gene and gene-environment interactions remains a great challenge. When a large number of genetic variants and environmental risk factors are involved, searching for interactions is limited to pair-wise interactions due to the exponentially increased feature space and computational intensity. Alternatively, recursive partitioning approaches, such as random forests, have gained popularity in high-dimensional genetic association studies. In this article, we propose a U-Statistic-based random forest approach, referred to as Forest U-Test, for genetic association studies with quantitative traits. Through simulation studies, we showed that the Forest U-Test outperformed existing methods. The proposed method was also applied to study Cannabis Dependence CD, using three independent datasets from the Study of Addiction: Genetics and Environment. A significant joint association was detected with an empirical p-value less than 0.001. The finding was also replicated in two independent datasets with p-values of 5.93e-19 and 4.70e-17, respectively.
Related papers
- Functional Analysis of Variance for Association Studies [0.624151172311885]
We propose a functional analysis of variance (FANOVA) method for testing an association of sequence variants in a genomic region with a qualitative trait.<n>FANOVA has a number of advantages: (1) it tests for a joint effect of gene variants, including both common and rare; (2) it fully utilizes linkage disequilibrium and genetic position information; and (3) it allows for either protective or risk-increasing causal variants.
arXiv Detail & Related papers (2025-08-14T21:02:45Z) - Weighted Diversified Sampling for Efficient Data-Driven Single-Cell Gene-Gene Interaction Discovery [56.622854875204645]
We present an innovative approach utilizing data-driven computational tools, leveraging an advanced Transformer model, to unearth gene-gene interactions.
A novel weighted diversified sampling algorithm computes the diversity score of each data sample in just two passes of the dataset.
arXiv Detail & Related papers (2024-10-21T03:35:23Z) - Nonparametric independence tests in high-dimensional settings, with applications to the genetics of complex disease [55.2480439325792]
We show how defining adequate premetric structures on the support spaces of the genetic data allows for novel approaches to such testing.
For each problem, we provide mathematical results, simulations and the application to real data.
arXiv Detail & Related papers (2024-07-29T01:00:53Z) - Interpreting artificial neural networks to detect genome-wide association signals for complex traits [0.0]
We trained artificial neural networks to predict complex traits using both simulated and real genotype-phenotype datasets.<n>We detected multiple loci associated with schizophrenia.
arXiv Detail & Related papers (2024-07-26T15:20:42Z) - Seeing Unseen: Discover Novel Biomedical Concepts via
Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z) - PhyloGFN: Phylogenetic inference with generative flow networks [57.104166650526416]
We introduce the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and phylogenetic inference.
Because GFlowNets are well-suited for sampling complex structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies.
We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets.
arXiv Detail & Related papers (2023-10-12T23:46:08Z) - Genetic heterogeneity analysis using genetic algorithm and network
science [2.6166087473624318]
Genome-wide association studies (GWAS) can identify disease susceptible genetic variables.
Genetic variables intertwined with genetic effects often exhibit lower effect-size.
This paper introduces a novel feature selection mechanism for GWAS, named Feature Co-selection Network (FCSNet)
arXiv Detail & Related papers (2023-08-12T01:28:26Z) - Unsupervised ensemble-based phenotyping helps enhance the
discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Un Phenotype Ensembles.
It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner.
These phenotypes are then analyzed via (GWAS), retaining only highly confident and stable associations.
arXiv Detail & Related papers (2023-01-07T18:36:44Z) - rfPhen2Gen: A machine learning based association study of brain imaging
phenotypes to genotypes [71.1144397510333]
We learned machine learning models to predict SNPs using 56 brain imaging QTs.
SNPs within the known Alzheimer disease (AD) risk gene APOE had lowest RMSE for lasso and random forest.
Random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders.
arXiv Detail & Related papers (2022-03-31T20:15:22Z) - High-dimensional multi-trait GWAS by reverse prediction of genotypes [3.441021278275805]
Reverse regression is a promising approach to perform multi-trait GWAS in high-dimensional settings.
We analyzed different machine learning methods for reverse regression in multi-trait GWAS.
Model feature coefficients correlated with the strength of association between variants and individual traits, and were predictive of true trans-eQTL target genes.
arXiv Detail & Related papers (2021-10-29T22:34:35Z) - Expectile Neural Networks for Genetic Data Analysis of Complex Diseases [3.0088453915399747]
We develop an expectile neural network (ENN) method for genetic data analyses of complex diseases.
Similar to expectile regression, ENN provides a comprehensive view of relationships between genetic variants and disease phenotypes.
We show that the proposed method outperformed an existing expectile regression when there exist complex relationships between genetic variants and disease phenotypes.
arXiv Detail & Related papers (2020-10-26T21:07:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.