JigSaw: A tool for discovering explanatory high-order interactions from
random forests
- URL: http://arxiv.org/abs/2005.04342v1
- Date: Sat, 9 May 2020 01:53:45 GMT
- Title: JigSaw: A tool for discovering explanatory high-order interactions from
random forests
- Authors: Demetrius DiMucci
- Abstract summary: JigSaw was developed to aid in the discovery of patterns that could explain predictions made by the forest.
It was first used to identify patterns in clinical measurements associated with heart disease.
It was then used to find patterns associated with breast cancer using metabolites measured in the blood.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning is revolutionizing biology by facilitating the prediction of
outcomes from complex patterns found in massive data sets. Large biological
data sets, like those generated by transcriptome or microbiome studies, measure
many relevant components that interact in vivo with one another in modular
ways. Identifying the high-order interactions that machine learning models use
to make predictions would facilitate the development of hypotheses linking
combinations of measured components to outcome. By using the structure of
random forests, a new algorithmic approach, termed JigSaw, was developed to aid
in the discovery of patterns that could explain predictions made by the forest.
By examining the patterns of individual decision trees, JigSaw identifies
high-order interactions between measured features that are strongly associated
with a particular outcome and identifies the relevant decision thresholds.
JigSaw's effectiveness was tested in simulation studies where it was able to
recover multiple ground truth patterns, even in the presence of significant
noise. It was then used to find patterns associated with outcomes in two
real-world data sets. It was first used to identify patterns in clinical measurements
associated with heart disease. It was then used to find patterns associated
with breast cancer using metabolites measured in the blood. In heart disease,
JigSaw identified several three-way interactions that combine to explain most
of the heart disease records (66%) with high precision (93%). In breast cancer,
three two-way interactions were recovered that can be combined to explain
almost all records (92%) with good precision (79%). JigSaw is an efficient
method for exploring high-dimensional feature spaces for rules that explain
statistical associations with a given outcome and can inspire the generation of
testable hypotheses.
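
The coverage and precision figures quoted in the abstract can be illustrated with a minimal sketch. It assumes that "explaining" a record means the record satisfies at least one recovered rule, that coverage is the fraction of positive records matched by the combined rules, and that precision is the fraction of matched records that are positive. These definitions, the rule encoding, the helper names (`matches`, `coverage_and_precision`), and the toy data are all illustrative assumptions, not taken from the paper or its code.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: a condition is (feature, operator, threshold),
# and a rule is a conjunction (logical AND) of conditions.
Condition = Tuple[str, str, float]
Rule = List[Condition]
Record = Dict[str, float]


def matches(rule: Rule, record: Record) -> bool:
    """Return True when the record satisfies every condition in the rule."""
    for feature, op, threshold in rule:
        value = record[feature]
        if op == "<=":
            if not value <= threshold:
                return False
        elif op == ">":
            if not value > threshold:
                return False
        else:
            raise ValueError(f"unsupported operator: {op}")
    return True


def coverage_and_precision(
    rules: List[Rule], records: List[Record], labels: List[int]
) -> Tuple[float, float]:
    """Score the union of rules against binary labels (1 = outcome present)."""
    # A record is flagged when at least one rule matches it.
    flagged = [any(matches(rule, rec) for rule in rules) for rec in records]
    true_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    n_flagged = sum(flagged)
    n_pos = sum(1 for y in labels if y == 1)
    coverage = true_pos / n_pos if n_pos else 0.0
    precision = true_pos / n_flagged if n_flagged else 0.0
    return coverage, precision


# Toy data, invented for illustration only.
records = [
    {"a": 1, "b": 5, "c": 0},
    {"a": 2, "b": 6, "c": 0},
    {"a": 9, "b": 1, "c": 3},
    {"a": 1, "b": 7, "c": 0},
    {"a": 8, "b": 2, "c": 0},
    {"a": 5, "b": 5, "c": 5},
]
labels = [1, 1, 1, 0, 0, 1]

# Two two-way interactions, each with its own decision thresholds.
rules = [
    [("a", "<=", 3.0), ("b", ">", 4.0)],
    [("a", ">", 7.0), ("c", ">", 2.0)],
]

coverage, precision = coverage_and_precision(rules, records, labels)
print(f"coverage={coverage:.2f} precision={precision:.2f}")
```

In this toy setup the two rules together flag four records, three of which are positive, out of four positives overall. Combining rules by union raises coverage but can lower precision, which is the trade-off that the paired percentages in the abstract summarize.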
Related papers
- Interpreting artificial neural networks to detect genome-wide association signals for complex traits [0.0]
Investigating the genetic architecture of complex diseases is challenging due to the highly polygenic and interactive landscape of genetic and environmental factors.
We trained artificial neural networks for predicting complex traits using both simulated and real genotype/phenotype datasets.
arXiv Detail & Related papers (2024-07-26T15:20:42Z)
- Deep Latent Variable Modeling of Physiological Signals [0.8702432681310401]
We explore high-dimensional problems related to physiological monitoring using latent variable models.
First, we present a novel deep state-space model to generate electrical waveforms of the heart using optically obtained signals as inputs.
Second, we present a brain signal modeling scheme that combines the strengths of probabilistic graphical models and deep adversarial learning.
Third, we propose a framework for the joint modeling of physiological measures and behavior.
arXiv Detail & Related papers (2024-05-29T17:07:33Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Understanding Breast Cancer Survival: Using Causality and Language Models on Multi-omics Data [23.850817918011863]
We exploit causal discovery algorithms to investigate how perturbations in the genome can affect the survival of patients diagnosed with breast cancer.
Our findings reveal important factors related to the vital status of patients using causal discovery algorithms.
Results are validated through language models trained on biomedical literature.
arXiv Detail & Related papers (2023-05-28T17:07:46Z)
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
- Unsupervised ensemble-based phenotyping helps enhance the discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles.
It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner.
These phenotypes are then analyzed via genome-wide association studies (GWAS), retaining only highly confident and stable associations.
arXiv Detail & Related papers (2023-01-07T18:36:44Z)
- Federated Learning Enables Big Data for Rare Cancer Boundary Detection [98.5549882883963]
We present findings from the largest Federated ML study to date, involving data from 71 healthcare institutions across 6 continents.
We generate an automatic tumor boundary detector for the rare disease of glioblastoma.
We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent.
arXiv Detail & Related papers (2022-04-22T17:27:00Z)
- Robust Hierarchical Patterns for identifying MDD patients: A Multisite Study [3.4561220135252264]
We look at hierarchical Sparse Connectivity Patterns (hSCPs) as biomarkers for major depressive disorder (MDD).
We propose a novel model based on hSCPs to predict MDD patients from functional connectivity matrices extracted from resting-state fMRI data.
Our results show the impact of diversity on prediction performance. Our model can reduce diversity and improve the predictive and generalizing capability of the components.
arXiv Detail & Related papers (2022-02-22T19:40:32Z)
- Deep neural networks approach to microbial colony detection -- a comparative analysis [52.77024349608834]
This study investigates the performance of three deep learning approaches for object detection on the AGAR dataset.
The achieved results may serve as a benchmark for future experiments.
arXiv Detail & Related papers (2021-08-23T12:06:00Z)
- Risk factor identification for incident heart failure using neural network distillation and variable selection [24.366241122862473]
We propose two methods to untangle hidden patterns learned by an established deep learning model for risk association identification.
A cohort with 788,880 (8.3% incident heart failure) patients was considered for the study.
Model distillation identified 598 and 379 diseases that were associated and dissociated with heart failure at the population level, respectively.
In addition to these important population-level insights, we developed an approach to individual-level interpretation to take account of varying manifestation of heart failure in clinical practice.
arXiv Detail & Related papers (2021-02-17T10:20:38Z)
- Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.