Improving generalization of machine learning-identified biomarkers with
causal modeling: an investigation into immune receptor diagnostics
- URL: http://arxiv.org/abs/2204.09291v2
- Date: Mon, 3 Apr 2023 09:03:07 GMT
- Title: Improving generalization of machine learning-identified biomarkers with
causal modeling: an investigation into immune receptor diagnostics
- Authors: Milena Pavlović, Ghadi S. Al Hajj, Chakravarthi Kanduri, Johan Pensar,
Mollie Wood, Ludvig M. Sollid, Victor Greiff, Geir Kjetil Sandve
- Abstract summary: We focus on a specific, recently established high-dimensional biomarker: adaptive immune receptor repertoires (AIRRs).
We argue that causal modeling improves the robustness of machine learning-based biomarkers by identifying stable relations between variables.
- Score: 2.40246230430283
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Machine learning is increasingly used to discover diagnostic and prognostic
biomarkers from high-dimensional molecular data. However, a variety of factors
related to experimental design may affect the ability to learn generalizable
and clinically applicable diagnostics. Here, we argue that a causal perspective
improves the identification of these challenges and formalizes their relation
to the robustness and generalization of machine learning-based diagnostics. To
keep the discussion concrete, we focus on a specific, recently established
high-dimensional biomarker: adaptive immune receptor repertoires (AIRRs).
Through simulations, we illustrate how major biological and experimental
factors of the AIRR domain may influence the learned biomarkers. In conclusion,
we argue that causal modeling improves machine learning-based biomarker
robustness by identifying stable relations between variables and by guiding the
adjustment of the relations and variables that vary between populations.
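The abstract's central argument, that a learned biomarker can latch onto a relation that holds only in the training population, can be illustrated with a toy simulation. The sketch below is not the authors' simulation framework. It assumes a hypothetical confounded study design in which sequencing batch B tracks disease status Y in the training cohort (P(B = Y) = 0.95) but is independent of it in a deployment cohort. The feature names `causal_signal` and `batch_feature`, the effect sizes, and the learner (plain logistic regression fitted by full-batch gradient descent) are all illustrative assumptions.

```python
import math
import random


def simulate(n, p_batch_matches_label, rng):
    """Simulate n hypothetical repertoires as (causal_signal, batch_feature, label).

    label: disease status Y ~ Bernoulli(0.5).
    causal_signal: stand-in for a disease-driven feature (e.g. abundance of
        disease-associated receptor sequences); its mean is shifted by +1 in cases.
    batch_feature: stand-in for a technical readout driven only by the
        sequencing batch B, never by disease.
    p_batch_matches_label: P(B == Y); 0.5 makes batch independent of disease.
    """
    data = []
    for _ in range(n):
        y = 1 if rng.random() < 0.5 else 0
        b = y if rng.random() < p_batch_matches_label else 1 - y
        causal_signal = rng.gauss(1.0 * y, 1.0)
        batch_feature = rng.gauss(2.0 * b, 0.5)
        data.append((causal_signal, batch_feature, y))
    return data


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(w, x):
    # w[0] is the intercept, w[1:] are the feature weights.
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def fit_logistic(rows, labels, lr=1.0, epochs=200):
    """Full-batch gradient descent on the mean logistic loss."""
    w = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = sigmoid(linear_score(w, x)) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w


def accuracy(w, rows, labels):
    hits = sum(int((linear_score(w, x) > 0) == (y == 1)) for x, y in zip(rows, labels))
    return hits / len(rows)


def run(seed=0):
    rng = random.Random(seed)
    train = simulate(2000, 0.95, rng)  # confounded design: batch tracks disease
    test = simulate(5000, 0.5, rng)    # deployment: batch independent of disease
    results = {}
    for name, cols in [("all_features", (0, 1)), ("causal_only", (0,))]:
        x_train = [[r[c] for c in cols] for r in train]
        x_test = [[r[c] for c in cols] for r in test]
        y_train = [r[2] for r in train]
        y_test = [r[2] for r in test]
        w = fit_logistic(x_train, y_train)
        results[name] = (accuracy(w, x_train, y_train), accuracy(w, x_test, y_test))
    return results


if __name__ == "__main__":
    for name, (acc_train, acc_test) in run().items():
        print(f"{name:>12}: train acc {acc_train:.2f}, test acc {acc_test:.2f}")
```

Under these assumptions, the model given both features should score higher than the model restricted to `causal_signal` on the training cohort but lower on the shifted cohort, while the restricted model's accuracy stays roughly constant across cohorts. Dropping the batch-driven feature is only a crude stand-in for the kind of adjustment a causal model of the study design would guide.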
Related papers
- Causal Representation Learning from Multimodal Biological Observations [57.00712157758845] (arXiv, 2024-11-10)
We aim to develop flexible identification conditions for multimodal data.
We establish identifiability guarantees for each latent component, extending the subspace identification results from prior work.
Our key theoretical ingredient is the structural sparsity of the causal connections among distinct modalities.
- Improving Model's Interpretability and Reliability using Biomarkers [0.04705265502876046] (arXiv, 2024-02-16)
The objective of this study is to assess whether explanations from a decision tree classifier, utilizing biomarkers, can improve users' ability to identify inaccurate model predictions.
Our findings demonstrate that decision tree explanations, based on clinically established biomarkers, can assist clinicians in detecting false positives, thus improving the reliability of diagnostic models in medicine.
- Knowledge-Informed Machine Learning for Cancer Diagnosis and Prognosis: A review [2.2268038840298714] (arXiv, 2024-01-12)
We review the state-of-the-art machine learning studies that adopted the fusion of biomedical knowledge and data.
We provide an overview of diverse forms of knowledge representation and current strategies of knowledge integration into machine learning pipelines.
- Causal machine learning for single-cell genomics [94.28105176231739] (arXiv, 2023-10-23)
We discuss the application of machine learning techniques to single-cell genomics and their challenges.
We first present the model that underlies most current causal approaches to single-cell biology.
We then identify open problems in the application of causal approaches to single-cell data.
- Simulation-based Inference for Cardiovascular Models [57.92535897767929] (arXiv, 2023-07-26)
We use simulation-based inference to solve the inverse problem of mapping waveforms back to plausible physiological parameters.
We perform an in-silico uncertainty analysis of five biomarkers of clinical interest.
We study the gap between in-vivo and in-silico with the MIMIC-III waveform database.
- A Causal Framework for Decomposing Spurious Variations [68.12191782657437] (arXiv, 2023-06-08)
We develop tools for decomposing spurious variations in Markovian and Semi-Markovian models.
We prove the first results that allow a non-parametric decomposition of spurious effects.
The described approach has several applications, ranging from explainable and fair AI to questions in epidemiology and medicine.
- Functional Integrative Bayesian Analysis of High-dimensional Multiplatform Genomic Data [0.8029049649310213] (arXiv, 2022-12-29)
We propose a framework called Functional Integrative Bayesian Analysis of High-dimensional Multiplatform Genomic Data (fiBAG).
fiBAG allows simultaneous identification of upstream functional evidence of proteogenomic biomarkers.
We demonstrate the utility of fiBAG via a pan-cancer analysis of 14 cancer types.
- Differentiable Agent-based Epidemiology [71.81552021144589] (arXiv, 2022-07-20)
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks and ingest heterogeneous data sources.
- COVID-Net Biochem: An Explainability-driven Framework to Building Machine Learning Models for Predicting Survival and Kidney Injury of COVID-19 Patients from Clinical and Biochemistry Data [66.43957431843324] (arXiv, 2022-04-24)
We introduce COVID-Net Biochem, a versatile and explainable framework for constructing machine learning models.
We apply this framework to predict COVID-19 patient survival and the likelihood of developing Acute Kidney Injury during hospitalization.
- Adversarial Factor Models for the Generation of Improved Autism Diagnostic Biomarkers [19.48133927082379] (arXiv, 2021-09-24)
We present applications of adversarial linear factor models in the creation of improved biomarkers for autism spectrum disorder (ASD) diagnosis.
First, we demonstrate that an adversarial linear factor model can be used to remove confounding information from our biomarkers, ensuring that they contain only pertinent information on ASD.
Second, we show this same model can be used to learn a disentangled representation of multimodal biomarkers that results in an increase in predictive performance.
- Interpretable multimodal fusion networks reveal mechanisms of brain cognition [26.954460880062506] (arXiv, 2020-06-16)
We develop an interpretable multimodal fusion model, gCAM-CCL, which can perform automated diagnosis and result interpretation simultaneously.
We validate the gCAM-CCL model on a brain imaging-genetic study, and show that gCAM-CCL performs well for both classification and mechanism analysis.