Studying Limits of Explainability by Integrated Gradients for Gene
Expression Models
- URL: http://arxiv.org/abs/2303.11336v1
- Date: Sun, 19 Mar 2023 19:54:15 GMT
- Title: Studying Limits of Explainability by Integrated Gradients for Gene
Expression Models
- Authors: Myriam Bontonou, Anaïs Haget, Maria Boulougouri, Jean-Michel Arbona,
Benjamin Audit, Pierre Borgnat
- Abstract summary: We show that ranking features by importance is not enough to robustly identify biomarkers.
As it is difficult to evaluate whether biomarkers reflect relevant causes without known ground truth, we simulate gene expression data by proposing a hierarchical model.
- Score: 3.220287168504093
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding the molecular processes that drive cellular life is a
fundamental question in biological research. Ambitious programs have gathered a
number of molecular datasets on large populations. To decipher the complex
cellular interactions, recent work has turned to supervised machine learning
methods. The scientific questions are formulated as classical learning problems
on tabular data or on graphs, e.g. phenotype prediction from gene expression
data. In these works, the input features on which the individual predictions
are predominantly based are often interpreted as indicative of the cause of the
phenotype, such as cancer identification. Here, we propose to explore the
relevance of the biomarkers identified by Integrated Gradients, an
explainability method for feature attribution in machine learning. Through a
motivating example on The Cancer Genome Atlas, we show that ranking features by
importance is not enough to robustly identify biomarkers. As it is difficult to
evaluate whether biomarkers reflect relevant causes without known ground truth,
we simulate gene expression data by proposing a hierarchical model based on
Latent Dirichlet Allocation models. We also highlight good practices for
evaluating explanations for genomics data and propose a direction to derive
more insights from these explanations.
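For readers unfamiliar with Integrated Gradients, the following is a minimal sketch of how the method attributes a single prediction to input features and how those attributions yield a feature ranking. It is not the authors' implementation: the logistic model, its random weights, the all-zeros baseline, and the names `model`, `model_grad`, and `integrated_gradients` are all assumptions chosen for illustration. The attribution of feature i is (x_i - baseline_i) times the average gradient of the output along the straight path from the baseline to x, approximated here with a midpoint Riemann sum.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w, b):
    """Hypothetical toy classifier: positive-class probability for one sample."""
    return sigmoid(x @ w + b)

def model_grad(x, w, b):
    """Analytic gradient of the toy model's output with respect to the input x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=500):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    # Interpolation coefficients at the midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    # Points on the straight path from baseline to x, shape (steps, n_features).
    path = baseline + alphas[:, None] * (x - baseline)
    grads = np.stack([model_grad(point, w, b) for point in path])
    # Average gradient along the path, scaled by the input difference.
    return (x - baseline) * grads.mean(axis=0)

# Synthetic "expression" vector over a handful of hypothetical genes.
rng = np.random.default_rng(0)
n_genes = 6
w = rng.normal(size=n_genes)              # assumed model weights
b = 0.1                                   # assumed bias
x = rng.uniform(0.0, 5.0, size=n_genes)   # one sample to explain
baseline = np.zeros(n_genes)              # assumed baseline: no expression

attributions = integrated_gradients(x, baseline, w, b)

# Rank genes by absolute attribution, most important first.
ranking = np.argsort(-np.abs(attributions))

# Completeness check: attributions should sum to f(x) - f(baseline).
completeness_gap = abs(
    attributions.sum() - (model(x, w, b) - model(baseline, w, b))
)
print("ranking:", ranking)
print("completeness gap:", completeness_gap)
```

The ranking depends on the chosen baseline and on the individual sample being explained, which is one reason a ranking by importance on its own may not be a robust way to identify biomarkers.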
Related papers
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- A Comparative Analysis of Gene Expression Profiling by Statistical and Machine Learning Approaches [1.8954222800767324]
We discuss the biological and the methodological limitations of machine learning models to classify cancer samples.
Gene rankings are obtained from explainability methods adapted to these models.
We observe that the information learned by black-box neural networks is related to the notion of differential expression.
arXiv Detail & Related papers (2024-02-01T18:17:36Z)
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z)
- Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges.
We first present the model that underlies most of current causal approaches to single-cell biology.
We then identify open problems in the application of causal approaches to single-cell data.
arXiv Detail & Related papers (2023-10-23T13:35:24Z)
- Tertiary Lymphoid Structures Generation through Graph-based Diffusion [54.37503714313661]
In this work, we leverage state-of-the-art graph-based diffusion models to generate biologically meaningful cell-graphs.
We show that the adopted graph diffusion model is able to accurately learn the distribution of cells in terms of their tertiary lymphoid structures (TLS) content.
arXiv Detail & Related papers (2023-10-10T14:37:17Z)
- Graph Representation Learning for Interactive Biomolecule Systems [2.786956882821218]
This paper presents a review of the methodologies used to represent biological molecules and systems as computer-recognizable objects.
It examines how geometric deep learning models, with an emphasis on graph-based techniques, can analyze biomolecule data to enable drug discovery, protein characterization, and biological system analysis.
arXiv Detail & Related papers (2023-04-05T08:00:50Z)
- Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review [77.34726150561087]
Cancer is the second major cause of death after cardiovascular diseases.
Gene expression can play a fundamental role in the early detection of cancer.
This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods.
arXiv Detail & Related papers (2023-01-28T15:03:03Z)
- Self-Supervised Graph Representation Learning for Neuronal Morphologies [75.38832711445421]
We present GraphDINO, a data-driven approach to learn low-dimensional representations of 3D neuronal morphologies from unlabeled datasets.
We show, in two different species and across multiple brain areas, that this method yields morphological cell type clusterings on par with manual feature-based classification by experts.
Our method could potentially enable data-driven discovery of novel morphological features and cell types in large-scale datasets.
arXiv Detail & Related papers (2021-12-23T12:17:47Z)
- Implications of Topological Imbalance for Representation Learning on Biomedical Knowledge Graphs [16.566710222582618]
We show how knowledge graph embedding models can be affected by structural imbalance.
We show how the graph topology can be perturbed to artificially alter the rank of a gene via random, biologically meaningless information.
arXiv Detail & Related papers (2021-12-13T11:20:36Z)
- Data-Driven Logistic Regression Ensembles With Applications in Genomics [0.0]
We propose a new approach for dealing with high-dimensional binary classification problems that combines ideas from regularization and ensembling.
We demonstrate the good performance of our method in terms of prediction accuracy and identification of key biomarkers using several medical datasets involving common diseases such as cancer, multiple sclerosis and psoriasis.
arXiv Detail & Related papers (2021-02-17T05:57:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.