ExPath: Towards Explaining Targeted Pathways for Biological Knowledge Bases
- URL: http://arxiv.org/abs/2502.18026v1
- Date: Tue, 25 Feb 2025 09:33:15 GMT
- Title: ExPath: Towards Explaining Targeted Pathways for Biological Knowledge Bases
- Authors: Rikuto Kotoge, Ziwei Yang, Zheng Chen, Yushun Dong, Yasuko Matsubara, Jimeng Sun, Yasushi Sakurai
- Abstract summary: We propose a novel pathway inference framework, ExPath, to classify various graphs (bio-networks) in biological databases. ExPath comprises three components: (1) a large protein language model (pLM) that encodes and embeds AA-seqs into graph, overcoming traditional obstacles in processing AA-seq data; (2) PathMamba, a hybrid architecture combining graph neural networks (GNNs) with state-space sequence modeling (Mamba) to capture both local interactions and global pathway-level dependencies; and (3) PathExplainer, a subgraph learning module that identifies functionally critical nodes and edges through train
- Score: 36.89299758497499
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Biological knowledge bases provide systemically functional pathways of cells or organisms in terms of molecular interaction. However, recognizing more targeted pathways, particularly when incorporating wet-lab experimental data, remains challenging and typically requires downstream biological analyses and expertise. In this paper, we frame this challenge as a solvable graph learning and explaining task and propose a novel pathway inference framework, ExPath, that explicitly integrates experimental data, specifically amino acid sequences (AA-seqs), to classify various graphs (bio-networks) in biological databases. The links (representing pathways) that contribute more to classification can be considered as targeted pathways. Technically, ExPath comprises three components: (1) a large protein language model (pLM) that encodes and embeds AA-seqs into graph, overcoming traditional obstacles in processing AA-seq data, such as BLAST; (2) PathMamba, a hybrid architecture combining graph neural networks (GNNs) with state-space sequence modeling (Mamba) to capture both local interactions and global pathway-level dependencies; and (3) PathExplainer, a subgraph learning module that identifies functionally critical nodes and edges through trainable pathway masks. We also propose ML-oriented biological evaluations and a new metric. The experiments involving 301 bio-networks evaluations demonstrate that pathways inferred by ExPath maintain biological meaningfulness. We will publicly release curated 301 bio-network data soon.
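The abstract describes PathExplainer only at a high level, as a module that finds critical edges through trainable pathway masks. The following is a minimal sketch of that general idea, not the authors' implementation: a frozen toy classifier scores a graph from masked edge signals, and one soft mask value per edge is learned by gradient descent so that the prediction is preserved while the mask stays sparse. The classifier form, the function name `learn_edge_mask`, the toy graph, and all hyperparameters are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def learn_edge_mask(edge_signal, w=4.0, b=-2.0, lam=0.1, lr=0.5, steps=500):
    """Learn one soft mask value per edge for a frozen toy classifier.

    Hypothetical classifier: p = sigmoid(w * sum_e m_e * s_e + b).
    Loss: -log(p) + lam * sum_e m_e, which trades fidelity to the
    positive prediction against mask sparsity.
    """
    theta = [0.0] * len(edge_signal)  # mask logits, m_e = sigmoid(theta_e)
    for _ in range(steps):
        mask = [sigmoid(t) for t in theta]
        z = w * sum(m * s for m, s in zip(mask, edge_signal)) + b
        p = sigmoid(z)
        for e, (m, s) in enumerate(zip(mask, edge_signal)):
            # d(-log p)/dm_e = -(1 - p) * w * s_e ; d(lam * m_e)/dm_e = lam
            grad_m = -(1.0 - p) * w * s + lam
            # Chain rule through the sigmoid: dm_e/dtheta_e = m_e * (1 - m_e)
            theta[e] -= lr * grad_m * m * (1.0 - m)
    return [sigmoid(t) for t in theta]


# Toy 4-node cycle; node features and edges are made up for illustration.
x = [1.0, 1.0, 0.1, 0.1]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
# Assumed edge signal: product of the two endpoint features.
edge_signal = [x[u] * x[v] for u, v in edges]

mask = learn_edge_mask(edge_signal)
# Rank edges by learned mask value, highest first.
ranked = sorted(zip(edges, mask), key=lambda em: -em[1])
for edge, value in ranked:
    print(edge, round(value, 3))
```

In this toy setting the edge joining the two high-feature nodes keeps a mask value close to 1 while the sparsity penalty pushes the other edges toward 0, which mirrors the abstract's reading of high-contribution links as targeted pathways.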
Related papers
- BioGraphFusion: Graph Knowledge Embedding for Biological Completion and Reasoning [0.8983181722105922]
We introduce BioGraphFusion, a novel framework for deeply synergistic semantic and structural learning. Experiments across three key biomedical tasks demonstrate BioGraphFusion's superior performance over state-of-the-art KE, GNN, and ensemble models.
arXiv Detail & Related papers (2025-07-19T04:03:42Z) - Sparsity is All You Need: Rethinking Biological Pathway-Informed Approaches in Deep Learning [12.24146000012622]
Biologically-informed neural networks typically leverage pathway annotations to enhance performance in biomedical applications. We conducted a comprehensive analysis of all relevant pathway-based neural network models for predictive tasks. Our findings suggest that pathway annotations may be too noisy or inadequately explored by current methods.
arXiv Detail & Related papers (2025-05-07T10:14:31Z) - BioX-CPath: Biologically-driven Explainable Diagnostics for Multistain IHC Computational Pathology [0.9603373981832565]
BioX-CPath is an explainable graph neural network architecture for whole slide image (WSI) classification.
At its core, BioX-CPath introduces a novel Stain-Aware Attention Pooling (SAAP) module that generates biologically meaningful, stain-aware patient embeddings.
arXiv Detail & Related papers (2025-03-26T18:00:22Z) - From Pixels to Histopathology: A Graph-Based Framework for Interpretable Whole Slide Image Analysis [81.19923502845441]
We develop a graph-based framework that constructs WSI graph representations.
We build tissue representations (nodes) that follow biological boundaries rather than arbitrary patches.
In our method's final step, we solve the diagnostic task through a graph attention network.
arXiv Detail & Related papers (2025-03-14T20:15:04Z) - PathVG: A New Benchmark and Dataset for Pathology Visual Grounding [45.21597220882424]
We propose a new benchmark called Pathology Visual Grounding (PathVG), which aims to detect regions based on expressions with different attributes.
In the experimental study, we found that the biggest challenge was the implicit information underlying the pathological expressions.
The proposed method achieves state-of-the-art performance on the PathVG benchmark.
arXiv Detail & Related papers (2025-02-28T09:13:01Z) - BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning [49.487327661584686]
We introduce BioMaze, a dataset with 5.1K complex pathway problems from real research. Our evaluation of methods such as CoT and graph-augmented reasoning shows that LLMs struggle with pathway reasoning. To address this, we propose PathSeeker, an LLM agent that enhances reasoning through interactive subgraph-based navigation.
arXiv Detail & Related papers (2025-02-23T17:38:10Z) - Biology-Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models [55.74944165932666]
We introduce Biology-Instructions, the first large-scale instruction-tuning dataset for multi-omics biological sequences. This dataset bridges large language models (LLMs) and complex biological sequence-related tasks, enhancing their versatility and reasoning. We also highlight significant limitations of current state-of-the-art LLMs on multi-omics tasks without specialized training.
arXiv Detail & Related papers (2024-12-26T12:12:23Z) - Learning to refine domain knowledge for biological network inference [2.209921757303168]
Perturbation experiments allow biologists to discover causal relationships between variables of interest.
The sparsity and high dimensionality of these data pose significant challenges for causal structure learning algorithms.
We propose an amortized algorithm for refining domain knowledge, based on data observations.
arXiv Detail & Related papers (2024-10-18T12:53:23Z) - Progress and Opportunities of Foundation Models in Bioinformatics [77.74411726471439]
Foundation models (FMs) have ushered in a new era in computational biology, especially in the realm of deep learning.
Central to our focus is the application of FMs to specific biological problems, aiming to guide the research community in choosing appropriate FMs for their research needs.
The review analyzes challenges and limitations faced by FMs in biology, such as data noise, model explainability, and potential biases.
arXiv Detail & Related papers (2024-02-06T02:29:17Z) - A deep learning pipeline for cross-sectional and longitudinal multiview data integration [7.424942475653412]
We have developed a pipeline to integrate cross-sectional and longitudinal data from multiple sources.
It includes variable selection/ranking using linear and nonlinear methods, feature extraction using functional principal component analysis and Euler characteristics, and joint integration and classification using dense feed-forward networks and recurrent neural networks.
We applied this pipeline to cross-sectional and longitudinal multi-omics data (metagenomics, transcriptomics, and metabolomics) from an inflammatory bowel disease (IBD) study and we identified microbial pathways, metabolites, and genes that discriminate by IBD status.
arXiv Detail & Related papers (2023-12-02T22:24:35Z) - ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab [67.24684071577211]
The challenge of replicating research results has posed a significant impediment to the field of molecular biology.
We first curate a comprehensive multimodal dataset, named ProBio, as an initial step towards this objective.
Next, we devise two challenging benchmarks, transparent solution tracking and multimodal action recognition, to emphasize the unique characteristics and difficulties associated with activity understanding in BioLab settings.
arXiv Detail & Related papers (2023-11-01T14:44:01Z) - Graph algorithms for predicting subcellular localization at the pathway level [1.370633147306388]
We develop graph algorithms to predict the localization of all interactions in a biological pathway as an edge-labeling task.
We also perform a case study where we construct biological pathways and predict localizations of human fibroblasts undergoing viral infection.
arXiv Detail & Related papers (2022-12-12T15:49:43Z) - Weakly-supervised Graph Meta-learning for Few-shot Node Classification [53.36828125138149]
We propose a new graph meta-learning framework -- Graph Hallucination Networks (Meta-GHN).
Based on a new robustness-enhanced episodic training, Meta-GHN is meta-learned to hallucinate clean node representations from weakly-labeled data.
Extensive experiments demonstrate the superiority of Meta-GHN over existing graph meta-learning studies.
arXiv Detail & Related papers (2021-06-12T22:22:10Z) - Neural Multi-Hop Reasoning With Logical Rules on Biomedical Knowledge Graphs [10.244651735862627]
We conduct an empirical study based on the real-world task of drug repurposing.
We formulate this task as a link prediction problem where both compounds and diseases correspond to entities in a knowledge graph.
We propose a new method, PoLo, that combines policy-guided walks based on reinforcement learning with logical rules.
arXiv Detail & Related papers (2021-03-18T16:46:11Z) - Heterogeneous Graph based Deep Learning for Biomedical Network Link Prediction [7.628651624423363]
We propose a Graph Pair based Link Prediction model (GPLP) for predicting biomedical network links.
In GPLP, 1-hop subgraphs extracted from the known network interaction matrix are learnt to predict missing links.
Our method demonstrates the potential applications in other biomedical networks.
arXiv Detail & Related papers (2021-01-28T07:35:29Z) - Learning the Implicit Semantic Representation on Graph-Structured Data [57.670106959061634]
Existing representation learning methods in graph convolutional networks are mainly designed by describing the neighborhood of each node as a perceptual whole.
We propose a Semantic Graph Convolutional Networks (SGCN) that explores the implicit semantics by learning latent semantic-paths in graphs.
arXiv Detail & Related papers (2021-01-16T16:18:43Z) - Knowledge-Guided Multi-Label Few-Shot Learning for General Image Recognition [75.44233392355711]
KGGR framework exploits prior knowledge of statistical label correlations with deep neural networks.
It first builds a structured knowledge graph to correlate different labels based on statistical label co-occurrence.
Then, it introduces the label semantics to guide learning semantic-specific features.
It exploits a graph propagation network to explore graph node interactions.
arXiv Detail & Related papers (2020-09-20T15:05:29Z) - GCN for HIN via Implicit Utilization of Attention and Meta-paths [104.24467864133942]
Heterogeneous information network (HIN) embedding aims to map the structure and semantic information in a HIN to distributed representations.
We propose a novel neural network method via implicitly utilizing attention and meta-paths.
We first use the multi-layer graph convolutional network (GCN) framework, which performs a discriminative aggregation at each layer.
We then give an effective relaxation and improvement via introducing a new propagation operation which can be separated from aggregation.
arXiv Detail & Related papers (2020-07-06T11:09:40Z) - Inferring Signaling Pathways with Probabilistic Programming [1.8275108630751837]
We implement our method, named Sparse Signaling Pathway Sampling, in Julia using the Gen probabilistic programming language.
We evaluate our algorithm on simulated data and the HPN-DREAM pathway reconstruction challenge.
Our results demonstrate the vast potential for probabilistic programming, and Gen specifically, for biological network inference.
arXiv Detail & Related papers (2020-05-28T14:55:11Z)