GraphGDel: Constructing and Learning Graph Representations of Genome-Scale Metabolic Models for Growth-Coupled Gene Deletion Prediction
- URL: http://arxiv.org/abs/2504.06316v6
- Date: Mon, 10 Nov 2025 04:24:28 GMT
- Title: GraphGDel: Constructing and Learning Graph Representations of Genome-Scale Metabolic Models for Growth-Coupled Gene Deletion Prediction
- Authors: Ziwei Yang, Takeyuki Tamura,
- Abstract summary: We introduce a systematic pipeline for constructing graph representations from constraint-based metabolic models.<n>Second, we develop a deep learning framework that integrates these graph representations with gene and metabolite sequence data to predict growth-coupled gene deletion strategies.<n>Our approach consistently outperforms established baselines, achieves improvements of 14.04%, 16.26%, and 13.18% in overall accuracy.
- Score: 1.0909000338858534
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In genome-scale constraint-based metabolic models, gene deletion strategies are essential for achieving growth-coupled production, where cell growth and target metabolite synthesis occur simultaneously. Despite the inherently networked nature of genome-scale metabolic models, existing computational approaches rely primarily on sequential data and lack graph representations that capture their complex relationships, as both well-defined graph constructions and learning frameworks capable of exploiting them remain largely unexplored. To address this gap, we present a twofold solution. First, we introduce a systematic pipeline for constructing graph representations from constraint-based metabolic models. Second, we develop a deep learning framework that integrates these graph representations with gene and metabolite sequence data to predict growth-coupled gene deletion strategies. Across three metabolic models of varying scale, our approach consistently outperforms established baselines, achieves improvements of 14.04%, 16.26%, and 13.18% in overall accuracy. The source code and example datasets are available at: https://github.com/MetNetComp/GraphGDel.
Related papers
- Interpretable Perturbation Modeling Through Biomedical Knowledge Graphs [2.9275990558029075]
multimodal embeddings are integrated into biomedical knowledge graphs.<n>We train a graph attention network to learn the delta expression profile of landmark genes for a given drug-cell pair.<n>Our framework provides a path toward mechanistic drug modeling.
arXiv Detail & Related papers (2025-12-24T04:42:25Z) - Modeling Gene Expression Distributional Shifts for Unseen Genetic Perturbations [44.619690829431214]
We train a neural network to predict distributional responses in gene expression following genetic perturbations.<n>Our model predicts gene-level histograms conditioned on perturbations and outperforms baselines in capturing higher-order statistics.
arXiv Detail & Related papers (2025-07-01T06:04:28Z) - Scalable Graph Generative Modeling via Substructure Sequences [50.32639806800683]
We introduce Generative Graph Pattern Machine (G$2$PM), a generative Transformer pre-training framework for graphs.<n>G$2$PM represents graph instances (nodes, edges, or entire graphs) as sequences of substructures.<n>It employs generative pre-training over the sequences to learn generalizable and transferable representations.
arXiv Detail & Related papers (2025-05-22T02:16:34Z) - Bidirectional Mamba for Single-Cell Data: Efficient Context Learning with Biological Fidelity [0.39945675027960637]
We introduce GeneMamba, a scalable and efficient foundation model for single-cell transcriptomics built on state space modeling.<n>GeneMamba captures bidirectional gene context with linear-time complexity, offering substantial computational gains over transformer baselines.<n>We evaluate GeneMamba across diverse tasks, including multi-batch integration, cell type annotation, and gene-gene correlation, demonstrating strong performance, interpretability, and robustness.
arXiv Detail & Related papers (2025-04-22T20:34:47Z) - Transformer-Based Representation Learning for Robust Gene Expression Modeling and Cancer Prognosis [3.782770832189636]
We present GexBERT, a transformer-based autoencoder framework for robust representation learning of gene expression data.
GexBERT learns context-aware gene embeddings by pretraining on large-scale transcriptomic profiles.
It achieves state-of-the-art classification accuracy from limited gene subsets, improves survival prediction by restoring expression of prognostic anchor genes, and outperforms conventional imputation methods under high missingness.
arXiv Detail & Related papers (2025-04-13T19:49:59Z) - Teaching pathology foundation models to accurately predict gene expression with parameter efficient knowledge transfer [1.5416321520529301]
Efficient Knowledge Adaptation (PEKA) is a novel framework that integrates knowledge distillation and structure alignment losses for cross-modal knowledge transfer.<n>We evaluated PEKA for gene expression prediction using multiple spatial transcriptomics datasets.
arXiv Detail & Related papers (2025-04-09T17:24:41Z) - Beyond Progress Measures: Theoretical Insights into the Mechanism of Grokking [50.465604300990904]
Grokking refers to the abrupt improvement in test accuracy after extended overfitting.
In this work, we investigate the grokking mechanism underlying the Transformer in the task of prime number operations.
arXiv Detail & Related papers (2025-04-04T04:42:38Z) - GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters.<n>Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks.<n>It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z) - Stochastic gradient descent estimation of generalized matrix factorization models with application to single-cell RNA sequencing data [41.94295877935867]
Single-cell RNA sequencing allows the quantitation of gene expression at the individual cell level.<n> Dimensionality reduction is a common preprocessing step to simplify the visualization, clustering, and phenotypic characterization of samples.<n>We present a generalized matrix factorization model assuming a general exponential dispersion family distribution.<n>We propose a scalable adaptive descent algorithm that allows us to estimate the model efficiently.
arXiv Detail & Related papers (2024-12-29T16:02:15Z) - Gene-Metabolite Association Prediction with Interactive Knowledge Transfer Enhanced Graph for Metabolite Production [49.814615043389864]
We propose a new task, Gene-Metabolite Association Prediction based on metabolic graphs.
We present the first benchmark containing 2474 metabolites and 1947 genes of two commonly used microorganisms.
Our proposed methodology outperforms baselines by up to 12.3% across various link prediction frameworks.
arXiv Detail & Related papers (2024-10-24T06:54:27Z) - LLM-Based Multi-Agent Systems are Scalable Graph Generative Models [73.28294528654885]
GraphAgent-Generator (GAG) is a novel simulation-based framework for dynamic, text-attributed social graph generation.<n>GAG simulates the temporal node and edge generation processes for zero-shot social graph generation.<n>The resulting graphs exhibit adherence to seven key macroscopic network properties, achieving an 11% improvement in microscopic graph structure metrics.
arXiv Detail & Related papers (2024-10-13T12:57:08Z) - Evaluating Deep Regression Models for WSI-Based Gene-Expression Prediction [3.2995359570845912]
Prediction of mRNA gene-expression profiles directly from routine whole-slide images could potentially offer cost-effective and widely accessible molecular phenotyping.
This study provides recommendations on how deep regression models should be trained for WSI-based gene-expression prediction.
arXiv Detail & Related papers (2024-10-01T16:00:22Z) - VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling [60.91599380893732]
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning.
By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
arXiv Detail & Related papers (2024-05-13T20:15:03Z) - Efficient and Scalable Fine-Tune of Language Models for Genome
Understanding [49.606093223945734]
We present textscLingo: textscLanguage prefix ftextscIne-tuning for textscGentextscOmes.
Unlike DNA foundation models, textscLingo strategically leverages natural language foundation models' contextual cues.
textscLingo further accommodates numerous downstream fine-tune tasks by an adaptive rank sampling method.
arXiv Detail & Related papers (2024-02-12T21:40:45Z) - Single-Cell Deep Clustering Method Assisted by Exogenous Gene
Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z) - Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges.
We first present the model that underlies most of current causal approaches to single-cell biology.
We then identify open problems in the application of causal approaches to single-cell data.
arXiv Detail & Related papers (2023-10-23T13:35:24Z) - Graph Generation with Diffusion Mixture [57.78958552860948]
Generation of graphs is a major challenge for real-world tasks that require understanding the complex nature of their non-Euclidean structures.
We propose a generative framework that models the topology of graphs by explicitly learning the final graph structures of the diffusion process.
arXiv Detail & Related papers (2023-02-07T17:07:46Z) - Genetic Imitation Learning by Reward Extrapolation [6.340280403330784]
We propose a method called GenIL that integrates the Genetic Algorithm with imitation learning.
The involvement of the Genetic Algorithm improves the data efficiency by reproducing trajectories with various returns.
We tested GenIL in both Atari and Mujoco domains, and the result shows that it successfully outperforms the previous methods.
arXiv Detail & Related papers (2023-01-03T14:12:28Z) - GrannGAN: Graph annotation generative adversarial networks [72.66289932625742]
We consider the problem of modelling high-dimensional distributions and generating new examples of data with complex relational feature structure coherent with a graph skeleton.
The model we propose tackles the problem of generating the data features constrained by the specific graph structure of each data point by splitting the task into two phases.
In the first it models the distribution of features associated with the nodes of the given graph, in the second it complements the edge features conditionally on the node features.
arXiv Detail & Related papers (2022-12-01T11:49:07Z) - CausalBench: A Large-scale Benchmark for Network Inference from
Single-cell Perturbation Data [61.088705993848606]
We introduce CausalBench, a benchmark suite for evaluating causal inference methods on real-world interventional data.
CaulBench incorporates biologically-motivated performance metrics, including new distribution-based interventional metrics.
arXiv Detail & Related papers (2022-10-31T13:04:07Z) - SCGG: A Deep Structure-Conditioned Graph Generative Model [9.046174529859524]
A conditional deep graph generation method called SCGG considers a particular type of structural conditions.
The architecture of SCGG consists of a graph representation learning network and an autoregressive generative model, which is trained end-to-end.
Experimental results on both synthetic and real-world datasets demonstrate the superiority of our method compared with state-of-the-art baselines.
arXiv Detail & Related papers (2022-09-20T12:33:50Z) - Modelling Technical and Biological Effects in scRNA-seq data with
Scalable GPLVMs [6.708052194104378]
We extend a popular approach for probabilistic non-linear dimensionality reduction, the Gaussian process latent variable model, to scale to massive single-cell datasets.
The key idea is to use an augmented kernel which preserves the factorisability of the lower bound allowing for fast variational inference.
arXiv Detail & Related papers (2022-09-14T15:25:15Z) - On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery has proposed to factorize the data generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z) - Heterogeneous Graph Neural Networks using Self-supervised Reciprocally
Contrastive Learning [102.9138736545956]
Heterogeneous graph neural network (HGNN) is a very popular technique for the modeling and analysis of heterogeneous graphs.
We develop for the first time a novel and robust heterogeneous graph contrastive learning approach, namely HGCL, which introduces two views on respective guidance of node attributes and graph topologies.
In this new approach, we adopt distinct but most suitable attribute and topology fusion mechanisms in the two views, which are conducive to mining relevant information in attributes and topologies separately.
arXiv Detail & Related papers (2022-04-30T12:57:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.