DeepGDel: Deep Learning-based Gene Deletion Prediction Framework for Growth-Coupled Production in Genome-Scale Metabolic Models
- URL: http://arxiv.org/abs/2504.06316v1
- Date: Tue, 08 Apr 2025 08:07:59 GMT
- Title: DeepGDel: Deep Learning-based Gene Deletion Prediction Framework for Growth-Coupled Production in Genome-Scale Metabolic Models
- Authors: Ziwei Yang, Takeyuki Tamura
- Abstract summary: We propose a framework for predicting gene deletion strategies for growth-coupled production in genome-scale metabolic models. The framework leverages deep learning algorithms to learn and integrate sequential gene and metabolite data representations. Experimental results demonstrate the feasibility of the proposed framework, showing substantial improvements over the baseline method.
- Score: 0.46551592572821365
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In genome-scale constraint-based metabolic models, gene deletion strategies are crucial for achieving growth-coupled production, where cell growth and target metabolite production are achieved simultaneously. While computational methods for calculating gene deletions have been widely explored and contribute to developing gene deletion strategy databases, current approaches are limited in leveraging new data-driven paradigms, such as machine learning, for more efficient strain design. Therefore, it is necessary to propose a fundamental framework for this objective. In this study, we first formulate the problem of gene deletion strategy prediction and then propose a framework for predicting gene deletion strategies for growth-coupled production in genome-scale metabolic models. The proposed framework leverages deep learning algorithms to learn and integrate sequential gene and metabolite data representations, enabling automatic prediction of gene deletion strategies. Computational experiments demonstrate the feasibility of the proposed framework, showing substantial improvements over the baseline method. Specifically, the proposed framework achieves increases of 17.64%, 27.15%, and 18.07% in overall accuracy across the three metabolic models of different scales under study, while maintaining balanced precision and recall in predicting gene deletion statuses. The source code and examples for the framework are publicly available at https://github.com/MetNetComp/DeepGDel.
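In this setting, a gene deletion strategy assigns each gene in the metabolic model a binary deletion status (delete or retain), so the prediction task can be read as per-gene binary classification from gene and metabolite representations. The sketch below illustrates that framing only; the GRU encoders, hidden sizes, and fusion by concatenation are assumptions made for illustration, not the published DeepGDel architecture (see the linked repository for the actual implementation).

```python
# Minimal sketch (PyTorch) of the prediction task described above: given
# representations of a model's genes and metabolites, predict a binary
# deletion status for every gene. The GRU encoders, layer sizes, and
# concatenation-based fusion are illustrative assumptions, not the published
# DeepGDel architecture (see https://github.com/MetNetComp/DeepGDel).
import torch
import torch.nn as nn


class GeneDeletionPredictor(nn.Module):
    def __init__(self, gene_feat_dim: int, met_feat_dim: int, hidden_dim: int = 128):
        super().__init__()
        # Sequence encoders for the gene-side and metabolite-side inputs.
        self.gene_enc = nn.GRU(gene_feat_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.met_enc = nn.GRU(met_feat_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Per-gene classifier over the fused representation.
        self.head = nn.Sequential(
            nn.Linear(4 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, gene_seq: torch.Tensor, met_seq: torch.Tensor) -> torch.Tensor:
        # gene_seq: (batch, n_genes, gene_feat_dim); met_seq: (batch, n_mets, met_feat_dim)
        gene_h, _ = self.gene_enc(gene_seq)                      # (batch, n_genes, 2*hidden)
        _, met_last = self.met_enc(met_seq)                      # (2, batch, hidden)
        met_ctx = torch.cat([met_last[0], met_last[1]], dim=-1)  # (batch, 2*hidden)
        met_ctx = met_ctx.unsqueeze(1).expand(-1, gene_h.size(1), -1)
        fused = torch.cat([gene_h, met_ctx], dim=-1)             # (batch, n_genes, 4*hidden)
        return self.head(fused).squeeze(-1)                      # deletion logits per gene


# Training then reduces to per-gene binary classification, e.g. with
# nn.BCEWithLogitsLoss() against 0/1 deletion labels from a strategy database.
```

A class-balanced loss or threshold tuning would be one way to keep precision and recall comparable, in line with the balanced metrics reported in the abstract.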
Related papers
- Modeling Gene Expression Distributional Shifts for Unseen Genetic Perturbations [44.619690829431214]
We train a neural network to predict distributional responses in gene expression following genetic perturbations. Our model predicts gene-level histograms conditioned on perturbations and outperforms baselines in capturing higher-order statistics.
arXiv Detail & Related papers (2025-07-01T06:04:28Z) - Bidirectional Mamba for Single-Cell Data: Efficient Context Learning with Biological Fidelity [0.39945675027960637]
We introduce GeneMamba, a scalable and efficient foundation model for single-cell transcriptomics built on state space modeling. GeneMamba captures bidirectional gene context with linear-time complexity, offering substantial computational gains over transformer baselines. We evaluate GeneMamba across diverse tasks, including multi-batch integration, cell type annotation, and gene-gene correlation, demonstrating strong performance, interpretability, and robustness.
arXiv Detail & Related papers (2025-04-22T20:34:47Z) - Transformer-Based Representation Learning for Robust Gene Expression Modeling and Cancer Prognosis [3.782770832189636]
We present GexBERT, a transformer-based autoencoder framework for robust representation learning of gene expression data.
GexBERT learns context-aware gene embeddings by pretraining on large-scale transcriptomic profiles.
It achieves state-of-the-art classification accuracy from limited gene subsets, improves survival prediction by restoring expression of prognostic anchor genes, and outperforms conventional imputation methods under high missingness.
arXiv Detail & Related papers (2025-04-13T19:49:59Z) - Teaching pathology foundation models to accurately predict gene expression with parameter efficient knowledge transfer [1.5416321520529301]
Parameter Efficient Knowledge Adaptation (PEKA) is a novel framework that integrates knowledge distillation and structure alignment losses for cross-modal knowledge transfer. We evaluated PEKA for gene expression prediction using multiple spatial transcriptomics datasets.
arXiv Detail & Related papers (2025-04-09T17:24:41Z) - Beyond Progress Measures: Theoretical Insights into the Mechanism of Grokking [50.465604300990904]
Grokking refers to the abrupt improvement in test accuracy after extended overfitting.
In this work, we investigate the mechanism underlying grokking in the Transformer on the task of prime number operations.
arXiv Detail & Related papers (2025-04-04T04:42:38Z) - GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters. Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks. It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z) - Stochastic gradient descent estimation of generalized matrix factorization models with application to single-cell RNA sequencing data [41.94295877935867]
Single-cell RNA sequencing allows the quantitation of gene expression at the individual cell level. Dimensionality reduction is a common preprocessing step to simplify the visualization, clustering, and phenotypic characterization of samples. We present a generalized matrix factorization model assuming a general exponential dispersion family distribution. We propose a scalable adaptive stochastic gradient descent algorithm that allows us to estimate the model efficiently.
arXiv Detail & Related papers (2024-12-29T16:02:15Z) - Evaluating Deep Regression Models for WSI-Based Gene-Expression Prediction [3.2995359570845912]
Prediction of mRNA gene-expression profiles directly from routine whole-slide images could potentially offer cost-effective and widely accessible molecular phenotyping.
This study provides recommendations on how deep regression models should be trained for WSI-based gene-expression prediction.
arXiv Detail & Related papers (2024-10-01T16:00:22Z) - VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling [60.91599380893732]
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning.
By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
arXiv Detail & Related papers (2024-05-13T20:15:03Z) - Efficient and Scalable Fine-Tune of Language Models for Genome Understanding [49.606093223945734]
We present Lingo: Language prefix fIne-tuning for GenOmes.
Unlike DNA foundation models, Lingo strategically leverages natural language foundation models' contextual cues.
Lingo further accommodates numerous downstream fine-tune tasks by an adaptive rank sampling method.
arXiv Detail & Related papers (2024-02-12T21:40:45Z) - Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrate both sets of information and reconstruct the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z) - Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges.
We first present the model that underlies most of current causal approaches to single-cell biology.
We then identify open problems in the application of causal approaches to single-cell data.
arXiv Detail & Related papers (2023-10-23T13:35:24Z) - Genetic Imitation Learning by Reward Extrapolation [6.340280403330784]
We propose a method called GenIL that integrates the Genetic Algorithm with imitation learning.
The involvement of the Genetic Algorithm improves the data efficiency by reproducing trajectories with various returns.
We tested GenIL in both the Atari and MuJoCo domains, and the results show that it outperforms previous methods.
arXiv Detail & Related papers (2023-01-03T14:12:28Z) - CausalBench: A Large-scale Benchmark for Network Inference from Single-cell Perturbation Data [61.088705993848606]
We introduce CausalBench, a benchmark suite for evaluating causal inference methods on real-world interventional data.
CausalBench incorporates biologically-motivated performance metrics, including new distribution-based interventional metrics.
arXiv Detail & Related papers (2022-10-31T13:04:07Z) - Modelling Technical and Biological Effects in scRNA-seq data with Scalable GPLVMs [6.708052194104378]
We extend a popular approach for probabilistic non-linear dimensionality reduction, the Gaussian process latent variable model, to scale to massive single-cell datasets.
The key idea is to use an augmented kernel which preserves the factorisability of the lower bound allowing for fast variational inference.
arXiv Detail & Related papers (2022-09-14T15:25:15Z) - On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery proposes factorizing the data generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.