GRAPE: Heterogeneous Graph Representation Learning for Genetic Perturbation with Coding and Non-Coding Biotype
- URL: http://arxiv.org/abs/2505.03853v1
- Date: Tue, 06 May 2025 03:35:24 GMT
- Title: GRAPE: Heterogeneous Graph Representation Learning for Genetic Perturbation with Coding and Non-Coding Biotype
- Authors: Changxi Chi, Jun Xia, Jingbo Zhou, Jiabei Cheng, Chang Yu, Stan Z. Li,
- Abstract summary: Building gene regulatory networks (GRN) is essential to understand and predict the effects of genetic perturbations.<n>In this work, we leverage pre-trained large language model and DNA sequence model to extract features from gene descriptions and DNA sequence data.<n>We introduce gene biotype information for the first time in genetic perturbation, simulating the distinct roles of genes with different biotypes in regulating cellular processes.
- Score: 51.58774936662233
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting genetic perturbations enables the identification of potentially crucial genes prior to wet-lab experiments, significantly improving overall experimental efficiency. Since genes are the foundation of cellular life, building gene regulatory networks (GRN) is essential to understand and predict the effects of genetic perturbations. However, current methods fail to fully leverage gene-related information, and solely rely on simple evaluation metrics to construct coarse-grained GRN. More importantly, they ignore functional differences between biotypes, limiting the ability to capture potential gene interactions. In this work, we leverage pre-trained large language model and DNA sequence model to extract features from gene descriptions and DNA sequence data, respectively, which serve as the initialization for gene representations. Additionally, we introduce gene biotype information for the first time in genetic perturbation, simulating the distinct roles of genes with different biotypes in regulating cellular processes, while capturing implicit gene relationships through graph structure learning (GSL). We propose GRAPE, a heterogeneous graph neural network (HGNN) that leverages gene representations initialized with features from descriptions and sequences, models the distinct roles of genes with different biotypes, and dynamically refines the GRN through GSL. The results on publicly available datasets show that our method achieves state-of-the-art performance.
Related papers
- GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters.<n>Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks.<n>It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z) - GeneQuery: A General QA-based Framework for Spatial Gene Expression Predictions from Histology Images [41.732831871866516]
Whole-slide hematoxylin and eosin stained histological images are readily accessible and allow for detailed examinations of tissue structure and composition at the microscopic level.<n>Recent advancements have utilized these histological images to predict spatially resolved gene expression profiles.<n>GeneQuery aims to solve this gene expression prediction task in a question-answering (QA) manner for better generality and flexibility.
arXiv Detail & Related papers (2024-11-27T14:33:13Z) - Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performances.
BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules.
BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
arXiv Detail & Related papers (2024-06-05T06:42:27Z) - VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling [60.91599380893732]
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning.
By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
arXiv Detail & Related papers (2024-05-13T20:15:03Z) - Efficient and Scalable Fine-Tune of Language Models for Genome
Understanding [49.606093223945734]
We present textscLingo: textscLanguage prefix ftextscIne-tuning for textscGentextscOmes.
Unlike DNA foundation models, textscLingo strategically leverages natural language foundation models' contextual cues.
textscLingo further accommodates numerous downstream fine-tune tasks by an adaptive rank sampling method.
arXiv Detail & Related papers (2024-02-12T21:40:45Z) - MuSe-GNN: Learning Unified Gene Representation From Multimodal
Biological Graph Data [22.938437500266847]
We introduce a novel model called Multimodal Similarity Learning Graph Neural Network.
It combines Multimodal Machine Learning and Deep Graph Neural Networks to learn gene representations from single-cell sequencing and spatial transcriptomic data.
Our model efficiently produces unified gene representations for the analysis of gene functions, tissue functions, diseases, and species evolution.
arXiv Detail & Related papers (2023-09-29T13:33:53Z) - Machine Learning Methods for Cancer Classification Using Gene Expression
Data: A Review [77.34726150561087]
Cancer is the second major cause of death after cardiovascular diseases.
Gene expression can play a fundamental role in the early detection of cancer.
This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods.
arXiv Detail & Related papers (2023-01-28T15:03:03Z) - Gene Function Prediction with Gene Interaction Networks: A Context Graph
Kernel Approach [24.234645183601998]
We propose to use a gene's context graph, i.e., the gene interaction network associated with the focal gene, to infer its functions.
In a kernel-based machine-learning framework, we design a context graph kernel to capture the information in context graphs.
arXiv Detail & Related papers (2022-04-22T02:54:01Z) - VEGN: Variant Effect Prediction with Graph Neural Networks [19.59965282985234]
We propose VEGN, which models variant effect prediction using a graph neural network (GNN) that operates on a heterogeneous graph with genes and variants.
The graph is created by assigning variants to genes and connecting genes with an gene-gene interaction network.
VeGN improves the performance of existing state-of-the-art models.
arXiv Detail & Related papers (2021-06-25T13:51:46Z) - SimpleChrome: Encoding of Combinatorial Effects for Predicting Gene
Expression [8.326669256957352]
We present SimpleChrome, a deep learning model that learns the histone modification representations of genes.
The features learned from the model allow us to better understand the latent effects of cross-gene interactions and direct gene regulation on the target gene expression.
arXiv Detail & Related papers (2020-12-15T23:30:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.