Learning to Discover Regulatory Elements for Gene Expression Prediction
- URL: http://arxiv.org/abs/2502.13991v1
- Date: Wed, 19 Feb 2025 03:25:49 GMT
- Title: Learning to Discover Regulatory Elements for Gene Expression Prediction
- Authors: Xingyu Su, Haiyang Yu, Degui Zhi, Shuiwang Ji
- Abstract summary: Seq2Exp is a Sequence to Expression network designed to discover and extract regulatory elements that drive target gene expression.
Our approach captures the causal relationship between epigenomic signals, DNA sequences and their associated regulatory elements.
- Score: 59.470991831978516
- Abstract: We consider the problem of predicting gene expressions from DNA sequences. A key challenge of this task is to find the regulatory elements that control gene expressions. Here, we introduce Seq2Exp, a Sequence to Expression network explicitly designed to discover and extract regulatory elements that drive target gene expression, enhancing the accuracy of the gene expression prediction. Our approach captures the causal relationship between epigenomic signals, DNA sequences and their associated regulatory elements. Specifically, we propose to decompose the epigenomic signals and the DNA sequence conditioned on the causal active regulatory elements, and apply an information bottleneck with the Beta distribution to combine their effects while filtering out non-causal components. Our experiments demonstrate that Seq2Exp outperforms existing baselines in gene expression prediction tasks and discovers influential regions compared to commonly used statistical methods for peak detection such as MACS3. The source code is released as part of the AIRS library (https://github.com/divelab/AIRS/).
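The "information bottleneck with the Beta distribution" mentioned in the abstract can be illustrated with a minimal sketch. This is not the Seq2Exp implementation (the released code is in the AIRS repository); it only shows one ingredient such a bottleneck commonly uses, assumed here for illustration: a KL-divergence penalty between a per-position Beta(alpha, beta) mask distribution and a fixed Beta prior. The function names, the prior parameters, and the toy values are all hypothetical.

```python
import math


def digamma(x: float) -> float:
    """Approximate the digamma function for x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    # Shift x upward with psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # Asymptotic series: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv
    result -= inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def log_beta_fn(a: float, b: float) -> float:
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def kl_beta(a1: float, b1: float, a2: float, b2: float) -> float:
    """Closed-form KL( Beta(a1, b1) || Beta(a2, b2) )."""
    return (
        log_beta_fn(a2, b2)
        - log_beta_fn(a1, b1)
        + (a1 - a2) * digamma(a1)
        + (b1 - b2) * digamma(b1)
        + (a2 - a1 + b2 - b1) * digamma(a1 + b1)
    )


def bottleneck_penalty(alphas, betas, prior_a=1.0, prior_b=1.0):
    """Mean KL to the prior over sequence positions (hypothetical regularizer)."""
    pairs = list(zip(alphas, betas))
    return sum(kl_beta(a, b, prior_a, prior_b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy per-position mask parameters; in a real model a network would predict them.
    alphas = [2.0, 0.5, 5.0]
    betas = [2.0, 3.0, 1.0]
    print(bottleneck_penalty(alphas, betas))
```

In a setup like this, the penalty would be added to the prediction loss with a weighting coefficient, so that positions are pushed toward the prior unless keeping them improves the expression prediction.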
Related papers
- Gene Regulatory Network Inference in the Presence of Selection Bias and Latent Confounders [14.626706466908386]
Gene Regulatory Network Inference (GRNI) aims to identify causal relationships among genes using gene expression data.
Gene expression is influenced by latent confounders, such as non-coding RNAs, which add complexity to GRNI.
We propose GISL (Gene Regulatory Network Inference in the presence of Selection bias and Latent confounders) to infer true regulatory relationships in the presence of selection and confounding issues.
arXiv Detail & Related papers (2025-01-17T11:27:58Z)
- Cross-Attention Graph Neural Networks for Inferring Gene Regulatory Networks with Skewed Degree Distribution [9.919024883502322]
We propose the Cross-Attention Complex Dual Graph Embedding Model (XATGRN).
Our model consistently outperforms existing state-of-the-art methods across various datasets.
arXiv Detail & Related papers (2024-12-18T10:56:40Z)
- GeneQuery: A General QA-based Framework for Spatial Gene Expression Predictions from Histology Images [41.732831871866516]
Whole-slide hematoxylin and eosin stained histological images are readily accessible and allow for detailed examinations of tissue structure and composition at the microscopic level.
Recent advancements have utilized these histological images to predict spatially resolved gene expression profiles.
GeneQuery aims to solve this gene expression prediction task in a question-answering (QA) manner for better generality and flexibility.
arXiv Detail & Related papers (2024-11-27T14:33:13Z)
- Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performances.
BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules.
BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
arXiv Detail & Related papers (2024-06-05T06:42:27Z)
- VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling [60.91599380893732]
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning.
By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
arXiv Detail & Related papers (2024-05-13T20:15:03Z)
- DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets [81.75973217676986]
Gene regulatory networks (GRN) describe interactions between genes and their products that control gene expression and cellular function.
Existing methods focus either on challenge (1), identifying cyclic structure from dynamics, or on challenge (2), learning complex Bayesian posteriors over DAGs, but not both.
In this paper we leverage the fact that it is possible to estimate the "velocity" of gene expression with RNA velocity techniques to develop an approach that addresses both challenges.
arXiv Detail & Related papers (2023-02-08T16:36:40Z)
- Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review [77.34726150561087]
Cancer is the second major cause of death after cardiovascular diseases.
Gene expression can play a fundamental role in the early detection of cancer.
This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods.
arXiv Detail & Related papers (2023-01-28T15:03:03Z)
- Granger causal inference on DAGs identifies genomic loci regulating transcription [77.58911272503771]
GrID-Net is a framework based on graph neural networks with lagged message passing for Granger causal inference on DAG-structured systems.
Our application is the analysis of single-cell multimodal data to identify genomic loci that mediate the regulation of specific genes.
arXiv Detail & Related papers (2022-10-18T21:15:10Z)
- Isoform Function Prediction Using a Deep Neural Network [9.507435239304591]
Studies have shown that more than 95% of human multi-exon genes have undergone alternative splicing.
Alternative splicing plays a significant role in human health and disease.
This project uses all Conditional data and valuable information such as mRNA sequences, expression profiles, and gene graphs.
arXiv Detail & Related papers (2022-08-05T09:31:25Z)
- SimpleChrome: Encoding of Combinatorial Effects for Predicting Gene Expression [8.326669256957352]
We present SimpleChrome, a deep learning model that learns the histone modification representations of genes.
The features learned from the model allow us to better understand the latent effects of cross-gene interactions and direct gene regulation on the target gene expression.
arXiv Detail & Related papers (2020-12-15T23:30:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.