UNADON: Transformer-based model to predict genome-wide chromosome spatial position
- URL: http://arxiv.org/abs/2304.13230v2
- Date: Sat, 1 Jul 2023 05:29:14 GMT
- Title: UNADON: Transformer-based model to predict genome-wide chromosome spatial position
- Authors: Muyu Yang and Jian Ma
- Abstract summary: We develop a new transformer-based deep learning model called UNADON.
It predicts the genome-wide cytological distance to a specific type of nuclear body.
It reveals potential sequence and epigenomic factors that affect large-scale compartmentalization to nuclear bodies.
- Score: 2.3980064191633232
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The spatial positioning of chromosomes relative to functional nuclear bodies
is intertwined with genome functions such as transcription. However, the
sequence patterns and epigenomic features that collectively influence chromatin
spatial positioning in a genome-wide manner are not well understood. Here, we
develop a new transformer-based deep learning model called UNADON, which
predicts the genome-wide cytological distance to a specific type of nuclear
body, as measured by TSA-seq, using both sequence features and epigenomic
signals. Evaluations of UNADON in four cell lines (K562, H1, HFFc6, HCT116)
show high accuracy in predicting chromatin spatial positioning to nuclear
bodies when trained on a single cell line. UNADON also performs well in an
unseen cell type. Importantly, we reveal potential sequence and epigenomic
factors that affect large-scale chromatin compartmentalization to nuclear
bodies. Together, UNADON provides new insights into the principles linking
sequence features to large-scale chromatin spatial localization, with
important implications for understanding nuclear structure and function.
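The abstract describes the model only at a high level. As a rough illustration of the core transformer operation it relies on (every genomic bin attends to every other bin, and a regression head then outputs one value per bin), here is a minimal single-head self-attention sketch in plain Python. This is not UNADON's architecture: the per-bin feature layout, `d_model`, the random untrained weights, the function names `self_attention_regressor` and `pearson_r`, and the use of Pearson correlation as the accuracy measure are all illustrative assumptions.

```python
import math
import random


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    # W is a list of rows (out_dim x in_dim); x has length in_dim.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def random_matrix(rows, cols, rng):
    # Hypothetical untrained weights, scaled by 1/sqrt(cols).
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)]
            for _ in range(rows)]


def self_attention_regressor(bins, d_model=8, seed=0):
    """Return one predicted value per genomic bin, plus the attention matrix.

    bins: list of per-bin feature vectors (assumed: sequence features
    concatenated with epigenomic signals). Weights are random, so outputs
    only illustrate the data flow, not a trained model.
    """
    rng = random.Random(seed)
    n_feat = len(bins[0])
    W_in = random_matrix(d_model, n_feat, rng)  # project features to d_model
    W_q = random_matrix(d_model, d_model, rng)
    W_k = random_matrix(d_model, d_model, rng)
    W_v = random_matrix(d_model, d_model, rng)
    w_out = [rng.gauss(0.0, 1.0 / math.sqrt(d_model)) for _ in range(d_model)]

    h = [matvec(W_in, x) for x in bins]
    q = [matvec(W_q, hi) for hi in h]
    k = [matvec(W_k, hi) for hi in h]
    v = [matvec(W_v, hi) for hi in h]

    scale = 1.0 / math.sqrt(d_model)
    preds, attn = [], []
    for i in range(len(h)):
        # Scaled dot-product scores of bin i against every bin (itself included).
        weights = softmax([scale * sum(a * b for a, b in zip(q[i], kj)) for kj in k])
        attn.append(weights)
        # Context vector: attention-weighted sum of the value vectors.
        context = [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(d_model)]
        # Linear regression head: one scalar per bin (e.g. a TSA-seq-like signal).
        preds.append(sum(wo * c for wo, c in zip(w_out, context)))
    return preds, attn


def pearson_r(xs, ys):
    # Pearson correlation between predicted and observed per-bin values.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("Pearson r is undefined for a constant input")
    return cov / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rng = random.Random(1)
    # 10 hypothetical bins with 5 made-up features each (not real data).
    bins = [[rng.random() for _ in range(5)] for _ in range(10)]
    preds, attn = self_attention_regressor(bins)
    observed = [rng.random() for _ in range(10)]  # placeholder targets
    print(len(preds), pearson_r(preds, observed))
```

Each row of the attention matrix is a probability distribution over bins, which is what lets a distant region influence a bin's prediction; a trained model would learn the weight matrices rather than drawing them at random.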
Related papers
- VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling [60.91599380893732]
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning.
By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
(arXiv, 2024-05-13T20:15:03Z)
- Whole Genome Transformer for Gene Interaction Effects in Microbiome Habitat Specificity [3.972930262155919]
We propose a framework taking advantage of existing large models for gene vectorization to predict habitat specificity from entire microbial genome sequences.
We train and validate our approach on a large dataset of high quality microbiome genomes from different habitats.
(arXiv, 2024-05-09T09:34:51Z)
- Machine and deep learning methods for predicting 3D genome organization [0.0]
Three-Dimensional (3D) enhancer interactions play critical roles in a wide range of cellular processes by regulating gene expression.
Machine learning methods have emerged as an alternative to obtain missing 3D interactions and/or improve resolution.
In this review, we discuss computational tools for predicting three types of 3D interactions (EPIs, interactions, TAD boundaries) and analyze their pros and cons.
(arXiv, 2024-03-04T19:04:41Z)
- The cell signaling structure function [0.16060719742433224]
Live-cell microscopy captures 5-D $(x, y, z, \text{channel}, \text{time})$ movies that display patterns of cellular motion and signaling dynamics.
We present here an approach to finding patterns of cell signaling dynamics in 5-D live-cell movies that is unique in requiring no a priori knowledge of expected pattern dynamics and no training data.
(arXiv, 2024-01-04T19:25:00Z)
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
(arXiv, 2023-11-28T09:14:55Z)
- Analyzing scRNA-seq data by CCP-assisted UMAP and t-SNE [0.0]
Correlated clustering and projection (CCP) was introduced as an effective method for preprocessing scRNA-seq data.
CCP is a data-domain approach that does not require matrix diagonalization.
By using eight publicly available datasets, we have found that CCP significantly improves UMAP and t-SNE visualization.
(arXiv, 2023-06-23T19:15:43Z)
- Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review [77.34726150561087]
Cancer is the second leading cause of death after cardiovascular disease.
Gene expression can play a fundamental role in the early detection of cancer.
This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods.
(arXiv, 2023-01-28T15:03:03Z)
- Granger causal inference on DAGs identifies genomic loci regulating transcription [77.58911272503771]
GrID-Net is a framework based on graph neural networks with lagged message passing for Granger causal inference on DAG-structured systems.
Our application is the analysis of single-cell multimodal data to identify genomic loci that mediate the regulation of specific genes.
(arXiv, 2022-10-18T21:15:10Z)
- Intrinsic dimension estimation for discrete metrics [65.5438227932088]
In this letter we introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces.
We demonstrate its accuracy on benchmark datasets, and we apply it to analyze a metagenomic dataset for species fingerprinting.
This suggests that evolutionary pressure acts on a low-dimensional manifold despite the high dimensionality of sequence space.
(arXiv, 2022-07-20T06:38:36Z)
- Epigenomic language models powered by Cerebras [0.0]
Epigenomic BERT (or EBERT) learns representations based on both DNA sequence and paired epigenetic state inputs.
We show EBERT's transfer learning potential by demonstrating strong performance on a cell-type-specific transcription factor binding prediction task.
Our fine-tuned model exceeds state-of-the-art performance on 4 of 13 evaluation datasets from the ENCODE-DREAM benchmarks and ranks 3rd overall on the challenge leaderboard.
(arXiv, 2021-12-14T17:23:42Z)
- Multi-modal Self-supervised Pre-training for Regulatory Genome Across Cell Types [75.65676405302105]
We propose a simple yet effective approach for pre-training genome data in a multi-modal and self-supervised manner, which we call GeneBERT.
We pre-train our model on the ATAC-seq dataset with 17 million genome sequences.
(arXiv, 2021-10-11T12:48:44Z)
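The VQDNA entry above summarizes genome tokenization with vector-quantized codebooks. The core lookup step in standard vector quantization (as in VQ-VAE) assigns each continuous embedding to its nearest codebook entry, and the entry's index becomes the token. The sketch below shows only that generic lookup; the codebook, the embeddings, and the `quantize` function are hypothetical, and it does not describe how VQDNA itself builds or applies its codebook.

```python
def squared_distance(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(embeddings, codebook):
    """Map each embedding to the index of its nearest codebook vector.

    Ties are broken in favor of the lower index, since min() returns the
    first minimal element.
    """
    return [
        min(range(len(codebook)), key=lambda i: squared_distance(e, codebook[i]))
        for e in embeddings
    ]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # hypothetical 3-entry vocabulary
    embeddings = [[0.1, -0.1], [0.9, 0.2], [0.2, 0.8], [0.6, 0.5]]  # made-up inputs
    print(quantize(embeddings, codebook))  # [0, 1, 2, 1]
```

In a learned tokenizer the codebook itself is trained (VQ-VAE passes gradients through the non-differentiable lookup with a straight-through estimator); that training step is not shown here.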