TFBS-Finder: Deep Learning-based Model with DNABERT and Convolutional Networks to Predict Transcription Factor Binding Sites
- URL: http://arxiv.org/abs/2502.01311v1
- Date: Mon, 03 Feb 2025 12:41:11 GMT
- Title: TFBS-Finder: Deep Learning-based Model with DNABERT and Convolutional Networks to Predict Transcription Factor Binding Sites
- Authors: Nimisha Ghosh, Pratik Dutta, Daniele Santoni
- Abstract summary: We have developed a deep learning model which uses pre-trained DNABERT, a Convolutional Neural Network (CNN) module, a Modified Convolutional Block Attention Module (MCBAM), a Multi-Scale Convolutions with Attention (MSCA) module and an output module.
TFBS-Finder is trained and tested on 165 ENCODE ChIP-seq datasets.
- Abstract: Transcription factors are proteins that regulate the expression of genes by binding to specific genomic regions known as Transcription Factor Binding Sites (TFBSs), typically located in the promoter regions of those genes. Accurate prediction of these binding sites is essential for understanding the complex gene regulatory networks underlying various cellular functions. Many deep learning models have been developed for this prediction task, but there is still scope for improvement. In this work, we have developed a deep learning model which uses pre-trained DNABERT, a Convolutional Neural Network (CNN) module, a Modified Convolutional Block Attention Module (MCBAM), a Multi-Scale Convolutions with Attention (MSCA) module and an output module. The pre-trained DNABERT is used for sequence embedding, thereby capturing long-term dependencies in the DNA sequences, while the CNN, MCBAM and MSCA modules extract higher-order local features. TFBS-Finder is trained and tested on 165 ENCODE ChIP-seq datasets. We have also performed ablation studies, cross-cell-line validations, and comparisons with other models. The experimental results show the superiority of the proposed method in predicting TFBSs compared to existing methodologies. The code and the relevant datasets are publicly available at https://github.com/NimishaGhosh/TFBS-Finder/.
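The abstract specifies the overall flow but not the module internals, so the following is a minimal PyTorch sketch under stated assumptions: a standard CBAM-style channel-and-spatial attention block stands in for MCBAM, parallel convolutions with different kernel widths stand in for MSCA, and the DNABERT embeddings (e.g., from a pre-trained HuggingFace checkpoint) are taken as given. Dimensions are illustrative, not the paper's.

```python
# Minimal sketch of the described pipeline; module internals are assumptions.
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Simplified CBAM-style block: channel attention, then spatial attention."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial_conv = nn.Conv1d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                           # x: (batch, channels, length)
        avg, mx = x.mean(dim=2), x.max(dim=2).values
        ca = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        x = x * ca.unsqueeze(2)                     # re-weight channels
        sa = torch.sigmoid(self.spatial_conv(torch.cat(
            [x.mean(1, keepdim=True), x.max(1, keepdim=True).values], dim=1)))
        return x * sa                               # re-weight positions

class TFBSFinderSketch(nn.Module):
    """DNABERT embeddings -> CNN -> attention -> multi-scale convs -> output."""
    def __init__(self, embed_dim=768, channels=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(embed_dim, channels, kernel_size=3, padding=1), nn.ReLU())
        self.attention = ChannelSpatialAttention(channels)
        self.multi_scale = nn.ModuleList(           # parallel multi-scale kernels
            [nn.Conv1d(channels, channels, k, padding=k // 2) for k in (3, 5, 7)])
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(channels, 1))

    def forward(self, embeddings):                  # (batch, seq_len, embed_dim)
        x = self.cnn(embeddings.transpose(1, 2))
        x = self.attention(x)
        x = sum(conv(x) for conv in self.multi_scale)
        return torch.sigmoid(self.head(x))          # P(sequence contains a TFBS)

model = TFBSFinderSketch()
dummy = torch.randn(2, 101, 768)                    # stand-in DNABERT output
print(model(dummy).shape)                           # torch.Size([2, 1])
```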
Related papers
- Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy (arXiv, 2023-12-04)
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
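As a rough illustration of the quantity being computed (not the authors' exact estimator), diffusion spectral entropy can be sketched as the entropy of the eigenvalue spectrum of a data-driven diffusion operator; the Gaussian kernel, bandwidth, and diffusion time below are assumptions.

```python
import numpy as np

def diffusion_spectral_entropy(X, bandwidth=1.0, t=1):
    """Sketch: entropy of the eigenvalue spectrum of a diffusion operator
    built from the data (anisotropic normalization omitted for brevity)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise sq. dists
    K = np.exp(-d2 / bandwidth)                           # Gaussian affinities
    P = K / K.sum(axis=1, keepdims=True)                  # row-stochastic operator
    eig = np.abs(np.linalg.eigvals(P)) ** t               # spectrum at time t
    p = eig / eig.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

X = np.random.randn(200, 10)                              # toy high-dim data
print(diffusion_spectral_entropy(X))
```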
- Predicting Transcription Factor Binding Sites using Transformer based Capsule Network (arXiv, 2023-10-23)
Predicting transcription factor binding sites is important for understanding how transcription factors regulate gene expression and how this regulation can be modulated for therapeutic purposes.
DNABERT-Cap is a bidirectional encoder pre-trained on a large number of genomic DNA sequences, augmented with a capsule layer responsible for the final prediction.
DNABERT-Cap is also compared with existing state-of-the-art deep learning-based predictors, viz. DeepARC, DeepTF, CNN-Zeng and DeepBind, and is found to outperform them.
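A minimal sketch of such a capsule head on top of pooled encoder features, with dynamic routing omitted and dimensions assumed, could look like this; the capsule length serves as the class score.

```python
import torch

def squash(v, dim=-1, eps=1e-8):
    # Capsule "squash" nonlinearity: preserves direction, maps norm into [0, 1).
    norm_sq = (v ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * v / torch.sqrt(norm_sq + eps)

class CapsuleHead(torch.nn.Module):
    """Maps pooled encoder features to two output capsules (bound / not bound).
    Simplified sketch: no dynamic routing, assumed dimensions."""
    def __init__(self, in_dim=768, num_caps=2, caps_dim=16):
        super().__init__()
        self.proj = torch.nn.Linear(in_dim, num_caps * caps_dim)
        self.num_caps, self.caps_dim = num_caps, caps_dim

    def forward(self, pooled):                          # (batch, in_dim)
        caps = self.proj(pooled).view(-1, self.num_caps, self.caps_dim)
        return squash(caps).norm(dim=-1)                # (batch, num_caps) scores

print(CapsuleHead()(torch.randn(2, 768)).shape)         # torch.Size([2, 2])
```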
- Convolutional Monge Mapping Normalization for learning on sleep data (arXiv, 2023-05-30)
We propose a new method called Convolutional Monge Mapping Normalization (CMMN).
CMMN consists of filtering the signals to adapt their power spectral density (PSD) to a Wasserstein barycenter estimated on training data.
Numerical experiments on sleep EEG data show that CMMN leads to significant and consistent performance gains independent of the neural network architecture.
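Concretely, a single-channel sketch of the mapping: estimate each training signal's PSD, form the barycenter (the squared mean of the PSD square roots, the closed form under a Gaussian-signal assumption), and filter each signal with frequency response sqrt(psd_bary / psd_i). The FFT-based filtering below is a convenience, not necessarily the paper's FIR implementation.

```python
import numpy as np
from scipy import signal

def cmmn_filters(train_signals, fs=100, nperseg=256):
    """Per-signal mapping filters pushing each PSD toward the barycenter."""
    psds = [signal.welch(x, fs=fs, nperseg=nperseg)[1] for x in train_signals]
    bary = np.mean([np.sqrt(p) for p in psds], axis=0) ** 2  # Wasserstein barycenter
    return [np.sqrt(bary / np.maximum(p, 1e-12)) for p in psds], bary

def apply_filter(x, h_freq, fs=100):
    # Apply the frequency response via FFT (zero-phase shortcut for the sketch).
    freqs_h = np.linspace(0, fs / 2, len(h_freq))       # Welch frequency grid
    freqs_x = np.fft.rfftfreq(len(x), d=1 / fs)
    H = np.interp(freqs_x, freqs_h, h_freq)
    return np.fft.irfft(np.fft.rfft(x) * H, n=len(x))

rng = np.random.default_rng(0)
train = [rng.standard_normal(3000) for _ in range(4)]   # toy "subjects"
filters, bary = cmmn_filters(train)
mapped = apply_filter(train[0], filters[0])             # spectrum-normalized signal
```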
- HAC-Net: A Hybrid Attention-Based Convolutional Neural Network for Highly Accurate Protein-Ligand Binding Affinity Prediction (arXiv, 2022-12-23)
We present a novel deep learning architecture consisting of a 3-dimensional convolutional neural network and two graph convolutional networks.
HAC-Net obtains state-of-the-art results on the PDBbind v.2016 core set.
We envision that this model can be extended to a broad range of supervised learning problems related to structure-based biomolecular property prediction.
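For reference, a plain graph-convolution layer of the kind such hybrid models build on (HAC-Net's own graph modules use attention-based aggregation, which this sketch omits) can be written in a few lines:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution: H' = relu(D^-1/2 (A + I) D^-1/2 H W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, adj):                      # h: (N, in_dim), adj: (N, N)
        a_hat = adj + torch.eye(adj.size(0))        # add self-loops
        d_inv_sqrt = a_hat.sum(dim=1).clamp(min=1).pow(-0.5)
        a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
        return torch.relu(self.w(a_norm @ h))       # propagate and transform

h = torch.randn(5, 8)                               # 5 atoms, 8 features each
adj = (torch.rand(5, 5) > 0.5).float()              # toy contact graph
print(GCNLayer(8, 16)(h, adj).shape)                # torch.Size([5, 16])
```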
- Neural Attentive Circuits (arXiv, 2022-10-14)
We introduce a general-purpose yet modular neural architecture called Neural Attentive Circuits (NACs).
NACs learn the parameterization and sparse connectivity of neural modules without using domain knowledge.
NACs achieve an 8x speedup at inference time while losing less than 3% performance.
- Brain Network Transformer (arXiv, 2022-10-13)
We study Transformer-based models for brain network analysis.
Driven by the unique properties of the data, we model brain networks as graphs with nodes of fixed size and order.
We re-standardize the evaluation pipeline on ABIDE, the only publicly available large-scale brain network dataset.
- Isoform Function Prediction Using a Deep Neural Network (arXiv, 2022-08-05)
Studies have shown that more than 95% of human multi-exon genes have undergone alternative splicing.
Alternative splicing plays a significant role in human health and disease.
This project uses all available data and valuable information, such as mRNA sequences, expression profiles, and gene graphs.
- On the Generalization and Adaption Performance of Causal Models (arXiv, 2022-06-09)
Differentiable causal discovery proposes to factorize the data-generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that modular neural causal models outperform other models on both zero- and few-shot adaptation in low-data regimes.
- Comparison of machine learning and deep learning techniques in promoter prediction across diverse species (arXiv, 2021-05-17)
We studied methods for vector encoding and promoter classification using genome sequences of three higher eukaryotes, viz. yeast, A. thaliana and human.
We found CNNs to be superior both in classifying promoters against non-promoter sequences (binary classification) and in species-specific classification of promoter sequences (multiclass classification), as sketched below.
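A minimal version of that binary setup, with one-hot encoding and an illustrative CNN (not the paper's architecture or hyperparameters), might look like:

```python
import numpy as np
import torch
import torch.nn as nn

def one_hot(seq):
    """Encode a DNA string as a 4 x L one-hot matrix (A, C, G, T rows)."""
    idx = {'A': 0, 'C': 1, 'G': 2, 'T': 3}
    m = np.zeros((4, len(seq)), dtype=np.float32)
    for j, base in enumerate(seq.upper()):
        if base in idx:                     # ambiguous bases stay all-zero
            m[idx[base], j] = 1.0
    return m

promoter_cnn = nn.Sequential(               # binary promoter classifier (untrained)
    nn.Conv1d(4, 32, kernel_size=8), nn.ReLU(), nn.MaxPool1d(4),
    nn.Conv1d(32, 64, kernel_size=8), nn.ReLU(),
    nn.AdaptiveMaxPool1d(1), nn.Flatten(),
    nn.Linear(64, 1), nn.Sigmoid())

x = torch.from_numpy(one_hot("ACGT" * 75)).unsqueeze(0)   # (1, 4, 300)
print(promoter_cnn(x))                      # promoter probability for the toy input
```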
- The Heterogeneity Hypothesis: Finding Layer-Wise Differentiated Network Architectures (arXiv, 2020-06-29)
We investigate a design space that is usually overlooked, i.e., adjusting the channel configurations of predefined networks.
We find that this adjustment can be achieved by shrinking widened baseline networks and leads to superior performance.
Experiments are conducted on various networks and datasets for image classification, visual tracking and image restoration.
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning (arXiv, 2020-04-30)
We trained more than 35,000 neural network models, sweeping over common featurization techniques.
We found RNA-seq data to be highly redundant and informative even with subsets larger than 128 features.
This list is automatically generated from the titles and abstracts of the papers on this site.