Related papers: ESM-NBR: fast and accurate nucleic acid-binding residue prediction via protein language model feature representation and multi-task learning

URL: http://arxiv.org/abs/2312.00842v1
Date: Fri, 1 Dec 2023 04:00:20 GMT
Title: ESM-NBR: fast and accurate nucleic acid-binding residue prediction via protein language model feature representation and multi-task learning
Authors: Wenwu Zeng, Dafeng Lv, Wenjuan Liu, Shaoliang Peng
Abstract summary: We propose a fast and accurate sequence-based method, called ESM-NBR, to predict nucleic acid-binding residues. Experimental results on benchmark data sets demonstrate that the prediction performance of the ESM2 feature representation comprehensively outperforms evolutionary information-based hidden Markov model (HMM) features. By completely discarding the time-consuming multiple sequence alignment process, ESM-NBR predicts far faster than existing methods.
Score: 1.6576008113462954
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Protein-nucleic acid interactions play a very important role in a variety of biological activities. Accurate identification of nucleic acid-binding residues is a critical step in understanding the interaction mechanisms. Although many computational methods have been developed to predict nucleic acid-binding residues, challenges remain. In this study, a fast and accurate sequence-based method, called ESM-NBR, is proposed. In ESM-NBR, we first use the large protein language model ESM2 to extract a discriminative feature representation of biological properties from protein primary sequences; then, a multi-task deep learning model composed of stacked bidirectional long short-term memory (BiLSTM) and multi-layer perceptron (MLP) networks is employed to explore common and private information of DNA- and RNA-binding residues with the ESM2 feature as input. Experimental results on benchmark data sets demonstrate that the prediction performance of the ESM2 feature representation comprehensively outperforms evolutionary information-based hidden Markov model (HMM) features. Meanwhile, ESM-NBR obtains MCC values for DNA-binding residue prediction of 0.427 and 0.391 on two independent test sets, which are 18.61% and 10.45% higher than those of the second-best methods, respectively. Moreover, by completely discarding the time-consuming multiple sequence alignment process, the prediction speed of ESM-NBR far exceeds that of existing methods (5.52 s for a protein sequence of length 500, which is about 16 times faster than the second-fastest method). A user-friendly standalone package and the data of ESM-NBR are freely available for academic use at: https://github.com/wwzll123/ESM-NBR.
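The abstract reports results as MCC (Matthews correlation coefficient) values. As a reading aid, the sketch below shows how MCC is conventionally computed from per-residue binary labels (1 = binding, 0 = non-binding). It is not taken from the ESM-NBR package: the function names `confusion_counts` and `mcc` and the toy labels are assumptions made for illustration only.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary per-residue labels (hypothetical helper)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: MCC is reported as 0 when any marginal is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy example (made-up labels, not data from the paper):
# 8 residues, 2 of which truly bind; the predictor finds one and adds one false hit.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0]
score = mcc(*confusion_counts(y_true, y_pred))
```

Because binding residues are typically a small minority of a sequence, MCC is a common choice here: unlike plain accuracy, it stays near zero for a predictor that labels every residue as non-binding.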

Related papers

DiffNMR2: NMR Guided Sampling Acquisition Through Diffusion Model Uncertainty [2.4634393035848494]
We propose a novel sub-sampling strategy based on a diffusion model trained on protein NMR data. Our method iteratively reconstructs under-sampled spectra while using model uncertainty to guide subsequent sampling, significantly reducing acquisition time. This advancement holds promise for many applications, from drug discovery to materials science, where rapid and high-resolution spectral analysis is critical.
arXiv Detail & Related papers (2025-02-06T20:10:28Z)
Diffusion Model with Representation Alignment for Protein Inverse Folding [53.139837825588614]
Protein inverse folding is a fundamental problem in bioinformatics, aiming to recover the amino acid sequences from a given protein backbone structure. We propose a novel method that leverages diffusion models with representation alignment (DMRA). In experiments, we conduct extensive evaluations on the CATH4.2 dataset to demonstrate that DMRA outperforms leading methods.
arXiv Detail & Related papers (2024-12-12T15:47:59Z)
SeqProFT: Applying LoRA Finetuning for Sequence-only Protein Property Predictions [8.112057136324431]
This study employs the LoRA method to perform end-to-end fine-tuning of the ESM-2 model. A multi-head attention mechanism is integrated into the downstream network to combine sequence features with contact map information.
arXiv Detail & Related papers (2024-11-18T12:40:39Z)
SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models. It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features. Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z)
Accelerating Drug Safety Assessment using Bidirectional-LSTM for SMILES Data [0.0]
Bi-Directional Long Short Term Memory (BiLSTM) is a variant of Recurrent Neural Network (RNN) that processes input molecular sequences. The proposed work aims to understand the sequential patterns encoded in the SMILES strings, which are then utilised for predicting the toxicity of the molecules.
arXiv Detail & Related papers (2024-07-08T18:12:11Z)
NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics [58.03989832372747]
We present the first unified benchmark, NovoBench, for de novo peptide sequencing. It comprises diverse mass spectrum data, integrated models, and comprehensive evaluation metrics. Recent methods, including DeepNovo, PointNovo, Casanovo, InstaNovo, AdaNovo and $\pi$-HelixNovo, are integrated into our framework.
arXiv Detail & Related papers (2024-06-16T08:23:21Z)
DisorderUnetLM: Validating ProteinUnet for efficient protein intrinsic disorder prediction [0.0]
The prediction of intrinsic disorder regions has significant implications for understanding protein functions and dynamics. Recently, a new generation of predictors based on protein language models (pLMs) is emerging. The article introduces the new DisorderUnetLM disorder predictor, which builds upon the idea of ProteinUnet.
arXiv Detail & Related papers (2024-04-11T20:14:14Z)
A Multi-Grained Symmetric Differential Equation Model for Learning Protein-Ligand Binding Dynamics [73.35846234413611]
In drug discovery, molecular dynamics (MD) simulation provides a powerful tool for predicting binding affinities, estimating transport properties, and exploring pocket sites. We propose NeuralMD, the first machine learning (ML) surrogate that can facilitate numerical MD and provide accurate simulations in protein-ligand binding dynamics. We demonstrate the efficiency and effectiveness of NeuralMD, achieving over 1K$\times$ speedup compared to standard numerical MD simulations.
arXiv Detail & Related papers (2024-01-26T09:35:17Z)
Efficiently Predicting Protein Stability Changes Upon Single-point Mutation with Large Language Models [51.57843608615827]
The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry. We introduce an ESM-assisted efficient approach that integrates protein sequence and structural features to predict the thermostability changes in protein upon single-point mutations.
arXiv Detail & Related papers (2023-12-07T03:25:49Z)
MATE-Pred: Multimodal Attention-based TCR-Epitope interaction Predictor [1.933856957193398]
An accurate binding prediction between T-cell receptors and epitopes contributes decisively to successful immunotherapy strategies. Here, we propose a highly reliable novel method, MATE-Pred, that performs attention-based prediction of the binding affinity between T-cell receptors and epitopes. The performance of MATE-Pred projects its potential application in drug discovery.
arXiv Detail & Related papers (2023-12-05T11:30:00Z)
Accurate Machine Learned Quantum-Mechanical Force Fields for Biomolecular Simulations [51.68332623405432]
Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes. Recently, machine learned force fields (MLFFs) emerged as an alternative means to execute MD simulations. This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations.
arXiv Detail & Related papers (2022-05-17T13:08:28Z)
Decoding the Protein-ligand Interactions Using Parallel Graph Neural Networks [6.460973806588082]
We present a novel parallel graph neural network (GNN) to integrate knowledge representation and reasoning for PLI prediction. Our method can serve as an interpretable and explainable artificial intelligence (AI) tool for predicted activity, potency, and biophysical properties of lead candidates.
arXiv Detail & Related papers (2021-11-30T06:02:04Z)
Confidence-guided Lesion Mask-based Simultaneous Synthesis of Anatomic and Molecular MR Images in Patients with Post-treatment Malignant Gliomas [65.64363834322333]
Confidence Guided SAMR (CG-SAMR) synthesizes data from lesion information to multi-modal anatomic sequences. A module guides the synthesis based on a confidence measure of the intermediate results. Experiments on real clinical data demonstrate that the proposed model can perform better than state-of-the-art synthesis methods.
arXiv Detail & Related papers (2020-08-06T20:20:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.