Protein Language Model-Powered 3D Ligand Binding Site Prediction from
Protein Sequence
- URL: http://arxiv.org/abs/2312.03016v1
- Date: Tue, 5 Dec 2023 01:47:38 GMT
- Authors: Shuo Zhang, Lei Xie
- Abstract summary: Prediction of the binding sites of proteins is an important task for understanding the function of proteins and for screening potential drugs.
Most existing methods require experimentally determined protein holo-structures as input.
We propose LaMPSite, which only takes protein sequences and ligand molecular graphs as input.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prediction of ligand binding sites of proteins is a fundamental and important
task for understanding the function of proteins and screening potential drugs.
Most existing methods require experimentally determined protein holo-structures
as input. However, such structures can be unavailable on novel or less-studied
proteins. To tackle this limitation, we propose LaMPSite, which only takes
protein sequences and ligand molecular graphs as input for ligand binding site
predictions. The protein sequences are used to retrieve residue-level
embeddings and contact maps from the pre-trained ESM-2 protein language model.
The ligand molecular graphs are fed into a graph neural network to compute
atom-level embeddings. Then we compute and update the protein-ligand
interaction embedding based on the protein residue-level embeddings and ligand
atom-level embeddings, and the geometric constraints in the inferred protein
contact map and ligand distance map. A final pooling on protein-ligand
interaction embedding would indicate which residues belong to the binding
sites. Without any 3D coordinate information of proteins, our proposed model
achieves competitive performance compared to baseline methods that require 3D
protein structures when predicting binding sites. Given that less than 50% of
proteins currently have reliable structural information, LaMPSite will provide
new opportunities for drug discovery.
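The pipeline described in the abstract (ESM-2 residue embeddings and a contact map for the protein, GNN atom embeddings for the ligand, an iteratively updated protein-ligand interaction embedding constrained by the contact and distance maps, and a final pooling into per-residue scores) can be sketched as a toy numerical example. Everything below is illustrative: the update rules, thresholds, and shapes are assumptions, not the paper's actual equations, and random arrays stand in for the ESM-2 and GNN outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def interaction_scores(res_emb, atom_emb, contact_map, dist_map,
                       contact_thresh=0.5, dist_thresh=4.0, n_rounds=2):
    """Toy sketch of a LaMPSite-style interaction module.

    res_emb:     (n_res, d)      residue embeddings (stand-in for ESM-2 output)
    atom_emb:    (n_atom, d)     ligand atom embeddings (stand-in for a GNN output)
    contact_map: (n_res, n_res)  inferred residue-residue contact probabilities
    dist_map:    (n_atom, n_atom) ligand interatomic distances

    Returns one score per residue; high-scoring residues would be the
    predicted binding-site residues.
    """
    # Initial pairwise interaction embedding: outer combination of
    # residue and atom embeddings.
    inter = np.einsum('id,jd->ijd', res_emb, atom_emb)  # (n_res, n_atom, d)

    # Geometric constraints: smooth the interaction embedding over
    # residues that are in contact and atoms that are close in space.
    res_adj = (contact_map > contact_thresh).astype(float)
    atom_adj = (dist_map < dist_thresh).astype(float)
    res_adj /= res_adj.sum(axis=1, keepdims=True).clip(min=1)
    atom_adj /= atom_adj.sum(axis=1, keepdims=True).clip(min=1)

    for _ in range(n_rounds):
        inter = np.einsum('ik,kjd->ijd', res_adj, inter)   # mix over contacting residues
        inter = np.einsum('jk,ikd->ijd', atom_adj, inter)  # mix over nearby atoms

    # Final pooling over ligand atoms and embedding channels gives a
    # per-residue binding-site score.
    return inter.mean(axis=(1, 2))

# Usage with random stand-in data: 8 residues, 5 ligand atoms, 16-dim embeddings.
n_res, n_atom, d = 8, 5, 16
scores = interaction_scores(
    rng.normal(size=(n_res, d)),
    rng.normal(size=(n_atom, d)),
    rng.uniform(size=(n_res, n_res)),
    rng.uniform(1.0, 8.0, size=(n_atom, n_atom)),
)
print(scores.shape)  # (8,)
```

Note the design point from the abstract: no 3D protein coordinates enter this computation; the only geometric signal on the protein side is the contact map inferred from sequence.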
Related papers
- Geometric Self-Supervised Pretraining on 3D Protein Structures using Subgraphs [25.93347924265175]
We propose a novel self-supervised method to pretrain 3D graph neural networks on 3D protein structures.
By considering subgraphs and their relationships to the global protein structure, the model can learn to reason about these hierarchical levels of organization.
arXiv Detail & Related papers (2024-06-20T09:34:31Z)
- ProtT3: Protein-to-Text Generation for Text-based Protein Understanding [88.43323947543996]
Language Models (LMs) excel in understanding textual descriptions of proteins.
Protein Language Models (PLMs) can understand and convert protein data into high-quality representations, but struggle to process texts.
We introduce ProtT3, a framework for Protein-to-Text Generation for Text-based Protein Understanding.
arXiv Detail & Related papers (2024-05-21T08:06:13Z)
- ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction [54.132290875513405]
The prediction of protein-protein interactions (PPIs) is crucial for understanding biological functions and diseases.
Previous machine learning approaches to PPI prediction mainly focus on direct physical interactions.
We propose a novel framework ProLLM that employs an LLM tailored for PPI for the first time.
arXiv Detail & Related papers (2024-03-30T05:32:42Z)
- NaNa and MiGu: Semantic Data Augmentation Techniques to Enhance Protein Classification in Graph Neural Networks [60.48306899271866]
We propose novel semantic data augmentation methods to incorporate backbone chemical and side-chain biophysical information into protein classification tasks.
Specifically, we leverage molecular biophysical, secondary structure, chemical bond, and ionic features of proteins to facilitate classification tasks.
arXiv Detail & Related papers (2024-03-21T13:27:57Z)
- ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training [82.37346937497136]
We propose a versatile cross-modal large language model (LLM) for both protein-centric and protein-language tasks.
ProtLLM features a unique dynamic protein mounting mechanism, enabling it to handle complex inputs.
By developing a specialized protein vocabulary, we equip the model with the capability to predict not just natural language but also proteins from a vast pool of candidates.
arXiv Detail & Related papers (2024-02-28T01:29:55Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- Structure-aware Protein Self-supervised Learning [50.04673179816619]
We propose a novel structure-aware protein self-supervised learning method to capture structural information of proteins.
In particular, a well-designed graph neural network (GNN) model is pretrained to preserve the protein structural information.
We identify the relation between the sequential information in the protein language model and the structural information in the specially designed GNN model via a novel pseudo bi-level optimization scheme.
arXiv Detail & Related papers (2022-04-06T02:18:41Z)
- Protein 3D structure-based neural networks highly improve the accuracy in compound-protein binding affinity prediction [7.059949221160259]
We develop Fast Evolutional Attention and Thoroughgoing-graph Neural Networks (FeatNN) to facilitate the application of protein 3D structure information in predicting compound-protein binding affinities (CPAs).
FeatNN considerably outperforms various state-of-the-art baselines in CPA prediction, with the Pearson correlation coefficient improved by about 35.7%.
arXiv Detail & Related papers (2022-03-30T00:44:15Z)
- Leveraging Sequence Embedding and Convolutional Neural Network for Protein Function Prediction [27.212743275697825]
Main challenges of protein function prediction are the large label space and the lack of labeled training data.
Our method leverages unsupervised sequence embedding and the success of deep convolutional neural network to overcome these challenges.
arXiv Detail & Related papers (2021-12-01T08:31:01Z)
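The approach summarized in the last entry above (unsupervised sequence embeddings fed to a deep convolutional network for multi-label function prediction) can be sketched roughly as follows. The layer shapes, pooling choice, and sigmoid label head are assumptions for illustration, with random arrays standing in for learned weights and precomputed embeddings.

```python
import numpy as np

rng = np.random.default_rng(1)

def conv1d_relu(x, w, b):
    """Valid 1D convolution with ReLU: x (L, d_in), w (k, d_in, d_out), b (d_out)."""
    k = w.shape[0]
    out = np.empty((x.shape[0] - k + 1, w.shape[2]))
    for i in range(out.shape[0]):
        out[i] = np.einsum('kd,kdo->o', x[i:i + k], w) + b
    return np.maximum(out, 0.0)

def predict_functions(seq_emb, w, b, w_out, b_out):
    """Toy sketch: a CNN over precomputed per-residue sequence embeddings,
    then global max pooling and a sigmoid multi-label head (one score per
    hypothetical function term)."""
    h = conv1d_relu(seq_emb, w, b)   # (L-k+1, d_hidden) local sequence features
    pooled = h.max(axis=0)           # (d_hidden,) global max pooling over positions
    logits = pooled @ w_out + b_out  # (n_labels,)
    return 1.0 / (1.0 + np.exp(-logits))

# Random stand-in data: a 30-residue protein with 32-dim embeddings
# and 3 hypothetical function labels.
seq_emb = rng.normal(size=(30, 32))
w = rng.normal(scale=0.1, size=(5, 32, 64))   # kernel width 5, 64 filters
b = np.zeros(64)
w_out = rng.normal(scale=0.1, size=(64, 3))
b_out = np.zeros(3)

probs = predict_functions(seq_emb, w, b, w_out, b_out)
print(probs.shape)  # (3,)
```

Global pooling is one way such a model can handle the large label space with variable-length inputs: the per-position features collapse to a fixed-size vector regardless of sequence length.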
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.