Protein Language Model-Powered 3D Ligand Binding Site Prediction from
Protein Sequence
- URL: http://arxiv.org/abs/2312.03016v1
- Date: Tue, 5 Dec 2023 01:47:38 GMT
- Authors: Shuo Zhang, Lei Xie
- Abstract summary: Prediction of the binding sites of proteins is an important task for understanding the function of proteins and for screening potential drugs.
Most existing methods require experimentally determined protein holo-structures as input.
We propose LaMPSite, which only takes protein sequences and ligand molecular graphs as input.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prediction of ligand binding sites of proteins is a fundamental and important
task for understanding the function of proteins and screening potential drugs.
Most existing methods require experimentally determined protein holo-structures
as input. However, such structures can be unavailable on novel or less-studied
proteins. To tackle this limitation, we propose LaMPSite, which only takes
protein sequences and ligand molecular graphs as input for ligand binding site
predictions. The protein sequences are used to retrieve residue-level
embeddings and contact maps from the pre-trained ESM-2 protein language model.
The ligand molecular graphs are fed into a graph neural network to compute
atom-level embeddings. Then we compute and update the protein-ligand
interaction embedding based on the protein residue-level embeddings and ligand
atom-level embeddings, and the geometric constraints in the inferred protein
contact map and ligand distance map. A final pooling on protein-ligand
interaction embedding would indicate which residues belong to the binding
sites. Without any 3D coordinate information of proteins, our proposed model
achieves competitive performance compared to baseline methods that require 3D
protein structures when predicting binding sites. Given that less than 50% of
proteins currently have reliable structural information, LaMPSite will provide
new opportunities for drug discovery.
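The pipeline described in the abstract (ESM-2 residue embeddings and a contact map for the protein, GNN atom embeddings for the ligand, an iteratively updated protein-ligand interaction embedding constrained by the contact and distance maps, and a final pooling into per-residue scores) can be sketched as a toy numerical example. Everything below is illustrative: the update rules, thresholds, and shapes are assumptions, not the paper's actual equations, and random arrays stand in for the ESM-2 and GNN outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def interaction_scores(res_emb, atom_emb, contact_map, dist_map,
                       contact_thresh=0.5, dist_thresh=4.0, n_rounds=2):
    """Toy sketch of a LaMPSite-style interaction module.

    res_emb:     (n_res, d)      residue embeddings (stand-in for ESM-2 output)
    atom_emb:    (n_atom, d)     ligand atom embeddings (stand-in for a GNN output)
    contact_map: (n_res, n_res)  inferred residue-residue contact probabilities
    dist_map:    (n_atom, n_atom) ligand interatomic distances

    Returns one score per residue; high-scoring residues would be the
    predicted binding-site residues.
    """
    # Initial pairwise interaction embedding: outer combination of
    # residue and atom embeddings.
    inter = np.einsum('id,jd->ijd', res_emb, atom_emb)  # (n_res, n_atom, d)

    # Geometric constraints: smooth the interaction embedding over
    # residues that are in contact and atoms that are close in space.
    res_adj = (contact_map > contact_thresh).astype(float)
    atom_adj = (dist_map < dist_thresh).astype(float)
    res_adj /= res_adj.sum(axis=1, keepdims=True).clip(min=1)
    atom_adj /= atom_adj.sum(axis=1, keepdims=True).clip(min=1)

    for _ in range(n_rounds):
        inter = np.einsum('ik,kjd->ijd', res_adj, inter)   # mix over contacting residues
        inter = np.einsum('jk,ikd->ijd', atom_adj, inter)  # mix over nearby atoms

    # Final pooling over ligand atoms and embedding channels gives a
    # per-residue binding-site score.
    return inter.mean(axis=(1, 2))

# Usage with random stand-in data: 8 residues, 5 ligand atoms, 16-dim embeddings.
n_res, n_atom, d = 8, 5, 16
scores = interaction_scores(
    rng.normal(size=(n_res, d)),
    rng.normal(size=(n_atom, d)),
    rng.uniform(size=(n_res, n_res)),
    rng.uniform(1.0, 8.0, size=(n_atom, n_atom)),
)
print(scores.shape)  # (8,)
```

Note the design point from the abstract: no 3D protein coordinates enter this computation; the only geometric signal on the protein side is the contact map inferred from sequence.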
Related papers
- Geometric Self-Supervised Pretraining on 3D Protein Structures using Subgraphs [25.93347924265175]
We propose a novel self-supervised method to pretrain 3D graph neural networks on 3D protein structures.
By considering subgraphs and their relationships to the global protein structure, the model can learn to reason about these hierarchical levels of organization.
arXiv Detail & Related papers (2024-06-20T09:34:31Z)
- ProtT3: Protein-to-Text Generation for Text-based Protein Understanding [88.43323947543996]
Language Models (LMs) excel in understanding textual descriptions of proteins.
Protein Language Models (PLMs) can understand and convert protein data into high-quality representations, but struggle to process texts.
We introduce ProtT3, a framework for Protein-to-Text Generation for Text-based Protein Understanding.
arXiv Detail & Related papers (2024-05-21T08:06:13Z)
- ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction [54.132290875513405]
The prediction of protein-protein interactions (PPIs) is crucial for understanding biological functions and diseases.
Previous machine learning approaches to PPI prediction mainly focus on direct physical interactions.
We propose a novel framework ProLLM that employs an LLM tailored for PPI for the first time.
arXiv Detail & Related papers (2024-03-30T05:32:42Z)
- NaNa and MiGu: Semantic Data Augmentation Techniques to Enhance Protein Classification in Graph Neural Networks [60.48306899271866]
We propose novel semantic data augmentation methods to incorporate backbone chemical and side-chain biophysical information into protein classification tasks.
Specifically, we leverage molecular biophysical, secondary structure, chemical bond, and ionic features of proteins to facilitate classification tasks.
arXiv Detail & Related papers (2024-03-21T13:27:57Z)
- ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training [82.37346937497136]
We propose a versatile cross-modal large language model (LLM) for both protein-centric and protein-language tasks.
ProtLLM features a unique dynamic protein mounting mechanism, enabling it to handle complex inputs.
By developing a specialized protein vocabulary, we equip the model with the capability to predict not just natural language but also proteins from a vast pool of candidates.
arXiv Detail & Related papers (2024-02-28T01:29:55Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- Structure-aware Protein Self-supervised Learning [50.04673179816619]
We propose a novel structure-aware protein self-supervised learning method to capture structural information of proteins.
In particular, a well-designed graph neural network (GNN) model is pretrained to preserve the protein structural information.
We identify the relation between the sequential information in the protein language model and the structural information in the specially designed GNN model via a novel pseudo bi-level optimization scheme.
arXiv Detail & Related papers (2022-04-06T02:18:41Z)
- Protein 3D structure-based neural networks highly improve the accuracy in compound-protein binding affinity prediction [7.059949221160259]
We develop Fast Evolutional Attention and Thoroughgoing-graph Neural Networks (FeatNN) to facilitate the application of protein 3D structure information in predicting compound-protein binding affinities (CPAs).
FeatNN considerably outperforms various state-of-the-art baselines in CPA prediction, with the Pearson correlation coefficient improved by about 35.7%.
arXiv Detail & Related papers (2022-03-30T00:44:15Z)
- Leveraging Sequence Embedding and Convolutional Neural Network for Protein Function Prediction [27.212743275697825]
Main challenges of protein function prediction are the large label space and the lack of labeled training data.
Our method leverages unsupervised sequence embedding and the success of deep convolutional neural network to overcome these challenges.
arXiv Detail & Related papers (2021-12-01T08:31:01Z)
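The approach summarized in the last entry above (unsupervised sequence embeddings fed to a deep convolutional network for multi-label function prediction) can be sketched roughly as follows. The layer shapes, pooling choice, and sigmoid label head are assumptions for illustration, with random arrays standing in for learned weights and precomputed embeddings.

```python
import numpy as np

rng = np.random.default_rng(1)

def conv1d_relu(x, w, b):
    """Valid 1D convolution with ReLU: x (L, d_in), w (k, d_in, d_out), b (d_out)."""
    k = w.shape[0]
    out = np.empty((x.shape[0] - k + 1, w.shape[2]))
    for i in range(out.shape[0]):
        out[i] = np.einsum('kd,kdo->o', x[i:i + k], w) + b
    return np.maximum(out, 0.0)

def predict_functions(seq_emb, w, b, w_out, b_out):
    """Toy sketch: a CNN over precomputed per-residue sequence embeddings,
    then global max pooling and a sigmoid multi-label head (one score per
    hypothetical function term)."""
    h = conv1d_relu(seq_emb, w, b)   # (L-k+1, d_hidden) local sequence features
    pooled = h.max(axis=0)           # (d_hidden,) global max pooling over positions
    logits = pooled @ w_out + b_out  # (n_labels,)
    return 1.0 / (1.0 + np.exp(-logits))

# Random stand-in data: a 30-residue protein with 32-dim embeddings
# and 3 hypothetical function labels.
seq_emb = rng.normal(size=(30, 32))
w = rng.normal(scale=0.1, size=(5, 32, 64))   # kernel width 5, 64 filters
b = np.zeros(64)
w_out = rng.normal(scale=0.1, size=(64, 3))
b_out = np.zeros(3)

probs = predict_functions(seq_emb, w, b, w_out, b_out)
print(probs.shape)  # (3,)
```

Global pooling is one way such a model can handle the large label space with variable-length inputs: the per-position features collapse to a fixed-size vector regardless of sequence length.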
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.