ProFSA: Self-supervised Pocket Pretraining via Protein
Fragment-Surroundings Alignment
- URL: http://arxiv.org/abs/2310.07229v2
- Date: Thu, 7 Mar 2024 07:14:50 GMT
- Title: ProFSA: Self-supervised Pocket Pretraining via Protein
Fragment-Surroundings Alignment
- Authors: Bowen Gao, Yinjun Jia, Yuanle Mo, Yuyan Ni, Weiying Ma, Zhiming Ma,
Yanyan Lan
- Abstract summary: We propose a novel pocket pretraining approach that leverages knowledge from high-resolution atomic protein structures.
Our method, named ProFSA, achieves state-of-the-art performance across various tasks, including pocket druggability prediction.
Our work opens up a new avenue for mitigating the scarcity of protein-ligand complex data through the utilization of high-quality and diverse protein structure databases.
- Score: 20.012210194899605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pocket representations play a vital role in various biomedical applications,
such as druggability estimation, ligand affinity prediction, and de novo drug
design. While existing geometric features and pretrained representations have
demonstrated promising results, they usually treat pockets independent of
ligands, neglecting the fundamental interactions between them. However, the
limited pocket-ligand complex structures available in the PDB database (less
than 100 thousand non-redundant pairs) hampers large-scale pretraining
endeavors for interaction modeling. To address this constraint, we propose a
novel pocket pretraining approach that leverages knowledge from high-resolution
atomic protein structures, assisted by highly effective pretrained small
molecule representations. By segmenting protein structures into drug-like
fragments and their corresponding pockets, we obtain a reasonable simulation of
ligand-receptor interactions, resulting in the generation of over 5 million
complexes. Subsequently, the pocket encoder is trained in a contrastive manner
to align with the representation of pseudo-ligand furnished by some pretrained
small molecule encoders. Our method, named ProFSA, achieves state-of-the-art
performance across various tasks, including pocket druggability prediction,
pocket matching, and ligand binding affinity prediction. Notably, ProFSA
surpasses other pretraining methods by a substantial margin. Moreover, our work
opens up a new avenue for mitigating the scarcity of protein-ligand complex
data through the utilization of high-quality and diverse protein structure
databases.
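The contrastive alignment step described in the abstract can be illustrated with a minimal sketch. It assumes an InfoNCE-style objective over a batch of (pocket, pseudo-ligand) embedding pairs. The abstract does not specify the exact loss, so the symmetric form, the temperature value, and the function name `info_nce` are illustrative assumptions rather than the authors' implementation. The pseudo-ligand embeddings stand in for the output of a pretrained small molecule encoder.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(pocket_embs, ligand_embs, temperature=0.07):
    """Symmetric InfoNCE loss (hypothetical helper, not taken from the paper).

    Row i of pocket_embs and row i of ligand_embs form a positive pair;
    every other combination in the batch is treated as a negative.
    """
    p = [normalize(v) for v in pocket_embs]
    l = [normalize(v) for v in ligand_embs]
    n = len(p)
    # Similarity matrix: logits[i][j] compares pocket i with pseudo-ligand j.
    logits = [[dot(p[i], l[j]) / temperature for j in range(n)] for i in range(n)]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    # Average the pocket-to-ligand and ligand-to-pocket directions.
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))


if __name__ == "__main__":
    pockets = [[1.0, 0.0], [0.0, 1.0]]
    aligned = info_nce(pockets, [[1.0, 0.0], [0.0, 1.0]])
    mismatched = info_nce(pockets, [[0.0, 1.0], [1.0, 0.0]])
    print(aligned < mismatched)
```

In this toy setup the loss is small when each pocket embedding points in the same direction as its paired pseudo-ligand embedding and large when the pairs are swapped. In training, only the pocket encoder would receive gradients if the molecule encoder is kept fixed, which is one plausible reading of the abstract.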
Related papers
- PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for
Efficient and Generalizable Compound-Protein Interaction Prediction [63.50967073653953]
Compound-Protein Interaction prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery.
Existing deep learning-based methods utilize only the single modality of protein sequences or structures.
We propose a novel multi-scale Protein Sequence-structure Contrasting framework for CPI prediction.
arXiv Detail & Related papers (2024-02-13T03:51:10Z)
- Protein-ligand binding representation learning from fine-grained
interactions [29.965890962846093]
We propose to learn protein-ligand binding representation in a self-supervised learning manner.
This self-supervised learning problem is formulated as a prediction of the conclusive binding complex structure.
Experiments have demonstrated the superiority of our method across various binding tasks.
arXiv Detail & Related papers (2023-11-09T01:33:09Z)
- PharmacoNet: Accelerating Large-Scale Virtual Screening by Deep
Pharmacophore Modeling [0.0]
We describe for the first time a deep-learning framework for structure-based pharmacophore modeling to address this challenge.
PharmacoNet is significantly faster than state-of-the-art structure-based approaches, yet reasonably accurate with a simple scoring function.
arXiv Detail & Related papers (2023-10-01T14:13:09Z)
- Efficient Prediction of Peptide Self-assembly through Sequential and
Graphical Encoding [57.89530563948755]
This work provides a benchmark analysis of peptide encoding with advanced deep learning models.
It serves as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.
arXiv Detail & Related papers (2023-07-17T00:43:33Z)
- State-specific protein-ligand complex structure prediction with a
multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- From Static to Dynamic Structures: Improving Binding Affinity Prediction
with a Graph-Based Deep Learning Model [33.92165575735532]
Accurate prediction of the protein-ligand binding affinities is an essential challenge in the structure-based drug design.
Here, we curated an MD dataset containing 3,218 different protein-ligand complexes, and developed Dynaformer, a graph-based deep learning model.
Dynaformer was able to accurately predict the binding affinities by learning the geometric characteristics of the protein-ligand interactions from the MD trajectories.
arXiv Detail & Related papers (2022-08-19T14:55:12Z)
- Learning Geometrically Disentangled Representations of Protein Folding
Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
- DIPS-Plus: The Enhanced Database of Interacting Protein Structures for
Interface Prediction [2.697420611471228]
We present DIPS-Plus, an enhanced, feature-rich dataset of 42,112 complexes for geometric deep learning of protein interfaces.
The previous version of DIPS contains only the Cartesian coordinates and types of the atoms comprising a given protein complex.
DIPS-Plus now includes a plethora of new residue-level features including protrusion indices, half-sphere amino acid compositions, and new profile hidden Markov model (HMM)-based sequence features for each amino acid.
arXiv Detail & Related papers (2021-06-06T23:56:27Z)
- Transfer Learning for Protein Structure Classification at Low Resolution [124.5573289131546]
We show that it is possible to make accurate ($\geq$80%) predictions of protein class and architecture from structures determined at low ($\leq$3Å) resolution.
We provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function.
arXiv Detail & Related papers (2020-08-11T15:01:32Z)
- Explainable Deep Relational Networks for Predicting Compound-Protein
Affinities and Contacts [80.69440684790925]
DeepRelations is a physics-inspired deep relational network with intrinsically explainable architecture.
It shows superior interpretability to the state-of-the-art.
It boosts the AUPRC of contact prediction 9.5, 16.9, 19.3 and 5.7-fold for the test, compound-unique, protein-unique, and both-unique sets.
arXiv Detail & Related papers (2019-12-29T00:14:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.