Protein-ligand binding representation learning from fine-grained
interactions
- URL: http://arxiv.org/abs/2311.16160v1
- Date: Thu, 9 Nov 2023 01:33:09 GMT
- Title: Protein-ligand binding representation learning from fine-grained
interactions
- Authors: Shikun Feng, Minghao Li, Yinjun Jia, Weiying Ma, Yanyan Lan
- Abstract summary: We propose to learn protein-ligand binding representations in a self-supervised manner.
This self-supervised learning problem is formulated as predicting the final bound complex structure.
Experiments have demonstrated the superiority of our method across various binding tasks.
- Score: 29.965890962846093
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The binding between proteins and ligands plays a crucial role in the realm of
drug discovery. Previous deep learning approaches have shown promising results
over traditional computationally intensive methods, but suffer from poor
generalization due to limited supervised data. In this paper, we propose to
learn protein-ligand binding representation in a self-supervised learning
manner. Unlike existing pre-training approaches, which treat proteins and
ligands individually, we emphasize discerning the intricate binding
patterns from fine-grained interactions. Specifically, this self-supervised
learning problem is formulated as predicting the final bound complex
structure given a pocket and a ligand, using a Transformer-based
interaction module that naturally emulates the binding process. To ensure the
representation captures rich binding information, we introduce two pre-training
tasks, i.e., atomic pairwise distance map prediction and masked ligand
reconstruction, which comprehensively model the fine-grained interactions from
both structure and feature space. Extensive experiments have demonstrated the
superiority of our method across various binding tasks, including
protein-ligand affinity prediction, virtual screening and protein-ligand
docking.
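The two pre-training objectives can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's exact formulation: the shapes, function names, and the use of plain MSE and cross-entropy losses are all assumptions.

```python
import numpy as np

def distance_map_loss(pred_dist, pocket_xyz, ligand_xyz):
    """MSE between predicted and true pocket-ligand pairwise atomic distances.

    pred_dist:  (n_pocket, n_ligand) predicted distance map.
    pocket_xyz: (n_pocket, 3) pocket atom coordinates.
    ligand_xyz: (n_ligand, 3) ligand atom coordinates.
    """
    true_dist = np.linalg.norm(
        pocket_xyz[:, None, :] - ligand_xyz[None, :, :], axis=-1)
    return float(np.mean((pred_dist - true_dist) ** 2))

def masked_ligand_loss(logits, atom_types, mask):
    """Cross-entropy over predicted atom types, scored only at masked positions.

    logits:     (n_ligand, n_types) atom-type predictions.
    atom_types: (n_ligand,) true atom-type indices.
    mask:       (n_ligand,) 1.0 where the atom was hidden from the model.
    """
    log_p = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    nll = -log_p[np.arange(len(atom_types)), atom_types]
    return float((nll * mask).sum() / mask.sum())
```

With perfect distance predictions the first loss is zero; the second loss penalizes only the atoms that were masked out, modeling the interaction in feature space.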
Related papers
- SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z)
- A distributional simplicity bias in the learning dynamics of transformers [50.91742043564049]
We show that transformers, trained on natural language data, also display a simplicity bias.
Specifically, they sequentially learn many-body interactions among input tokens, reaching a saturation point in the prediction error for low-degree interactions.
This approach opens up the possibilities of studying how interactions of different orders in the data affect learning, in natural language processing and beyond.
arXiv Detail & Related papers (2024-10-25T15:39:34Z)
- Assessing interaction recovery of predicted protein-ligand poses [0.39331876802505306]
We show that ignoring protein-ligand interaction fingerprints can lead to overestimation of model performance.
arXiv Detail & Related papers (2024-09-30T12:06:13Z)
- Learning to Predict Mutation Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning [78.38442423223832]
We develop a novel codebook pre-training task, namely masked microenvironment modeling.
We demonstrate superior performance and training efficiency over state-of-the-art pre-training-based methods in mutation effect prediction.
arXiv Detail & Related papers (2024-05-16T03:53:21Z)
- Protein Representation Learning by Capturing Protein Sequence-Structure-Function Relationship [12.11413472492417]
AMMA adopts a unified multi-modal encoder to integrate all three modalities into a unified representation space.
AMMA is highly effective in learning protein representations that exhibit well-aligned inter-modal relationships.
arXiv Detail & Related papers (2024-04-29T05:42:29Z)
- Improved K-mer Based Prediction of Protein-Protein Interactions With Chaos Game Representation, Deep Learning and Reduced Representation Bias [0.0]
We present a method for extracting unique pairs from an interaction dataset, generating non-redundant paired data for unbiased machine learning.
We develop a convolutional neural network model capable of learning and predicting interactions from Chaos Game Representations of proteins' coding genes.
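The Chaos Game Representation used as the CNN input here can be computed with a short recursion. The sketch below follows the standard CGR construction; the particular corner assignment and the `fcgr` grid resolution are conventional choices, not necessarily the ones used in that paper.

```python
import numpy as np

# One common corner convention for the four nucleotides.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq):
    """Chaos Game: start at the center, step halfway toward each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points

def fcgr(seq, k=3):
    """Frequency CGR: bin the CGR points into a 2^k x 2^k grid.

    After the first k-1 steps, each point's grid cell is determined by the
    last k bases, so the histogram approximates the sequence's k-mer counts.
    """
    n = 2 ** k
    grid = np.zeros((n, n), dtype=int)
    for x, y in cgr_points(seq)[k - 1:]:
        grid[min(int(y * n), n - 1), min(int(x * n), n - 1)] += 1
    return grid
```

The fixed-size `fcgr` image is what makes variable-length coding sequences digestible by a convolutional network.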
arXiv Detail & Related papers (2023-10-23T10:02:23Z)
- ProFSA: Self-supervised Pocket Pretraining via Protein Fragment-Surroundings Alignment [20.012210194899605]
We propose a novel pocket pretraining approach that leverages knowledge from high-resolution atomic protein structures.
Our method, named ProFSA, achieves state-of-the-art performance across various tasks, including pocket druggability prediction.
Our work opens up a new avenue for mitigating the scarcity of protein-ligand complex data through the utilization of high-quality and diverse protein structure databases.
arXiv Detail & Related papers (2023-10-11T06:36:23Z)
- Growing ecosystem of deep learning methods for modeling protein–protein interactions [0.0]
We discuss the growing ecosystem of deep learning methods for modeling protein interactions.
Opportunities abound to discover novel interactions, modulate their physical mechanisms, and engineer binders to unravel their functions.
arXiv Detail & Related papers (2023-10-10T15:53:27Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
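The OT-based aggregation described in that entry can be approximated with entropic regularization and Sinkhorn iterations. This is a minimal sketch: the reference set here is a fixed array rather than a trained parameter, and the `eps` and iteration-count values are illustrative assumptions.

```python
import numpy as np

def sinkhorn_plan(cost, a, b, eps=0.1, iters=300):
    """Entropic-OT transport plan between histograms a and b (Sinkhorn updates)."""
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)  # scale columns toward marginal b
        u = a / (K @ v)    # scale rows toward marginal a
    return u[:, None] * K * v[None, :]

def ot_embedding(X, ref, eps=0.1):
    """Pool a variable-size set X (n, d) into a fixed-size vector via the OT
    plan between X and a reference set ref (m, d)."""
    cost = ((X[:, None, :] - ref[None, :, :]) ** 2).sum(axis=-1)
    n, m = len(X), len(ref)
    P = sinkhorn_plan(cost, np.full(n, 1.0 / n), np.full(m, 1.0 / m), eps)
    # Each reference slot receives a weighted average of the input elements;
    # flattening gives an (m * d)-vector regardless of the input size n.
    return (m * P.T @ X).reshape(-1)
```

Because the output size depends only on the reference set, sets of different cardinalities map to comparable fixed-size embeddings, which is the property the paper exploits for end-to-end training.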
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
- Explainable Deep Relational Networks for Predicting Compound-Protein Affinities and Contacts [80.69440684790925]
DeepRelations is a physics-inspired deep relational network with intrinsically explainable architecture.
It shows superior interpretability to the state-of-the-art.
It boosts the AUPRC of contact prediction by 9.5-, 16.9-, 19.3-, and 5.7-fold on the test, compound-unique, protein-unique, and both-unique sets, respectively.
arXiv Detail & Related papers (2019-12-29T00:14:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.