Cross-Modality Protein Embedding for Compound-Protein Affinity and Contact Prediction
- URL: http://arxiv.org/abs/2012.00651v1
- Date: Sat, 14 Nov 2020 04:42:25 GMT
- Authors: Yuning You, Yang Shen
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compound-protein pairs dominate FDA-approved drug-target pairs, and
the prediction of compound-protein affinity and contact (CPAC) could help
accelerate drug discovery. In this study we consider proteins as multi-modal
data including 1D amino-acid sequences and (sequence-predicted) 2D residue-pair
contact maps. We empirically evaluate the embeddings of the two single
modalities in terms of the accuracy and generalizability of CPAC prediction (i.e.,
structure-free, interpretable compound-protein affinity prediction). We also
rationalize their performance in light of two challenges: embedding individual
modalities and learning a generalizable embedding-label relationship. We further
propose two models involving cross-modality protein embedding and establish
that the one with cross interaction (thus capturing correlations among
modalities) outperforms state-of-the-art (SOTA) methods and our single-modality
models in affinity, contact, and binding-site predictions for proteins never
seen in the training set.
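The abstract does not say how "cross interaction" is implemented. As a minimal sketch, assuming (i) per-residue sequence embeddings from some encoder, (ii) a contact-map embedding obtained by one graph-convolution step that treats the predicted contact map as an adjacency matrix, and (iii) bidirectional scaled dot-product attention as the cross-modality interaction, the idea of letting each modality condition on the other could look like this in NumPy. The functions `graph_embed` and `cross_interaction` and all random inputs are hypothetical illustrations, not the authors' architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def graph_embed(contact_map, features, weight):
    """One GCN-style step (assumption): each residue aggregates features from its
    predicted contacts, using the symmetrically normalized contact map with
    self-loops as the adjacency matrix, then applies a linear map and ReLU."""
    adj = contact_map + np.eye(contact_map.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ features @ weight, 0.0)


def cross_interaction(h_seq, h_graph):
    """Hypothetical cross interaction: each modality attends to the other with
    scaled dot-product attention; residual outputs are concatenated per residue."""
    d = h_seq.shape[1]
    seq_attends_graph = softmax(h_seq @ h_graph.T / np.sqrt(d)) @ h_graph
    graph_attends_seq = softmax(h_graph @ h_seq.T / np.sqrt(d)) @ h_seq
    return np.concatenate([h_seq + seq_attends_graph, h_graph + graph_attends_seq], axis=1)


rng = np.random.default_rng(0)
L, d = 50, 16                                   # residues, embedding width
h_seq = rng.normal(size=(L, d))                 # stand-in for a sequence-encoder output
upper = np.triu(rng.random((L, L)) < 0.1, k=1)  # random sparse "predicted" contacts
contact = (upper | upper.T).astype(float)       # symmetric, no self-contacts
h_graph = graph_embed(contact, h_seq, rng.normal(size=(d, d)) / np.sqrt(d))
joint = cross_interaction(h_seq, h_graph)
print(joint.shape)  # (50, 32)
```

The concatenated output gives every residue a joint embedding that a downstream affinity or contact head could consume. A simpler baseline without cross interaction would just concatenate `h_seq` and `h_graph`, so neither modality's representation would depend on the other.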
Related papers
- SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
(2024-10-31T15:22:03Z)
- DPLM-2: A Multimodal Diffusion Protein Language Model
We introduce DPLM-2, a multimodal protein foundation model that extends discrete diffusion protein language model (DPLM) to accommodate both sequences and structures.
DPLM-2 learns the joint distribution of sequence and structure, as well as their marginals and conditionals.
Empirical evaluation shows that DPLM-2 can simultaneously generate highly compatible amino acid sequences and their corresponding 3D structures.
(2024-10-17T17:20:24Z)
- CoPRA: Bridging Cross-domain Pretrained Sequence Models with Complex Structures for Protein-RNA Binding Affinity Prediction
We build PRA310, the largest protein-RNA binding affinity dataset, for performance evaluation.
We provide extensive analyses and verify that CoPRA can (1) accurately predict the protein-RNA binding affinity; (2) understand the binding affinity change caused by mutations; and (3) benefit from scaling data and model size.
(2024-08-21T09:48:22Z)
- Accurate Prediction of Ligand-Protein Interaction Affinities with Fine-Tuned Small Language Models
We describe the accurate prediction of ligand-protein interaction (LPI) affinities with instruction-fine-tuned, pretrained generative small language models (SLMs).
Our results demonstrate a clear improvement over machine learning (ML)-based and free-energy perturbation (FEP+)-based methods in accurately predicting a range of LPI affinities.
(2024-06-27T13:04:58Z)
- ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction
The prediction of protein-protein interactions (PPIs) is crucial for understanding biological functions and diseases.
Previous machine learning approaches to PPI prediction mainly focus on direct physical interactions.
We propose a novel framework ProLLM that employs an LLM tailored for PPI for the first time.
(2024-03-30T05:32:42Z)
- PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for Efficient and Generalizable Compound-Protein Interaction Prediction
Compound-protein interaction (CPI) prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery.
Existing deep learning-based methods use only a single modality, either protein sequences or protein structures.
We propose a novel multi-scale Protein Sequence-structure Contrasting framework for CPI prediction.
(2024-02-13T03:51:10Z)
- Improving Protein-peptide Interface Predictions in the Low Data Regime
We propose a novel approach for predicting protein-peptide interactions using a bi-modal transformer architecture.
We show that the distributions of interfacial residue-residue interactions overlap with those of residue-residue interactions within single proteins.
This data augmentation allows us to leverage the vast amount of protein-only data available in the PepBDB to train neural networks.
(2023-05-31T17:04:27Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
(2022-09-30T01:46:38Z)
- Explainable Deep Relational Networks for Predicting Compound-Protein Affinities and Contacts
DeepRelations is a physics-inspired deep relational network with intrinsically explainable architecture.
It shows interpretability superior to that of state-of-the-art methods.
It boosts the AUPRC of contact prediction by 9.5-, 16.9-, 19.3-, and 5.7-fold on the test, compound-unique, protein-unique, and both-unique sets, respectively.
(2019-12-29T00:14:07Z)
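Generalization claims such as "proteins never seen in the training set" or the compound-unique, protein-unique, and both-unique test sets can be illustrated by sorting held-out compound-protein pairs according to which partner is new. The sketch below assumes membership is decided by exact identifier match; published benchmarks may additionally apply sequence- or chemical-similarity thresholds, so this is a simplified illustration. The function `classify_test_pairs`, the `"both-seen"` label, and the example IDs are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # hypothetical (compound_id, protein_id)


def classify_test_pairs(train_pairs: List[Pair], test_pairs: List[Pair]) -> Dict[str, List[Pair]]:
    """Assign each held-out pair to a subset by whether its compound and/or
    protein identifier occurs anywhere in the training pairs (exact match)."""
    train_compounds = {c for c, _ in train_pairs}
    train_proteins = {p for _, p in train_pairs}
    subsets: Dict[str, List[Pair]] = {
        "both-seen": [], "compound-unique": [], "protein-unique": [], "both-unique": [],
    }
    for compound, protein in test_pairs:
        new_compound = compound not in train_compounds
        new_protein = protein not in train_proteins
        if new_compound and new_protein:
            key = "both-unique"
        elif new_compound:
            key = "compound-unique"
        elif new_protein:
            key = "protein-unique"
        else:
            key = "both-seen"
        subsets[key].append((compound, protein))
    return subsets


train = [("c1", "p1"), ("c2", "p2")]
test = [("c1", "p2"), ("c3", "p1"), ("c1", "p9"), ("c7", "p8")]
result = classify_test_pairs(train, test)
for name, pairs in result.items():
    print(name, pairs)
```

Under this scheme the protein-unique and both-unique subsets are the ones that probe generalization to unseen proteins.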