DIPS-Plus: The Enhanced Database of Interacting Protein Structures for
Interface Prediction
- URL: http://arxiv.org/abs/2106.04362v1
- Date: Sun, 6 Jun 2021 23:56:27 GMT
- Title: DIPS-Plus: The Enhanced Database of Interacting Protein Structures for
Interface Prediction
- Authors: Alex Morehead, Chen Chen, Ada Sedova, Jianlin Cheng
- Abstract summary: We present DIPS-Plus, an enhanced, feature-rich dataset of 42,112 complexes for geometric deep learning of protein interfaces.
The previous version of DIPS contains only the Cartesian coordinates and types of the atoms comprising a given protein complex.
DIPS-Plus now includes a plethora of new residue-level features including protrusion indices, half-sphere amino acid compositions, and new profile hidden Markov model (HMM)-based sequence features for each amino acid.
- Score: 2.697420611471228
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: How and where proteins interface with one another can ultimately impact the
proteins' functions along with a range of other biological processes. As such,
precise computational methods for protein interface prediction (PIP) come
highly sought after as they could yield significant advances in drug discovery
and design as well as protein function analysis. However, the traditional
benchmark dataset for this task, Docking Benchmark 5 (DB5), contains only a
paltry 230 complexes for training, validating, and testing different machine
learning algorithms. In this work, we expand on a dataset recently introduced
for this task, the Database of Interacting Protein Structures (DIPS), to
present DIPS-Plus, an enhanced, feature-rich dataset of 42,112 complexes for
geometric deep learning of protein interfaces. The previous version of DIPS
contains only the Cartesian coordinates and types of the atoms comprising a
given protein complex, whereas DIPS-Plus now includes a plethora of new
residue-level features including protrusion indices, half-sphere amino acid
compositions, and new profile hidden Markov model (HMM)-based sequence features
for each amino acid, giving researchers a large, well-curated feature bank for
training protein interface prediction methods.
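Several of the residue-level features above can be computed with standard structural-bioinformatics tooling. As one illustration, half-sphere amino acid composition builds on half-sphere exposure, which counts a residue's neighbors in the half-spheres on either side of the plane orthogonal to its side-chain direction. The sketch below computes half-sphere exposure with Biopython; it is only a hedged example of this feature family, not the DIPS-Plus feature pipeline itself, and the file name and sphere radius are placeholder assumptions.

```python
# Illustrative sketch: per-residue half-sphere exposure (HSE) with Biopython,
# one example of the kind of residue-level feature DIPS-Plus provides.
# "example.pdb" and the 12 Angstrom radius are placeholder choices, not the
# exact settings of the DIPS-Plus pipeline.
from Bio.PDB import PDBParser
from Bio.PDB.HSExposure import HSExposureCB
from Bio.PDB.Polypeptide import is_aa

parser = PDBParser(QUIET=True)
model = parser.get_structure("complex", "example.pdb")[0]  # first model

# Annotates each residue's .xtra dict with upper/lower half-sphere neighbor counts.
HSExposureCB(model, radius=12.0)

for chain in model:
    for res in chain:
        if not is_aa(res):
            continue  # skip waters, ligands, etc.
        hse_up = res.xtra.get("EXP_HSE_B_U")    # neighbors on the side-chain side
        hse_down = res.xtra.get("EXP_HSE_B_D")  # neighbors on the opposite side
        print(chain.id, res.id[1], res.get_resname(), hse_up, hse_down)
```

Extending these neighbor counts to per-amino-acid counts in each half-sphere yields a composition-style feature; the remaining DIPS-Plus features (e.g., protrusion indices and HMM profiles) come from separate tools and are not shown here.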
Related papers
- SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z)
- ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction [54.132290875513405]
The prediction of protein-protein interactions (PPIs) is crucial for understanding biological functions and diseases.
Previous machine learning approaches to PPI prediction mainly focus on direct physical interactions.
We propose a novel framework ProLLM that employs an LLM tailored for PPI for the first time.
arXiv Detail & Related papers (2024-03-30T05:32:42Z)
- NaNa and MiGu: Semantic Data Augmentation Techniques to Enhance Protein Classification in Graph Neural Networks [60.48306899271866]
We propose novel semantic data augmentation methods to incorporate backbone chemical and side-chain biophysical information into protein classification tasks.
Specifically, we leverage molecular biophysical, secondary structure, chemical bond, and ionic features of proteins to facilitate classification tasks.
arXiv Detail & Related papers (2024-03-21T13:27:57Z)
- PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for Efficient and Generalizable Compound-Protein Interaction Prediction [63.50967073653953]
Compound-Protein Interaction (CPI) prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery.
Existing deep learning-based methods utilize only the single modality of protein sequences or structures.
We propose a novel multi-scale Protein Sequence-structure Contrasting framework for CPI prediction.
arXiv Detail & Related papers (2024-02-13T03:51:10Z)
- Deep Learning Methods for Protein Family Classification on PDB Sequencing Data [0.0]
We demonstrate and compare the performance of several deep learning frameworks, including novel bi-directional LSTM and convolutional models, on widely available sequencing data.
Our results show that our deep learning models deliver superior performance to classical machine learning methods, with the convolutional architecture providing the most impressive inference performance.
arXiv Detail & Related papers (2022-07-14T06:11:32Z)
- PSP: Million-level Protein Sequence Dataset for Protein Structure Prediction [34.11168458572554]
We present PSP, the first million-level protein structure prediction dataset with high coverage and diversity.
This dataset consists of 570k true structure sequences (10TB) and 745k complementary distillation sequences (15TB).
In addition, we provide a benchmark training procedure for a SOTA protein structure prediction model on this dataset.
arXiv Detail & Related papers (2022-06-24T14:08:44Z)
- PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding [17.770721291090258]
PEER is a comprehensive and multi-task benchmark for Protein sEquence undERstanding.
It provides a set of diverse protein understanding tasks including protein function prediction, protein localization prediction, protein structure prediction, and protein-ligand interaction prediction.
We evaluate different types of sequence-based methods for each task including traditional feature engineering approaches, different sequence encoding methods as well as large-scale pre-trained protein language models.
arXiv Detail & Related papers (2022-06-05T05:21:56Z)
- Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
- Multi-Scale Representation Learning on Proteins [78.31410227443102]
This paper introduces a multi-scale graph construction of a protein -- HoloProt.
The surface captures coarser details of the protein, while the sequence (as the primary component) and the structure capture finer details.
Our graph encoder then learns a multi-scale representation by allowing each level to integrate the encoding from level(s) below with the graph at that level.
arXiv Detail & Related papers (2022-04-04T08:29:17Z)
- Geometric Transformers for Protein Interface Contact Prediction [3.031630445636656]
We present the Geometric Transformer, a novel geometry-evolving graph transformer for rotation and translation-invariant protein interface contact prediction.
DeepInteract predicts partner-specific protein interface contacts given the 3D tertiary structures of two proteins as input (see the contact-labeling sketch after this list).
arXiv Detail & Related papers (2021-10-06T00:12:15Z)
- Intrinsic-Extrinsic Convolution and Pooling for Learning on 3D Protein Structures [18.961218808251076]
We propose two new learning operations enabling deep 3D analysis of large-scale protein data.
First, we introduce a novel convolution operator which considers both the intrinsic (invariant under protein folding) and the extrinsic (invariant under bonding) structure.
Second, we enable a multi-scale protein analysis by introducing hierarchical pooling operators, exploiting the fact that proteins are a recombination of a finite set of amino acids.
arXiv Detail & Related papers (2020-07-13T09:02:40Z)
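As referenced in the Geometric Transformers entry above, interface contact prediction on datasets such as DIPS-Plus is typically supervised with a binary residue-residue contact matrix between the two chains of a complex. The sketch below shows one common way to derive such labels from heavy-atom coordinates; the 6.0 Å cutoff and the input layout are illustrative assumptions rather than the exact convention of DIPS-Plus or DeepInteract.

```python
# Minimal sketch: binary inter-chain residue contact labels from 3D coordinates.
# A residue pair is labeled positive if any pair of heavy atoms lies within a
# distance cutoff; 6.0 Angstroms is a commonly used convention (assumed here).
import numpy as np

def contact_labels(chain_a_atoms, chain_b_atoms, cutoff=6.0):
    """chain_a_atoms / chain_b_atoms: lists with one (n_atoms_i, 3) array of
    heavy-atom coordinates per residue. Returns a 0/1 matrix of shape
    (len(chain_a_atoms), len(chain_b_atoms))."""
    labels = np.zeros((len(chain_a_atoms), len(chain_b_atoms)), dtype=np.int8)
    for i, atoms_i in enumerate(chain_a_atoms):
        for j, atoms_j in enumerate(chain_b_atoms):
            # All pairwise atom-atom distances between the two residues.
            dists = np.linalg.norm(atoms_i[:, None, :] - atoms_j[None, :, :], axis=-1)
            labels[i, j] = int(dists.min() < cutoff)
    return labels
```

A partner-specific interface predictor is then trained to recover this matrix from the two input structures.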