Intrinsic-Extrinsic Convolution and Pooling for Learning on 3D Protein
Structures
- URL: http://arxiv.org/abs/2007.06252v2
- Date: Mon, 19 Apr 2021 17:27:56 GMT
- Title: Intrinsic-Extrinsic Convolution and Pooling for Learning on 3D Protein
Structures
- Authors: Pedro Hermosilla, Marco Schäfer, Matěj Lang, Gloria Fackelmann,
Pere Pau Vázquez, Barbora Kozlíková, Michael Krone, Tobias Ritschel,
Timo Ropinski
- Abstract summary: We propose two new learning operations enabling deep 3D analysis of large-scale protein data.
First, we introduce a novel convolution operator that considers both the intrinsic (invariant under protein folding) and the extrinsic (invariant under bonding) structure.
Second, we enable a multi-scale protein analysis by introducing hierarchical pooling operators, exploiting the fact that proteins are a recombination of a finite set of amino acids.
- Score: 18.961218808251076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Proteins perform a large variety of functions in living organisms, thus
playing a key role in biology. However, available learning algorithms for
processing protein data do not account for several particularities of such
data, do not scale well to large protein conformations, or both. To fill this
gap, we propose two new learning operations enabling deep 3D analysis of
large-scale protein data. First, we introduce a novel convolution operator
that considers both the intrinsic (invariant under protein folding) and the
extrinsic (invariant under bonding) structure, using $n$-D convolutions
defined on the Euclidean distance as well as multiple geodesic distances
between atoms in a multi-graph. Second, we enable multi-scale protein analysis by
introducing hierarchical pooling operators, exploiting the fact that proteins
are a recombination of a finite set of amino acids, which can be pooled using
shared pooling matrices. Lastly, we evaluate the accuracy of our algorithms on
several large-scale data sets for common protein analysis tasks, where we
outperform state-of-the-art methods.
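The two operations above can be made concrete with a short sketch. This is a minimal illustration, not the authors' implementation: the function names (intrinsic_extrinsic_conv, amino_acid_pool), the fixed Gaussian kernel, and the tensor layouts are assumptions; in the paper the kernels and the per-amino-acid pooling matrices are learned parameters.

```python
# Minimal NumPy sketch of the two operations described in the abstract.
# Names, shapes, and the Gaussian kernel are illustrative assumptions,
# not the paper's actual code or learned parameters.
import numpy as np


def intrinsic_extrinsic_conv(features, euclid_dist, geo_dists, sigma=1.0):
    """Toy convolution: each neighbor is weighted by a kernel evaluated on a
    (1 + K)-dimensional distance vector, i.e. the extrinsic Euclidean distance
    plus K intrinsic geodesic distances taken from the bond multi-graph.

    features    : (N, C) per-atom feature vectors
    euclid_dist : (N, N) pairwise Euclidean distances
    geo_dists   : (K, N, N) pairwise geodesic distances over K graphs
    """
    out = np.zeros_like(features)
    for i in range(features.shape[0]):
        for j in range(features.shape[0]):
            # Stack extrinsic and intrinsic distances into one n-D coordinate.
            d = np.concatenate(([euclid_dist[i, j]], geo_dists[:, i, j]))
            w = np.exp(-np.sum(d ** 2) / (2.0 * sigma ** 2))  # assumed kernel
            out[i] += w * features[j]
    return out


def amino_acid_pool(features, residue_ids, residue_types, shared_pool):
    """Toy hierarchical pooling: the atoms of each residue are pooled with a
    weight vector shared by every residue of the same amino-acid type,
    exploiting the finite amino-acid alphabet.

    features      : (N, C) per-atom features
    residue_ids   : (N,)  residue index of each atom
    residue_types : (R,)  amino-acid type (0..19) of each residue
    shared_pool   : dict mapping amino-acid type -> (atoms per residue,) weights
    """
    pooled = np.zeros((len(residue_types), features.shape[1]))
    for r, aa_type in enumerate(residue_types):
        atoms = np.where(residue_ids == r)[0]
        weights = shared_pool[aa_type]         # shared across residues of this type
        pooled[r] = weights @ features[atoms]  # weighted sum of the residue's atoms
    return pooled
```

In the actual method such weights are learned end-to-end; the hand-set kernel and pooling weights above only stand in for those learned parameters to show how Euclidean and geodesic distances enter one joint convolution, and how a single pooling matrix can be reused for every residue of the same amino acid.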
Related papers
- SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z)
- Efficiently Predicting Protein Stability Changes Upon Single-point Mutation with Large Language Models [51.57843608615827]
The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry.
We introduce an ESM-assisted efficient approach that integrates protein sequence and structural features to predict thermostability changes in a protein upon single-point mutations.
arXiv Detail & Related papers (2023-12-07T03:25:49Z)
- CrysFormer: Protein Structure Prediction via 3d Patterson Maps and Partial Structure Attention [7.716601082662128]
Determining a protein's three-dimensional structure often incurs nontrivial computational costs.
We propose the first transformer-based model that directly utilizes protein crystallography and partial structure information.
We demonstrate that our method, dubbed CrysFormer, achieves accurate predictions from a much smaller dataset and with reduced computation costs.
arXiv Detail & Related papers (2023-10-05T21:10:22Z)
- Pairing interacting protein sequences using masked language modeling [0.3222802562733787]
We develop a method to pair interacting protein sequences using protein language models trained on sequence alignments.
We exploit the ability of MSA Transformer to fill in masked amino acids in multiple sequence alignments using the surrounding context.
We show that it captures inter-chain coevolution even though it was trained on single-chain data, meaning it can be used out-of-distribution.
arXiv Detail & Related papers (2023-08-14T13:42:09Z)
- Prot2Text: Multimodal Protein's Function Generation with GNNs and Transformers [18.498779242323582]
We propose a novel approach, Prot2Text, which predicts a protein's function in a free-text style.
By combining Graph Neural Networks (GNNs) and Large Language Models (LLMs) in an encoder-decoder framework, our model effectively integrates diverse data types.
arXiv Detail & Related papers (2023-07-25T09:35:43Z)
- Integration of Pre-trained Protein Language Models into Geometric Deep Learning Networks [68.90692290665648]
We integrate knowledge learned by protein language models into several state-of-the-art geometric networks.
Our findings show an overall improvement of 20% over baselines.
Strong evidence indicates that the incorporation of protein language models' knowledge enhances geometric networks' capacity by a significant margin.
arXiv Detail & Related papers (2022-12-07T04:04:04Z)
- Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
- Independent SE(3)-Equivariant Models for End-to-End Rigid Protein Docking [57.2037357017652]
We tackle rigid body protein-protein docking, i.e., computationally predicting the 3D structure of a protein-protein complex from the individual unbound structures.
We design a novel pairwise-independent SE(3)-equivariant graph matching network to predict the rotation and translation to place one of the proteins at the right docked position.
Our model, named EquiDock, approximates the binding pockets and predicts the docking poses using keypoint matching and alignment.
arXiv Detail & Related papers (2021-11-15T18:46:37Z)
- Geometric Transformers for Protein Interface Contact Prediction [3.031630445636656]
We present the Geometric Transformer, a novel geometry-evolving graph transformer for rotation- and translation-invariant protein interface contact prediction.
DeepInteract predicts partner-specific protein interface contacts given the 3D tertiary structures of two proteins as input.
arXiv Detail & Related papers (2021-10-06T00:12:15Z)
- DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction [2.697420611471228]
We present DIPS-Plus, an enhanced, feature-rich dataset of 42,112 complexes for geometric deep learning of protein interfaces.
The previous version of DIPS contains only the Cartesian coordinates and types of the atoms comprising a given protein complex.
DIPS-Plus now includes a plethora of new residue-level features including protrusion indices, half-sphere amino acid compositions, and new profile hidden Markov model (HMM)-based sequence features for each amino acid.
arXiv Detail & Related papers (2021-06-06T23:56:27Z)
- BERTology Meets Biology: Interpreting Attention in Protein Language Models [124.8966298974842]
We demonstrate methods for analyzing protein Transformer models through the lens of attention.
We show that attention captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence, but spatially close in the three-dimensional structure.
We also present a three-dimensional visualization of the interaction between attention and protein structure.
arXiv Detail & Related papers (2020-06-26T21:50:17Z)