Protein sequence-to-structure learning: Is this the end(-to-end revolution)?
- URL: http://arxiv.org/abs/2105.07407v1
- Date: Sun, 16 May 2021 10:46:44 GMT
- Title: Protein sequence-to-structure learning: Is this the end(-to-end revolution)?
- Authors: Elodie Laine, Stephan Eismann, Arne Elofsson, and Sergei Grudinin
- Abstract summary: In CASP14, deep learning has boosted the field to unanticipated levels reaching near-experimental accuracy.
Novel emerging approaches include (i) geometric learning, i.e. learning on representations such as graphs, 3D Voronoi tessellations, and point clouds.
We provide an overview and our opinion of the novel deep learning approaches developed in the last two years and widely used in CASP14.
- Score: 0.8399688944263843
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The potential of deep learning has been recognized in the protein structure
prediction community for some time, and became indisputable after CASP13. In
CASP14, deep learning has boosted the field to unanticipated levels reaching
near-experimental accuracy. This success comes from advances transferred from
other machine learning areas, as well as methods specifically designed to deal
with protein sequences and structures, and their abstractions. Novel emerging
approaches include (i) geometric learning, i.e. learning on representations
such as graphs, 3D Voronoi tessellations, and point clouds; (ii) pre-trained
protein language models leveraging attention; (iii) equivariant architectures
preserving the symmetry of 3D space; (iv) use of large meta-genome databases;
(v) combinations of protein representations; and finally (vi) truly end-to-end
architectures, i.e. differentiable models starting from a sequence and
returning a 3D structure. Here, we provide an overview and our opinion of the
novel deep learning approaches developed in the last two years and widely used
in CASP14.
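Of the approaches listed above, points (i) and (iii) are the most self-contained to illustrate: features built only from pairwise distances are unchanged by any rotation or translation of a structure, so a network consuming them respects the symmetry of 3D space by construction. Below is a minimal NumPy sketch of that idea; it is not the implementation of any CASP14 method, and the 10 Å cutoff, function names, and array shapes are illustrative assumptions.

```python
# Minimal sketch of SE(3)-invariant geometric features on a protein point
# cloud. Pairwise C-alpha distances do not change under rotation or
# translation, so a residue graph built from them preserves 3D symmetry.
# The cutoff and shapes are illustrative assumptions, not details taken
# from any specific CASP14 method.
import numpy as np

def invariant_edge_features(coords: np.ndarray, cutoff: float = 10.0):
    """Build an SE(3)-invariant residue graph from C-alpha coordinates.

    coords: (n_residues, 3) array of 3D positions.
    Returns (edges, distances): index pairs closer than `cutoff` Angstroms
    and their pairwise distances.
    """
    diff = coords[:, None, :] - coords[None, :, :]        # (n, n, 3) displacements
    dist = np.linalg.norm(diff, axis=-1)                  # (n, n) pairwise distances
    src, dst = np.nonzero((dist < cutoff) & (dist > 0))   # drop self-edges
    return np.stack([src, dst], axis=1), dist[src, dst]

# Sanity check: the features are unchanged by a random rotation + translation.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))                              # synthetic C-alpha trace
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))              # random orthogonal matrix
_, d1 = invariant_edge_features(x)
_, d2 = invariant_edge_features(x @ q.T + rng.normal(size=3))
assert np.allclose(np.sort(d1), np.sort(d2))              # same distances either way
```

Equivariant architectures, point (iii) in its stronger form, go further by letting intermediate features transform predictably with the input coordinates instead of discarding orientation entirely; the invariance check above captures only the simplest version of the constraint.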
Related papers
- Geometric Self-Supervised Pretraining on 3D Protein Structures using Subgraphs [26.727436310732692]
We propose a novel self-supervised method to pretrain 3D graph neural networks on 3D protein structures.
We experimentally show that our proposed pretraining strategy leads to significant improvements of up to 6%.
arXiv Detail & Related papers (2024-06-20T09:34:31Z)
- xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein [76.18058946124111]
We propose a unified protein language model, xTrimoPGLM, to address protein understanding and generation tasks simultaneously.
xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories.
It can also generate de novo protein sequences following the principles of natural ones, and can perform programmable generation after supervised fine-tuning.
arXiv Detail & Related papers (2024-01-11T15:03:17Z)
- A Systematic Survey in Geometric Deep Learning for Structure-based Drug Design [63.30166298698985]
Structure-based drug design (SBDD) utilizes the three-dimensional geometry of proteins to identify potential drug candidates.
Recent developments in geometric deep learning, focusing on the integration and processing of 3D geometric data, have greatly advanced the field of structure-based drug design.
arXiv Detail & Related papers (2023-06-20T14:21:58Z)
- A Systematic Study of Joint Representation Learning on Protein Sequences and Structures [38.94729758958265]
Learning effective protein representations is critical in a variety of tasks in biology such as predicting protein functions.
Recent sequence representation learning methods based on Protein Language Models (PLMs) excel in sequence-based tasks, but their direct adaptation to tasks involving protein structures remains a challenge.
Our study undertakes a comprehensive exploration of joint protein representation learning by integrating a state-of-the-art PLM with distinct structure encoders.
arXiv Detail & Related papers (2023-03-11T01:24:10Z)
- Boosting Convolutional Neural Networks' Protein Binding Site Prediction Capacity Using SE(3)-invariant transformers, Transfer Learning and Homology-based Augmentation [1.160208922584163]
Identifying small binding sites in target proteins, at either pocket or residue resolution, is critical in real drug discovery scenarios.
Here we present a new computational method for binding site prediction that is relevant to real-world applications.
arXiv Detail & Related papers (2023-02-20T05:02:40Z)
- Structure-informed Language Models Are Protein Designers [69.70134899296912]
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs).
We conduct structural surgery on pLMs, implanting a lightweight structural adapter that endows them with structural awareness.
Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-02-03T10:49:52Z)
- Integration of Pre-trained Protein Language Models into Geometric Deep Learning Networks [68.90692290665648]
We integrate knowledge learned by protein language models into several state-of-the-art geometric networks.
Our findings show an overall improvement of 20% over baselines, strong evidence that incorporating protein language models' knowledge significantly enhances geometric networks' capacity; a minimal sketch of this fusion idea appears after this list.
arXiv Detail & Related papers (2022-12-07T04:04:04Z)
- Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
- Structure-aware Protein Self-supervised Learning [50.04673179816619]
We propose a novel structure-aware protein self-supervised learning method to capture structural information of proteins.
In particular, a well-designed graph neural network (GNN) model is pretrained to preserve the protein structural information.
We identify the relation between the sequential information in the protein language model and the structural information in the specially designed GNN model via a novel pseudo bi-level optimization scheme.
arXiv Detail & Related papers (2022-04-06T02:18:41Z)
- G-VAE, a Geometric Convolutional VAE for Protein Structure Generation [41.66010308405784]
We introduce a joint geometric and neural network approach for comparing, deforming, and generating 3D protein structures.
Our method is able to generate plausible structures, different from the structures in the training data.
arXiv Detail & Related papers (2021-06-22T16:52:48Z)
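One recurring idea across the entries above, most directly in "Integration of Pre-trained Protein Language Models into Geometric Deep Learning Networks", is to hand a geometric network per-residue language-model embeddings alongside its structural features. The sketch below shows that fusion only; the plm_embed stub, the 8-dimensional embedding, and the neighbor-count feature are assumptions made to keep the example self-contained, where a real pipeline would query an actual pre-trained model such as ESM.

```python
# Minimal sketch of fusing protein language model (PLM) embeddings with
# geometric node features. The PLM below is a deterministic stand-in
# (an assumption for self-containment); real systems would use contextual
# embeddings from a pre-trained model instead.
import numpy as np

def plm_embed(sequence: str, dim: int = 8) -> np.ndarray:
    """Stand-in for a PLM: one fixed vector per amino-acid letter."""
    rng = np.random.default_rng(42)
    table = {aa: rng.normal(size=dim) for aa in "ACDEFGHIKLMNPQRSTVWY"}
    return np.stack([table[aa] for aa in sequence])        # (n_residues, dim)

def geometric_node_features(coords: np.ndarray) -> np.ndarray:
    """Simple SE(3)-invariant per-residue feature: neighbors within 10 A."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    degree = ((dist < 10.0) & (dist > 0)).sum(axis=1, keepdims=True)
    return degree.astype(float)                            # (n_residues, 1)

def fused_node_features(sequence: str, coords: np.ndarray) -> np.ndarray:
    """Concatenate language-model and geometric features per residue."""
    return np.concatenate(
        [plm_embed(sequence), geometric_node_features(coords)], axis=1
    )

seq = "ACDEFGHIK"
xyz = np.random.default_rng(1).normal(size=(len(seq), 3)) * 5.0
print(fused_node_features(seq, xyz).shape)                 # (9, 9): 8 PLM + 1 geometric
```

The design point is only that both feature sources live on the same per-residue axis, so concatenation (or a learned projection) suffices to expose the language model's evolutionary signal to a geometric network's message passing.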
This list is automatically generated from the titles and abstracts of the papers in this site.