Boosting Convolutional Neural Networks' Protein Binding Site Prediction
Capacity Using SE(3)-invariant transformers, Transfer Learning and
Homology-based Augmentation
- URL: http://arxiv.org/abs/2303.08818v2
- Date: Tue, 18 Apr 2023 05:05:00 GMT
- Authors: Daeseok Lee, Jeunghyun Byun and Bonggun Shin
- Abstract summary: Figuring out small-molecule binding sites in target proteins, at the resolution of either pocket or residue, is critical in real drug-discovery scenarios.
Here we present a new computational method for binding site prediction that is relevant to real-world applications.
- Score: 1.160208922584163
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Figuring out small-molecule binding sites in target proteins, at
the resolution of either pocket or residue, is critical in many virtual and real
drug-discovery scenarios. Since it is not always easy to find such binding
sites based on domain knowledge or traditional methods, different deep learning
methods that predict binding sites out of protein structures have been
developed in recent years. Here we present a new such deep learning algorithm,
that significantly outperformed all state-of-the-art baselines at both
resolutions, pocket and residue. This good performance was
also demonstrated in a case study involving the protein human serum albumin and
its binding sites. Our algorithm included new ideas both in the model
architecture and in the training method. For the model architecture, it
incorporated SE(3)-invariant geometric self-attention layers that operate on
top of residue-level CNN outputs. This residue-level processing of the model
allowed a transfer learning between the two resolutions, which turned out to
significantly improve the binding pocket prediction. Moreover, we developed
a novel augmentation method based on protein homology, which prevented our model
from over-fitting. Overall, we believe that our contribution to the literature
is twofold. First, we provided a new computational method for binding site
prediction that is relevant to real-world applications, as shown by the good
performance on different benchmarks and case study. Second, the novel ideas in
our method (the model architecture, transfer learning and the
homology augmentation) would serve as useful components in
future works.
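The architectural idea above, self-attention over residue-level features whose geometric term depends only on quantities unchanged by rotation and translation, can be illustrated with a minimal sketch. This is not the authors' implementation; it is a hypothetical single-head attention layer in which pairwise inter-residue distances (an SE(3)-invariant quantity) bias the attention logits, and all weight matrices and the `dist_scale` parameter are assumptions for illustration.

```python
import numpy as np

def se3_invariant_attention(feats, coords, w_q, w_k, w_v, dist_scale=1.0):
    """One attention head whose geometric dependence enters only through
    pairwise distances, which are invariant to rotations and translations
    of the protein, hence SE(3)-invariant.

    feats:  (n, d) residue-level features (e.g. CNN outputs per residue)
    coords: (n, 3) residue coordinates (e.g. C-alpha atoms)
    """
    q, k, v = feats @ w_q, feats @ w_k, feats @ w_v
    logits = q @ k.T / np.sqrt(k.shape[1])
    # Distance-based bias: nearby residues attend to each other more strongly.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    logits = logits - dist_scale * dist
    # Numerically stable softmax over each row.
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because the coordinates enter only through distances, rigidly moving the whole structure leaves the output unchanged, which is the invariance property the abstract refers to.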
Related papers
- Deep Manifold Transformation for Protein Representation Learning [42.43017670985785]
We propose a new deep manifold transformation approach for universal protein representation learning (DMTPRL).
It employs manifold learning strategies to improve the quality and adaptability of the learned embeddings.
Our proposed DMTPRL method outperforms state-of-the-art baselines on diverse downstream tasks across popular datasets.
arXiv Detail & Related papers (2024-01-12T18:38:14Z) - The Languini Kitchen: Enabling Language Modelling Research at Different
Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z) - Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computational heterogeneous data.
The proposed aggregation algorithms are extensively analyzed from both theoretical and experimental perspectives.
arXiv Detail & Related papers (2023-07-12T16:28:21Z) - Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z) - HAC-Net: A Hybrid Attention-Based Convolutional Neural Network for
Highly Accurate Protein-Ligand Binding Affinity Prediction [0.0]
We present a novel deep learning architecture consisting of a 3-dimensional convolutional neural network and two graph convolutional networks.
HAC-Net obtains state-of-the-art results on the PDBbind v.2016 core set.
We envision that this model can be extended to a broad range of supervised learning problems related to structure-based biomolecular property prediction.
arXiv Detail & Related papers (2022-12-23T16:14:53Z) - Integration of Pre-trained Protein Language Models into Geometric Deep
Learning Networks [68.90692290665648]
We integrate knowledge learned by protein language models into several state-of-the-art geometric networks.
Our findings show an overall improvement of 20% over baselines.
Strong evidence indicates that the incorporation of protein language models' knowledge enhances geometric networks' capacity by a significant margin.
arXiv Detail & Related papers (2022-12-07T04:04:04Z) - Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances, adversarial labels.
They strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
arXiv Detail & Related papers (2022-12-05T14:47:52Z) - Adapting the Mean Teacher for keypoint-based lung registration under
geometric domain shifts [75.51482952586773]
Deep neural networks generally require plenty of labeled training data and are vulnerable to domain shifts between training and test data.
We present a novel approach to geometric domain adaptation for image registration, adapting a model from a labeled source to an unlabeled target domain.
Our method consistently improves on the baseline model by 50%/47% while even matching the accuracy of models trained on target data.
arXiv Detail & Related papers (2022-07-01T12:16:42Z) - A new method for binary classification of proteins with Machine Learning [0.0]
In this work we set out to find a method to classify protein structures using a Deep Learning methodology.
Our Artificial Intelligence has been trained to recognize complex biomolecule structures extrapolated from the Protein Data Bank (PDB) database and reprocessed as images.
For this purpose various tests have been conducted with pre-trained Convolutional Neural Networks, such as InceptionResNetV2 or InceptionV3, in order to extract significant features from these images and correctly classify the molecule.
arXiv Detail & Related papers (2021-11-03T01:58:34Z) - Protein sequence-to-structure learning: Is this the end(-to-end
revolution)? [0.8399688944263843]
In CASP14, deep learning has boosted the field to unanticipated levels reaching near-experimental accuracy.
Novel emerging approaches include (i) geometric learning, i.e. learning on representations such as graphs, 3D Voronoi tessellations, and point clouds.
We provide an overview and our opinion of the novel deep learning approaches developed in the last two years and widely used in CASP14.
arXiv Detail & Related papers (2021-05-16T10:46:44Z) - DeepFoldit -- A Deep Reinforcement Learning Neural Network Folding
Proteins [0.0]
We trained a deep reinforcement neural network called DeepFoldit to improve the score assigned to an unfolded protein.
Our approach combines the intuitive user interface of Foldit with the efficiency of deep reinforcement learning.
arXiv Detail & Related papers (2020-10-28T16:05:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.