Bayesian neural network with pretrained protein embedding enhances
prediction accuracy of drug-protein interaction
- URL: http://arxiv.org/abs/2012.08194v2
- Date: Mon, 21 Dec 2020 14:47:48 GMT
- Title: Bayesian neural network with pretrained protein embedding enhances
prediction accuracy of drug-protein interaction
- Authors: QHwan Kim, Joon-Hyuk Ko, Sunghoon Kim, Nojun Park, Wonho Jhe
- Abstract summary: Deep learning approaches can predict drug-protein interactions without human trial-and-error.
We propose two methods to construct a deep learning framework that exhibits superior performance with a small labeled dataset.
- Score: 3.499870393443268
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The characterization of drug-protein interactions is crucial in the
high-throughput screening for drug discovery. Deep learning-based approaches
have attracted attention because they can predict drug-protein interactions
without human trial-and-error. However, because data labeling
requires significant resources, the available protein data size is relatively
small, which consequently decreases model performance. Here we propose two
methods to construct a deep learning framework that exhibits superior
performance with a small labeled dataset. First, we use transfer learning to
encode protein sequences with a pretrained model that learns general sequence
representations in an unsupervised manner. Second, we use a Bayesian
neural network to make a robust model by estimating the data uncertainty. As a
result, our model performs better than the previous baselines for predicting
drug-protein interactions. We also show that the quantified uncertainty from
the Bayesian inference is related to prediction confidence and can be used for
screening DPI data points.
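For illustration, here is a minimal PyTorch-style sketch of how the two ideas can be combined: a frozen stand-in for the pretrained protein encoder, a drug fingerprint, and Monte Carlo dropout as one common Bayesian approximation for estimating uncertainty. The layer sizes, the dropout approximation, and the screening rule are assumptions for the sketch, not the authors' exact implementation.

```python
# Minimal sketch: pretrained (frozen) protein embeddings + drug fingerprints,
# with Monte Carlo dropout as a stand-in Bayesian approximation. Sizes and the
# screening rule are illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn as nn

class DPIHead(nn.Module):
    def __init__(self, protein_dim=1280, drug_dim=2048, hidden=512, p_drop=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(protein_dim + drug_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, 1),
        )

    def forward(self, protein_emb, drug_fp):
        # protein_emb comes from a pretrained sequence encoder and is kept frozen.
        return self.net(torch.cat([protein_emb, drug_fp], dim=-1)).squeeze(-1)

@torch.no_grad()
def predict_with_uncertainty(model, protein_emb, drug_fp, n_samples=30):
    """Monte Carlo dropout: keep dropout active and average stochastic passes."""
    model.train()  # enables dropout; no parameters are updated under no_grad
    probs = torch.stack(
        [torch.sigmoid(model(protein_emb, drug_fp)) for _ in range(n_samples)]
    )
    return probs.mean(0), probs.std(0)  # predictive mean and uncertainty

# Toy usage: screen DPI candidates by uncertainty.
model = DPIHead()
protein_emb = torch.randn(8, 1280)                # stand-in for pretrained embeddings
drug_fp = torch.randint(0, 2, (8, 2048)).float()  # e.g. binary Morgan fingerprints
mean, unc = predict_with_uncertainty(model, protein_emb, drug_fp)
reliable = unc < unc.median()                     # keep the more confident half (illustrative)
```

The per-sample spread of the stochastic passes plays the role of the quantified uncertainty used for screening DPI data points.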
Related papers
- ProtIR: Iterative Refinement between Retrievers and Predictors for Protein Function Annotation [38.019425619750265]
We introduce a novel variational pseudo-likelihood framework, ProtIR, designed to improve function predictors by incorporating inter-protein similarity modeling.
ProtIR showcases around 10% improvement over vanilla predictor-based methods.
It achieves performance on par with protein language model-based methods, yet without the need for massive pre-training.
arXiv Detail & Related papers (2024-02-10T17:31:46Z)
- Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations.
In the WASPSYN challenge at ISBI 2023, our method ranks first.
arXiv Detail & Related papers (2023-08-31T05:05:53Z)
- DeepGATGO: A Hierarchical Pretraining-Based Graph-Attention Model for Automatic Protein Function Prediction [4.608328575930055]
Automatic protein function prediction (AFP) is classified as a large-scale multi-label classification problem.
Currently, popular methods primarily combine protein-related information and Gene Ontology (GO) terms to generate final functional predictions.
We propose a sequence-based hierarchical prediction method, DeepGATGO, which processes protein sequences and GO term labels hierarchically.
arXiv Detail & Related papers (2023-07-24T07:01:32Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
- Reprogramming Pretrained Language Models for Protein Sequence Representation Learning [68.75392232599654]
We propose Representation Learning via Dictionary Learning (R2DL), an end-to-end representation learning framework.
R2DL reprograms a pretrained English language model to learn the embeddings of protein sequences.
Our model can attain better accuracy and significantly improve the data efficiency by up to $10^5$ times over the baselines set by pretrained and standard supervised methods.
arXiv Detail & Related papers (2023-01-05T15:55:18Z)
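To make the reprogramming idea in the R2DL entry above concrete, here is a hedged sketch in which protein token embeddings are learned as mixtures over a frozen English embedding table. The softmax dictionary, vocabulary sizes, and embedding dimensions are assumptions about the general model-reprogramming recipe, not necessarily the exact R2DL formulation.

```python
# Generic token-reprogramming sketch: protein token embeddings are expressed as
# learned combinations of a frozen English embedding table. This illustrates the
# reprogramming idea in general, not necessarily the exact R2DL procedure.
import torch
import torch.nn as nn

class ReprogrammedEmbedding(nn.Module):
    def __init__(self, english_embeddings: torch.Tensor, protein_vocab=25):
        super().__init__()
        # Frozen embedding table of the pretrained English language model.
        self.register_buffer("english", english_embeddings)            # (V_en, d)
        # Learnable dictionary: one row of mixing weights per amino-acid token.
        self.dictionary = nn.Parameter(torch.zeros(protein_vocab, english_embeddings.size(0)))

    def forward(self, protein_tokens):                                  # (batch, seq_len)
        weights = torch.softmax(self.dictionary, dim=-1)                # mixing weights per token
        protein_embeddings = weights @ self.english                     # (V_prot, d)
        return protein_embeddings[protein_tokens]                       # (batch, seq_len, d)

# Toy usage: only the dictionary (plus a small task head) would be trained,
# while the language model body stays frozen.
english_table = torch.randn(30522, 768)      # illustrative BERT-sized vocabulary
embed = ReprogrammedEmbedding(english_table)
tokens = torch.randint(0, 25, (2, 128))      # amino-acid token ids
vectors = embed(tokens)                      # embeddings fed into the frozen model body
```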
- A Supervised Machine Learning Approach for Sequence Based Protein-protein Interaction (PPI) Prediction [4.916874464940376]
Computational protein-protein interaction (PPI) prediction techniques can contribute greatly to reducing time, cost, and false-positive interactions.
We describe our submitted solution along with the results of the SeqPIP competition.
arXiv Detail & Related papers (2022-03-23T18:27:25Z)
- Transformers Can Do Bayesian Inference [56.99390658880008]
We present Prior-Data Fitted Networks (PFNs)
PFNs leverage in-context learning, trained with large-scale machine learning techniques, to approximate a large set of posteriors.
We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems.
arXiv Detail & Related papers (2021-12-20T13:07:39Z)
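The PFN entry above describes training on datasets sampled from a prior so that in-context predictions approximate the posterior predictive. Below is a toy sketch of that training loop; the linear-function prior, the tiny transformer, and the squared-error objective (in place of the paper's discretized output distribution) are simplifying assumptions.

```python
# Toy Prior-Data Fitted Network sketch: sample datasets from a prior and train a
# transformer to predict held-out targets in-context. The linear prior, tiny
# model, and MSE objective are simplifications of the paper's setup.
import torch
import torch.nn as nn

def sample_task(n_points=20, noise=0.1):
    """One dataset drawn from a simple prior over noisy linear functions."""
    w, b = torch.randn(1), torch.randn(1)
    x = torch.rand(n_points, 1) * 2 - 1
    return x, w * x + b + noise * torch.randn(n_points, 1)

class TinyPFN(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.embed = nn.Linear(2, d_model)   # embeds (x, y); the query's y is masked to 0
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, 1)

    def forward(self, ctx_x, ctx_y, qry_x):
        ctx = self.embed(torch.cat([ctx_x, ctx_y], dim=-1))
        qry = self.embed(torch.cat([qry_x, torch.zeros_like(qry_x)], dim=-1))
        h = self.encoder(torch.cat([ctx, qry], dim=1))
        return self.out(h[:, ctx.size(1):])  # predictions at the query positions

model = TinyPFN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):                     # each step sees a fresh dataset from the prior
    x, y = sample_task()
    ctx_x, ctx_y = x[:15][None], y[:15][None]
    qry_x, qry_y = x[15:][None], y[15:][None]
    loss = nn.functional.mse_loss(model(ctx_x, ctx_y, qry_x), qry_y)
    opt.zero_grad(); loss.backward(); opt.step()
# At test time, a real (x, y) context is fed in and a single forward pass plays
# the role of approximate Bayesian inference for the query points.
```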
- A new framework for experimental design using Bayesian Evidential Learning: the case of wellhead protection area [0.0]
We predict the wellhead protection area (WHPA), whose shape and extent are influenced by the distribution of hydraulic conductivity (K), from a small number of tracing experiments (predictors).
Our first objective is to make predictions of the WHPA within the Bayesian Evidential Learning framework, which aims to find a direct relationship between predictor and target using machine learning.
Our second objective is to extend BEL to identify the optimal design of data source locations that minimizes the posterior uncertainty of the WHPA.
arXiv Detail & Related papers (2021-05-12T09:40:28Z)
- Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators.
However, they are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions.
We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
arXiv Detail & Related papers (2021-02-22T07:02:37Z)
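The calibration entry above describes finding regions where the model is unjustifiably overconfident and pushing those predictions toward the label prior. A hedged sketch of one such penalty is below; the mixup-style interpolation used to generate off-manifold points and the fixed confidence threshold are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch of an entropy-raising penalty on augmented inputs. The interpolation
# augmentation and fixed confidence threshold are illustrative assumptions.
import torch
import torch.nn.functional as F

def prior_augmented_penalty(model, x_batch, label_prior, conf_threshold=0.9):
    """Push overconfident predictions on off-manifold points toward the label prior."""
    # Crude augmentation for flat feature vectors: interpolate random input pairs.
    perm = torch.randperm(x_batch.size(0))
    lam = torch.rand(x_batch.size(0), 1)
    x_aug = lam * x_batch + (1 - lam) * x_batch[perm]

    probs = F.softmax(model(x_aug), dim=-1)                    # model returns class logits
    overconfident = probs.max(dim=-1).values > conf_threshold  # "unjustifiably" confident
    if not overconfident.any():
        return x_batch.new_zeros(())
    # KL(model || prior) on those points; minimising it raises predictive entropy
    # toward that of the label prior.
    kl = (probs * (probs.clamp_min(1e-8).log() - label_prior.log())).sum(-1)
    return kl[overconfident].mean()

# total_loss = task_loss + lam_cal * prior_augmented_penalty(model, x, label_prior)
```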
- GEFA: Early Fusion Approach in Drug-Target Affinity Prediction [28.695523040015164]
We propose a novel graph-in-graph neural network with an attention mechanism to address the changes in target representation caused by binding effects.
A drug is modeled as a graph of atoms, which then serves as a node in a larger graph of the residue-drug complex.
We also use a pre-trained protein representation, building on recent work on learning contextualized protein representations.
arXiv Detail & Related papers (2020-09-25T11:54:15Z)
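As a rough illustration of the graph-in-graph idea in the GEFA entry above, the sketch below pools a drug atom graph into a single node and attaches it to the residue graph. The dense-adjacency mean-aggregation layer and the connect-to-all-residues rule are simplifying assumptions rather than GEFA's actual architecture, which also uses attention and a pretrained protein representation.

```python
# Graph-in-graph sketch: the pooled drug graph becomes one extra node of the
# residue graph. Dense-adjacency mean-aggregation layers and all-residue drug
# edges are simplifying assumptions, not GEFA's exact architecture.
import torch
import torch.nn as nn

class DenseGCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):                       # x: (N, d), adj: (N, N) incl. self-loops
        deg = adj.sum(-1, keepdim=True).clamp_min(1.0)
        return torch.relu(self.lin(adj @ x / deg))   # mean aggregation over neighbours

def residue_drug_complex(res_x, res_adj, atom_x, atom_adj, gcn_atom, gcn_complex):
    # Inner graph: message passing over drug atoms, then mean-pool to one node.
    drug_node = gcn_atom(atom_x, atom_adj).mean(0, keepdim=True)   # (1, d)
    # Outer graph: append the drug node to the residue graph; here it is linked
    # to every residue (a predicted binding-site mask could restrict this).
    x = torch.cat([res_x, drug_node], dim=0)                       # (R + 1, d)
    n = x.size(0)
    adj = torch.zeros(n, n)
    adj[:-1, :-1] = res_adj
    adj[-1, :] = 1.0
    adj[:, -1] = 1.0
    return gcn_complex(x, adj)                                     # contextualised node features

# Usage: d = 64; gcn_atom, gcn_complex = DenseGCNLayer(d), DenseGCNLayer(d)
```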
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.