A multitask transfer learning framework for the prediction of
virus-human protein-protein interactions
- URL: http://arxiv.org/abs/2111.13346v1
- Date: Fri, 26 Nov 2021 07:53:51 GMT
- Title: A multitask transfer learning framework for the prediction of
virus-human protein-protein interactions
- Authors: Thi Ngan Dong, Graham Brogden, Gisa Gerold, Megha Khosla
- Abstract summary: We develop a transfer learning approach that exploits the information of around 24 million protein sequences and the interaction patterns from the human interactome.
We employ an additional objective which aims to maximize the probability of observing human protein-protein interactions.
Experimental results show that our proposed model works effectively for both virus-human and bacteria-human protein-protein interaction prediction tasks.
- Score: 0.30586855806896046
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Viral infections are causing significant morbidity and mortality worldwide.
Understanding the interaction patterns between a particular virus and human
proteins plays a crucial role in unveiling the underlying mechanism of viral
infection and pathogenesis. This could further help in the prevention and
treatment of virus-related diseases. However, the task of predicting
protein-protein interactions between a new virus and human cells is extremely
challenging due to scarce data on virus-human interactions and fast mutation
rates of most viruses.
We developed a multitask transfer learning approach that exploits the
information of around 24 million protein sequences and the interaction patterns
from the human interactome to counter the problem of small training datasets.
Instead of using hand-crafted protein features, we utilize statistically rich
protein representations learned by a deep language modeling approach from a
massive source of protein sequences. We also employ an additional
objective which aims to maximize the probability of observing human
protein-protein interactions. This additional task objective acts as a
regularizer and also allows us to incorporate domain knowledge to inform the
virus-human protein-protein interaction prediction model.
Our approach achieved competitive results on 13 benchmark datasets and in the
case study for the SARS-CoV-2 virus receptor. Experimental results show that our
proposed model works effectively for both virus-human and bacteria-human
protein-protein interaction prediction tasks. We share our code for
reproducibility and future research at
https://git.l3s.uni-hannover.de/dong/multitask-transfer.
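The joint objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the element-wise-product scorer, the function names (`pair_score`, `binary_cross_entropy`, `multitask_loss`), the weighting parameter `alpha`, and the toy two-dimensional embeddings are all assumptions made for illustration. In the paper, the embeddings would come from a pretrained protein language model. The sketch only shows how an auxiliary human protein-protein interaction loss, computed with the same shared parameters, can be added to the main virus-human loss so that it acts as a regularizer.

```python
import math

def pair_score(weights, emb_a, emb_b):
    """Hypothetical scorer: sigmoid of a weighted element-wise product."""
    logit = sum(w * a * b for w, a, b in zip(weights, emb_a, emb_b))
    return 1.0 / (1.0 + math.exp(-logit))

def binary_cross_entropy(weights, pairs):
    """Mean binary cross-entropy over (emb_a, emb_b, label) triples."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for emb_a, emb_b, label in pairs:
        p = pair_score(weights, emb_a, emb_b)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total / len(pairs)

def multitask_loss(weights, virus_human_pairs, human_human_pairs, alpha=0.5):
    """Main virus-human loss plus an alpha-weighted auxiliary human PPI loss.

    Both terms use the same weights, so the auxiliary task constrains
    the parameters that the main task learns (assumed weighting scheme).
    """
    main = binary_cross_entropy(weights, virus_human_pairs)
    auxiliary = binary_cross_entropy(weights, human_human_pairs)
    return main + alpha * auxiliary

# Toy stand-ins for pretrained protein embeddings (assumed values).
virus_human = [([1.0, 0.0], [0.5, 0.5], 1), ([0.0, 1.0], [0.5, 0.5], 0)]
human_human = [([0.5, 0.5], [0.5, 0.5], 1), ([1.0, 0.0], [0.0, 1.0], 0)]

weights = [0.3, -0.2]
print(multitask_loss(weights, virus_human, human_human, alpha=0.5))
```

Setting `alpha=0.0` recovers the single-task virus-human loss, which makes it easy to compare training with and without the auxiliary objective.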
Related papers
- SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z)
- Learning to Predict Mutation Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning [78.38442423223832]
We develop a novel codebook pre-training task, namely masked microenvironment modeling.
We demonstrate superior performance and training efficiency over state-of-the-art pre-training-based methods in mutation effect prediction.
arXiv Detail & Related papers (2024-05-16T03:53:21Z)
- ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction [54.132290875513405]
The prediction of protein-protein interactions (PPIs) is crucial for understanding biological functions and diseases.
Previous machine learning approaches to PPI prediction mainly focus on direct physical interactions.
We propose a novel framework ProLLM that employs an LLM tailored for PPI for the first time.
arXiv Detail & Related papers (2024-03-30T05:32:42Z)
- Efficiently Predicting Protein Stability Changes Upon Single-point Mutation with Large Language Models [51.57843608615827]
The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry.
We introduce an ESM-assisted efficient approach that integrates protein sequence and structural features to predict the thermostability changes in protein upon single-point mutations.
arXiv Detail & Related papers (2023-12-07T03:25:49Z)
- Improved K-mer Based Prediction of Protein-Protein Interactions With Chaos Game Representation, Deep Learning and Reduced Representation Bias [0.0]
We present a method for extracting unique pairs from an interaction dataset, generating non-redundant paired data for unbiased machine learning.
We develop a convolutional neural network model capable of learning and predicting interactions from Chaos Game Representations of proteins' coding genes.
arXiv Detail & Related papers (2023-10-23T10:02:23Z)
- Virus2Vec: Viral Sequence Classification Using Machine Learning [48.40285316053593]
We propose Virus2Vec, a feature-vector representation for viral sequences that enables machine learning models to identify viral hosts.
We empirically evaluate Virus2Vec on real-world spike sequences of Coronaviridae and rabies virus sequence data to predict the host.
Our results demonstrate that Virus2Vec outperforms the predictive accuracies of baseline and state-of-the-art methods.
arXiv Detail & Related papers (2023-04-24T08:17:16Z)
- Multi-channel neural networks for predicting influenza A virus hosts and antigenic types [3.1981440103815717]
A fast, accurate and low-cost method to predict the origin host and subtype of influenza viruses could help reduce virus transmission and benefit resource-poor areas.
We propose multi-channel neural networks to predict antigenic types and hosts of influenza A viruses with complete and partial protein sequences.
arXiv Detail & Related papers (2022-06-08T11:47:31Z)
- Classification of Influenza Hemagglutinin Protein Sequences using Convolutional Neural Networks [8.397189036839956]
This paper focuses on accurately predicting if an Influenza type A virus can infect specific hosts, and more specifically, Human, Avian and Swine hosts, using only the protein sequence of the HA gene.
We propose encoding the protein sequences into numerical signals using the Hydrophobicity Index and subsequently utilising a Convolutional Neural Network-based predictive model.
As the results show, the proposed model can distinguish HA protein sequences with high accuracy whenever the virus under investigation can infect Human, Avian or Swine hosts.
arXiv Detail & Related papers (2021-08-09T10:42:26Z)
- Deep Contextual Learners for Protein Networks [16.599890339599586]
We introduce AWARE, a graph neural message passing approach to inject cellular and tissue context into protein embeddings.
AWARE learns protein, cell type, and tissue embeddings that uphold cell type and tissue hierarchies.
We demonstrate AWARE on the novel task of predicting whether a gene is associated with a disease and where it most likely manifests in the human body.
arXiv Detail & Related papers (2021-06-04T04:26:27Z)
- Deep Learning of High-Order Interactions for Protein Interface Prediction [58.164371994210406]
We propose to formulate the protein interface prediction as a 2D dense prediction problem.
We represent proteins as graphs and employ graph neural networks to learn node features.
We incorporate high-order pairwise interactions to generate a 3D tensor containing different pairwise interactions.
arXiv Detail & Related papers (2020-07-18T05:39:35Z)
- Computational modeling of Human-nCoV protein-protein interaction network [17.875102234550305]
COVID-19 has created a global pandemic with high morbidity and mortality in 2020.
ICTV has declared that nCoV is genetically highly similar to the SARS-CoV of the 2003 epidemic.
arXiv Detail & Related papers (2020-05-05T04:16:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.