Improved K-mer Based Prediction of Protein-Protein Interactions With
Chaos Game Representation, Deep Learning and Reduced Representation Bias
- URL: http://arxiv.org/abs/2310.14764v1
- Date: Mon, 23 Oct 2023 10:02:23 GMT
- Title: Improved K-mer Based Prediction of Protein-Protein Interactions With
Chaos Game Representation, Deep Learning and Reduced Representation Bias
- Authors: Ruth Veevers and Dan MacLean
- Abstract summary: We present a method for extracting unique pairs from an interaction dataset, generating non-redundant paired data for unbiased machine learning.
We develop a convolutional neural network model capable of learning and predicting interactions from Chaos Game Representations of proteins' coding genes.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Protein-protein interactions drive many biological processes, including the
detection of phytopathogens by plants' R-Proteins and cell surface receptors.
Many machine learning studies have attempted to predict protein-protein
interactions but performance is highly dependent on training data; models have
been shown to accurately predict interactions when the proteins involved are
included in the training data, but achieve consistently poorer results when
applied to previously unseen proteins. In addition, models that are trained
using proteins that take part in multiple interactions can suffer from
representation bias, where predictions are driven not by learned biological
features but by learning of the structure of the interaction dataset.
We present a method for extracting unique pairs from an interaction dataset,
generating non-redundant paired data for unbiased machine learning. After
applying the method to datasets containing _Arabidopsis thaliana_ and pathogen
effector interations, we developed a convolutional neural network model capable
of learning and predicting interactions from Chaos Game Representations of
proteins' coding genes.
Related papers
- SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z) - Long-context Protein Language Model [76.95505296417866]
Self-supervised training of language models (LMs) has seen great success for protein sequences in learning meaningful representations and for generative drug design.
Most protein LMs are based on the Transformer architecture trained on individual proteins with short context lengths.
We propose LC-PLM based on an alternative protein LM architecture, BiMamba-S, built off selective structured state-space models.
We also introduce its graph-contextual variant, LC-PLM-G, which contextualizes protein-protein interaction graphs for a second stage of training.
arXiv Detail & Related papers (2024-10-29T16:43:28Z) - Protein binding affinity prediction under multiple substitutions applying eGNNs on Residue and Atomic graphs combined with Language model information: eGRAL [1.840390797252648]
Deep learning is increasingly recognized as a powerful tool capable of bridging the gap between in-silico predictions and in-vitro observations.
We propose eGRAL, a novel graph neural network architecture designed for predicting binding affinity changes from amino acid substitutions in protein complexes.
eGRAL leverages residue, atomic and evolutionary scales, thanks to features extracted from protein large language models.
arXiv Detail & Related papers (2024-05-03T10:33:19Z) - ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction [54.132290875513405]
The prediction of protein-protein interactions (PPIs) is crucial for understanding biological functions and diseases.
Previous machine learning approaches to PPI prediction mainly focus on direct physical interactions.
We propose a novel framework ProLLM that employs an LLM tailored for PPI for the first time.
arXiv Detail & Related papers (2024-03-30T05:32:42Z) - Protein-ligand binding representation learning from fine-grained
interactions [29.965890962846093]
We propose to learn protein-ligand binding representation in a self-supervised learning manner.
This self-supervised learning problem is formulated as a prediction of the conclusive binding complex structure.
Experiments have demonstrated the superiority of our method across various binding tasks.
arXiv Detail & Related papers (2023-11-09T01:33:09Z) - Growing ecosystem of deep learning methods for modeling
protein$\unicode{x2013}$protein interactions [0.0]
We discuss the growing ecosystem of deep learning methods for modeling protein interactions.
Opportunities abound to discover novel interactions, modulate their physical mechanisms, and engineer binders to unravel their functions.
arXiv Detail & Related papers (2023-10-10T15:53:27Z) - State-specific protein-ligand complex structure prediction with a
multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z) - Learning Geometrically Disentangled Representations of Protein Folding
Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z) - A Supervised Machine Learning Approach for Sequence Based
Protein-protein Interaction (PPI) Prediction [4.916874464940376]
Computational protein-protein interaction (PPI) prediction techniques can contribute greatly in reducing time, cost and false-positive interactions.
We have described our submitted solution with the results of the SeqPIP competition.
arXiv Detail & Related papers (2022-03-23T18:27:25Z) - Learning Unknown from Correlations: Graph Neural Network for
Inter-novel-protein Interaction Prediction [7.860159889216291]
Existing methods suffer from significant performance degradation when tested in unseen dataset.
We propose a graph neural network based method (GNN-PPI) for better inter-novel-protein interaction prediction.
arXiv Detail & Related papers (2021-05-14T08:42:55Z) - Deep Learning of High-Order Interactions for Protein Interface
Prediction [58.164371994210406]
We propose to formulate the protein interface prediction as a 2D dense prediction problem.
We represent proteins as graphs and employ graph neural networks to learn node features.
We incorporate high-order pairwise interactions to generate a 3D tensor containing different pairwise interactions.
arXiv Detail & Related papers (2020-07-18T05:39:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.