Related papers: Interpretable Structured Learning with Sparse Gated Sequence Encoder for Protein-Protein Interaction Prediction

Interpretable Structured Learning with Sparse Gated Sequence Encoder for Protein-Protein Interaction Prediction

URL: http://arxiv.org/abs/2010.08514v1
Date: Fri, 16 Oct 2020 17:13:32 GMT
Title: Interpretable Structured Learning with Sparse Gated Sequence Encoder for Protein-Protein Interaction Prediction
Authors: Kishan KC, Feng Cui, Anne Haake, Rui Li
Abstract summary: Predicting protein-protein interactions (PPIs) by learning informative representations from amino acid sequences is a challenging yet important problem in biology. We present a novel deep framework to model and predict PPIs from sequence alone. Our model incorporates a bidirectional gated recurrent unit to learn sequence representations by leveraging contextualized and sequential information from sequences.
Score: 2.9488233765621295
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Predicting protein-protein interactions (PPIs) by learning informative representations from amino acid sequences is a challenging yet important problem in biology. Although various deep learning models in Siamese architecture have been proposed to model PPIs from sequences, these methods are computationally expensive for a large number of PPIs due to the pairwise encoding process. Furthermore, these methods are difficult to interpret because of non-intuitive mappings from protein sequences to their sequence representation. To address these challenges, we present a novel deep framework to model and predict PPIs from sequence alone. Our model incorporates a bidirectional gated recurrent unit to learn sequence representations by leveraging contextualized and sequential information from sequences. We further employ a sparse regularization to model long-range dependencies between amino acids and to select important amino acids (protein motifs), thus enhancing interpretability. Besides, the novel design of the encoding process makes our model computationally efficient and scalable to an increasing number of interactions. Experimental results on up-to-date interaction datasets demonstrate that our model achieves superior performance compared to other state-of-the-art methods. Literature-based case studies illustrate the ability of our model to provide biological insights to interpret the predictions.

Related papers

PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs [80.08310253195144]
PRING is the first benchmark that evaluates protein-protein interaction prediction from a graph-level perspective.<n> PRING curates a high-quality, multi-species PPI network dataset comprising 21,484 proteins and 186,818 interactions.
arXiv Detail & Related papers (2025-07-07T15:21:05Z)
Life-Code: Central Dogma Modeling with Multi-Omics Sequence Unification [53.488387420073536]
Life-Code is a comprehensive framework that spans different biological functions. Life-Code achieves state-of-the-art performance on various tasks across three omics.
arXiv Detail & Related papers (2025-02-11T06:53:59Z)
GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters. Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks. It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
Multi-Scale Representation Learning for Protein Fitness Prediction [31.735234482320283]
Previous methods have primarily relied on self-supervised models trained on vast, unlabeled protein sequence or structure datasets. We introduce the Sequence-Structure-Surface Fitness (S3F) model - a novel multimodal representation learning framework that integrates protein features across several scales. Our approach combines sequence representations from a protein language model with Geometric Vector Perceptron networks encoding protein backbone and detailed surface topology.
arXiv Detail & Related papers (2024-12-02T04:28:10Z)
SeqProFT: Applying LoRA Finetuning for Sequence-only Protein Property Predictions [8.112057136324431]
This study employs the LoRA method to perform end-to-end fine-tuning of the ESM-2 model. A multi-head attention mechanism is integrated into the downstream network to combine sequence features with contact map information.
arXiv Detail & Related papers (2024-11-18T12:40:39Z)
SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models. It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features. Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z)
Semantically Rich Local Dataset Generation for Explainable AI in Genomics [0.716879432974126]
Black box deep learning models trained on genomic sequences excel at predicting the outcomes of different gene regulatory mechanisms. We propose using Genetic Programming to generate datasets by evolving perturbations in sequences that contribute to their semantic diversity.
arXiv Detail & Related papers (2024-07-03T10:31:30Z)
Co-modeling the Sequential and Graphical Routes for Peptide Representation Learning [67.66393016797181]
We propose a peptide co-modeling method, RepCon, to enhance the mutual information of representations from decoupled sequential and graphical end-to-end models. RepCon learns to enhance the consistency of representations between positive sample pairs and to repel representations between negative pairs. Our results demonstrate the superiority of the co-modeling approach over independent modeling, as well as the superiority of RepCon over other methods under the co-modeling framework.
arXiv Detail & Related papers (2023-10-04T16:58:25Z)
Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding [57.89530563948755]
This work provides a benchmark analysis of peptide encoding with advanced deep learning models. It serves as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.
arXiv Detail & Related papers (2023-07-17T00:43:33Z)
PIGNet2: A Versatile Deep Learning-based Protein-Ligand Interaction Prediction Model for Binding Affinity Scoring and Virtual Screening [0.0]
Prediction of protein-ligand interactions (PLI) plays a crucial role in drug discovery. The development of a versatile model capable of accurately scoring binding affinity and conducting efficient virtual screening remains a challenge. Here, we propose a viable solution by introducing a novel data augmentation strategy combined with a physics-informed graph neural network.
arXiv Detail & Related papers (2023-07-03T14:46:49Z)
Predicting protein variants with equivariant graph neural networks [0.0]
We compare the abilities of equivariant graph neural networks (EGNNs) and sequence-based approaches to identify promising amino-acid mutations. Our proposed structural approach achieves a competitive performance to sequence-based approaches while being trained on significantly fewer molecules.
arXiv Detail & Related papers (2023-06-21T12:44:52Z)
Pre-training Co-evolutionary Protein Representation via A Pairwise Masked Language Model [93.9943278892735]
Key problem in protein sequence representation learning is to capture the co-evolutionary information reflected by the inter-residue co-variation in the sequences. We propose a novel method to capture this information directly by pre-training via a dedicated language model, i.e., Pairwise Masked Language Model (PMLM) Our result shows that the proposed method can effectively capture the interresidue correlations and improves the performance of contact prediction by up to 9% compared to the baseline.
arXiv Detail & Related papers (2021-10-29T04:01:32Z)
Align-gram : Rethinking the Skip-gram Model for Protein Sequence Analysis [0.8733639720576208]
We propose a novel embedding scheme, Align-gram, which is capable of mapping the similar $k$-mers close to each other in a vector space. Our experiments with a simple baseline LSTM model and a much complex CNN model of DeepGoPlus shows the potential of Align-gram in performing different types of deep learning applications for protein sequence analysis.
arXiv Detail & Related papers (2020-12-06T17:04:17Z)
Deep Learning of High-Order Interactions for Protein Interface Prediction [58.164371994210406]
We propose to formulate the protein interface prediction as a 2D dense prediction problem. We represent proteins as graphs and employ graph neural networks to learn node features. We incorporate high-order pairwise interactions to generate a 3D tensor containing different pairwise interactions.
arXiv Detail & Related papers (2020-07-18T05:39:35Z)
A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference. Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.