Co-modeling the Sequential and Graphical Routes for Peptide
Representation Learning
- URL: http://arxiv.org/abs/2310.02964v2
- Date: Thu, 5 Oct 2023 12:42:25 GMT
- Title: Co-modeling the Sequential and Graphical Routes for Peptide
Representation Learning
- Authors: Zihan Liu, Ge Wang, Jiaqi Wang, Jiangbin Zheng, Stan Z. Li
- Abstract summary: We propose a peptide co-modeling method, RepCon, to enhance the mutual information of representations from decoupled sequential and graphical end-to-end models.
RepCon learns to enhance the consistency of representations between positive sample pairs and to repel representations between negative pairs.
Our results demonstrate the superiority of the co-modeling approach over independent modeling, as well as the superiority of RepCon over other methods under the co-modeling framework.
- Score: 67.66393016797181
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Peptides are formed by the dehydration condensation of multiple amino acids.
The primary structure of a peptide can be represented either as an amino acid
sequence or as a molecular graph consisting of atoms and chemical bonds.
Previous studies have indicated that deep learning routes specific to
sequential and graphical peptide forms exhibit comparable performance on
downstream tasks. Despite the fact that these models learn representations of
the same modality of peptides, we find that they explain their predictions
differently. Considering sequential and graphical models as two experts making
inferences from different perspectives, we work on fusing expert knowledge to
enrich the learned representations for improving the discriminative
performance. To achieve this, we propose a peptide co-modeling method, RepCon,
which employs a contrastive learning-based framework to enhance the mutual
information of representations from decoupled sequential and graphical
end-to-end models. It considers representations from the sequential encoder and
the graphical encoder for the same peptide sample as a positive pair and learns
to enhance the consistency of representations between positive sample pairs and
to repel representations between negative pairs. Empirical studies of RepCon
and other co-modeling methods are conducted on open-source discriminative
datasets, including aggregation propensity, retention time, antimicrobial
peptide prediction, and family classification from Peptide Database. Our
results demonstrate the superiority of the co-modeling approach over
independent modeling, as well as the superiority of RepCon over other methods
under the co-modeling framework. In addition, attribution analysis of RepCon
further corroborates the validity of the approach at the level of model
explanation.
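The positive/negative pairing described in the abstract can be illustrated with a generic symmetric InfoNCE-style loss. This is only a minimal sketch under stated assumptions, not the RepCon objective itself: the abstract does not give the loss, so the function name `cross_view_contrastive_loss`, the cosine similarity, the `temperature` value, and the use of other peptides in the batch as negatives are all illustrative choices. The random arrays stand in for the outputs of the sequential and graphical encoders.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def cross_view_contrastive_loss(z_seq, z_graph, temperature=0.1):
    """Symmetric InfoNCE-style loss between two views of the same batch.

    Row i of z_seq and row i of z_graph are assumed to encode the same
    peptide (positive pair); every other row in the batch is treated as a
    negative. Both the loss form and the temperature are assumptions.
    """
    z_seq = l2_normalize(z_seq)
    z_graph = l2_normalize(z_graph)
    logits = z_seq @ z_graph.T / temperature  # (batch, batch) similarity matrix
    idx = np.arange(len(z_seq))
    # Sequence -> graph: each sequence embedding should pick out its own graph.
    loss_s2g = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Graph -> sequence: the same constraint in the other direction.
    loss_g2s = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_s2g + loss_g2s)


# Stand-in embeddings (hypothetical): 8 peptides, 16-dimensional representations.
rng = np.random.default_rng(0)
z_graph = rng.normal(size=(8, 16))
z_seq_aligned = z_graph + 0.01 * rng.normal(size=(8, 16))  # views agree
z_seq_shuffled = z_graph[::-1]                             # views mismatched

aligned_loss = cross_view_contrastive_loss(z_seq_aligned, z_graph)
shuffled_loss = cross_view_contrastive_loss(z_seq_shuffled, z_graph)
```

In this toy setup the loss is small when the two views of each peptide agree and large when the pairing is broken, which is the behavior a contrastive co-modeling objective of this kind is meant to reward.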
Related papers
- Optimizing OOD Detection in Molecular Graphs: A Novel Approach with Diffusion Models [71.39421638547164]
We propose to detect OOD molecules by adopting an auxiliary diffusion model-based framework, which compares similarities between input molecules and reconstructed graphs.
Due to the generative bias towards reconstructing ID training samples, the similarity scores of OOD molecules will be much lower to facilitate detection.
Our research pioneers an approach of Prototypical Graph Reconstruction for Molecular OOD Detection, dubbed as PGR-MOOD and hinges on three innovations.
arXiv Detail & Related papers (2024-04-24T03:25:53Z)
- Revealing Multimodal Contrastive Representation Learning through Latent Partial Causal Models [85.67870425656368]
We introduce a unified causal model specifically designed for multimodal data.
We show that multimodal contrastive representation learning excels at identifying latent coupled variables.
Experiments demonstrate the robustness of our findings, even when the assumptions are violated.
arXiv Detail & Related papers (2024-02-09T07:18:06Z)
- Improved prediction of ligand-protein binding affinities by meta-modeling [1.3859669037499769]
We develop a framework to integrate published force-field-based empirical docking and sequence-based deep learning models.
We show that many of our meta-models significantly improve affinity predictions over base models.
Our best meta-models achieve comparable performance to state-of-the-art deep learning tools exclusively based on 3D structures.
arXiv Detail & Related papers (2023-10-05T23:46:45Z)
- Representer Point Selection for Explaining Regularized High-dimensional Models [105.75758452952357]
We introduce a class of sample-based explanations we term high-dimensional representers.
Our workhorse is a novel representer theorem for general regularized high-dimensional models.
We study the empirical performance of our proposed methods on three real-world binary classification datasets and two recommender system datasets.
arXiv Detail & Related papers (2023-05-31T16:23:58Z)
- Molecular Property Prediction by Semantic-invariant Contrastive Learning [26.19431931932982]
We develop a Fragment-based Semantic-Invariant Contrastive Learning model based on this view generation method for molecular property prediction.
With the least number of pre-training samples, FraSICL can achieve state-of-the-art performance, compared with major existing counterpart models.
arXiv Detail & Related papers (2023-03-13T07:32:37Z)
- Can Pre-trained Models Really Learn Better Molecular Representations for AI-aided Drug Discovery? [22.921555120408907]
We propose a method named Representation-Property Relationship Analysis (RePRA) to evaluate the quality of representations extracted by the pre-trained model.
Two scores are designed to measure the generalized ACs and SH detected by RePRA.
In experiments, representations of molecules from 10 target tasks generated by 7 pre-trained models are analyzed.
arXiv Detail & Related papers (2022-08-21T10:05:25Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
- Interpretable Structured Learning with Sparse Gated Sequence Encoder for Protein-Protein Interaction Prediction [2.9488233765621295]
Predicting protein-protein interactions (PPIs) by learning informative representations from amino acid sequences is a challenging yet important problem in biology.
We present a novel deep framework to model and predict PPIs from sequence alone.
Our model incorporates a bidirectional gated recurrent unit to learn sequence representations by leveraging contextualized and sequential information from sequences.
arXiv Detail & Related papers (2020-10-16T17:13:32Z)
- Energy-based View of Retrosynthesis [70.66156081030766]
We propose a framework that unifies sequence- and graph-based methods as energy-based models.
We present a novel dual variant within the framework that performs consistent training over Bayesian forward- and backward-prediction.
This model improves state-of-the-art performance by 9.6% for template-free approaches where the reaction type is unknown.
arXiv Detail & Related papers (2020-07-14T18:51:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.