SGAC: A Graph Neural Network Framework for Imbalanced and Structure-Aware AMP Classification
- URL: http://arxiv.org/abs/2412.16276v1
- Date: Fri, 20 Dec 2024 17:17:57 GMT
- Title: SGAC: A Graph Neural Network Framework for Imbalanced and Structure-Aware AMP Classification
- Authors: Yingxu Wang, Victor Liang, Nan Yin, Siwei Liu, Eran Segal,
- Abstract summary: Classifying antimicrobial peptides(AMPs) from the vast array of peptides mined from metagenomic sequencing data is a significant approach to addressing the issue of antibiotic resistance.
Current AMP classification methods, primarily relying on sequence-based data, neglect the spatial structure of peptides, thereby limiting the accurate classification of AMPs.
- Score: 7.044114650607729
- License:
- Abstract: Classifying antimicrobial peptides(AMPs) from the vast array of peptides mined from metagenomic sequencing data is a significant approach to addressing the issue of antibiotic resistance. However, current AMP classification methods, primarily relying on sequence-based data, neglect the spatial structure of peptides, thereby limiting the accurate classification of AMPs. Additionally, the number of known AMPs is significantly lower than that of non-AMPs, leading to imbalanced datasets that reduce predictive accuracy for AMPs. To alleviate these two limitations, we first employ Omegafold to predict the three-dimensional spatial structures of AMPs and non-AMPs, constructing peptide graphs based on the amino acids' C$_\alpha$ positions. Building upon this, we propose a novel classification model named Spatial GNN-based AMP Classifier (SGAC). Our SGAC model employs a graph encoder based on Graph Neural Networks (GNNs) to process peptide graphs, generating high-dimensional representations that capture essential features from the three-dimensional spatial structure of amino acids. Then, to address the inherent imbalanced datasets, SGAC first incorporates Weight-enhanced Contrastive Learning, which clusters similar peptides while ensuring separation between dissimilar ones, using weighted contributions to emphasize AMP-specific features. Furthermore, SGAC employs Weight-enhanced Pseudo-label Distillation to dynamically generate high-confidence pseudo labels for ambiguous peptides, further refining predictions and promoting balanced learning between AMPs and non-AMPs. Experiments on publicly available AMP and non-AMP datasets demonstrate that SGAC significantly outperforms traditional sequence-based methods and achieves state-of-the-art performance among graph-based models, validating its effectiveness in AMP classification.
Related papers
- Unrolled denoising networks provably learn optimal Bayesian inference [54.79172096306631]
We prove the first rigorous learning guarantees for neural networks based on unrolling approximate message passing (AMP)
For compressed sensing, we prove that when trained on data drawn from a product prior, the layers of the network converge to the same denoisers used in Bayes AMP.
arXiv Detail & Related papers (2024-09-19T17:56:16Z) - Regressor-free Molecule Generation to Support Drug Response Prediction [83.25894107956735]
Conditional generation based on the target IC50 score can obtain a more effective sampling space.
Regressor-free guidance combines a diffusion model's score estimation with a regression controller model's gradient based on number labels.
arXiv Detail & Related papers (2024-05-23T13:22:17Z) - HMAMP: Hypervolume-Driven Multi-Objective Antimicrobial Peptides Design [11.891046340221735]
This paper introduces a paradigm shift by considering multiple attributes in Antimicrobial peptides (AMPs) design.
By synergizing reinforcement learning and a descent algorithm rooted in the hypervolume of AMP concept, HMAMP effectively expands exploration space and mitigates the issue of pattern collapse.
A detailed analysis of the helical structures and molecular dynamics simulations for ten potential candidate AMPs validates the superiority of HMAMP in the realm of multi-objective AMP design.
arXiv Detail & Related papers (2024-05-01T07:17:59Z) - Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification [49.09505771145326]
We propose a Hierarchical Dynamic Labeling (HDL) algorithm that does not depend on model predictions and utilizes image embeddings to generate sample labels.
Our approach has the potential to change the paradigm of pseudo-label generation in semi-supervised learning.
arXiv Detail & Related papers (2024-04-26T06:00:27Z) - AMPCliff: quantitative definition and benchmarking of activity cliffs in antimicrobial peptides [4.826446796830595]
This study introduces a quantitative definition and benchmarking framework AMPCliff for the AC phenomenon in antimicrobial peptides (AMPs) composed by canonical amino acids.
AMPCliff quantifies the activities of AMPs by the MIC, and defines 0.9 as the minimum threshold for the normalized BLOSUM62 similarity score between a pair of aligned peptides with at least two-fold MIC changes.
Our analysis reveals that these models are capable of detecting AMP AC events and the pre-trained protein language model ESM2 demonstrates superior performance across the evaluations.
arXiv Detail & Related papers (2024-04-15T12:40:12Z) - Convolutional Monge Mapping Normalization for learning on sleep data [63.22081662149488]
We propose a new method called Convolutional Monge Mapping Normalization (CMMN)
CMMN consists in filtering the signals in order to adapt their power spectrum density (PSD) to a Wasserstein barycenter estimated on training data.
Numerical experiments on sleep EEG data show that CMMN leads to significant and consistent performance gains independent from the neural network architecture.
arXiv Detail & Related papers (2023-05-30T08:24:01Z) - A Non-Asymptotic Framework for Approximate Message Passing in Spiked
Models [24.786030482013437]
Approximate message passing (AMP) emerges as an effective iterative paradigm for solving high-dimensional statistical problems.
Prior AMP theory fell short of predicting the AMP dynamics when the number of iterations surpasses $obig(fraclog nloglog nbig)$.
This paper develops a non-asymptotic framework for understanding AMP in spiked matrix estimation.
arXiv Detail & Related papers (2022-08-05T17:59:06Z) - Hierarchical Semi-Supervised Contrastive Learning for
Contamination-Resistant Anomaly Detection [81.07346419422605]
Anomaly detection aims at identifying deviant samples from the normal data distribution.
Contrastive learning has provided a successful way to sample representation that enables effective discrimination on anomalies.
We propose a novel hierarchical semi-supervised contrastive learning framework, for contamination-resistant anomaly detection.
arXiv Detail & Related papers (2022-07-24T18:49:26Z) - Generalizing electrocardiogram delineation: training convolutional
neural networks with synthetic data augmentation [63.51064808536065]
Existing databases for ECG delineation are small, being insufficient in size and in the array of pathological conditions they represent.
This article delves has two main contributions. First, a pseudo-synthetic data generation algorithm was developed, based in probabilistically composing ECG traces given "pools" of fundamental segments, as cropped from the original databases, and a set of rules for their arrangement into coherent synthetic traces.
Second, two novel segmentation-based loss functions have been developed, which attempt at enforcing the prediction of an exact number of independent structures and at producing closer segmentation boundaries by focusing on a reduced number of samples.
arXiv Detail & Related papers (2021-11-25T10:11:41Z) - HMD-AMP: Protein Language-Powered Hierarchical Multi-label Deep Forest
for Annotating Antimicrobial Peptides [5.61222966894307]
We build a diverse and comprehensive multi-label protein sequence database by collecting and cleaning amino acids from various AMP databases.
We develop an end-to-end hierarchical multi-label deep forest framework, HMD-AMP, to annotate AMP comprehensively.
After identifying an AMP, it further predicts what targets the AMP can effectively kill from eleven available classes.
arXiv Detail & Related papers (2021-11-11T02:10:07Z) - PCA Initialization for Approximate Message Passing in Rotationally
Invariant Models [29.039655256171088]
Principal Component Analysis provides a natural estimator, and sharp results on its performance have been obtained in the high-dimensional regime.
Recently, an Approximate Message Passing (AMP) algorithm has been proposed as an alternative estimator with the potential to improve the accuracy of PCA.
In this work, we combine the two methods, initialize AMP with PCA, and propose a rigorous characterization of the performance of this estimator.
arXiv Detail & Related papers (2021-06-04T09:13:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.