AMPCliff: quantitative definition and benchmarking of activity cliffs in antimicrobial peptides
- URL: http://arxiv.org/abs/2404.09738v2
- Date: Sun, 03 Nov 2024 11:59:50 GMT
- Title: AMPCliff: quantitative definition and benchmarking of activity cliffs in antimicrobial peptides
- Authors: Kewei Li, Yuqian Wu, Yinheng Li, Yutong Guo, Yan Wang, Yiyang Liang, Yusi Fan, Lan Huang, Ruochi Zhang, Fengfeng Zhou
- Abstract summary: This study introduces AMPCliff, a quantitative definition and benchmarking framework for the activity cliff (AC) phenomenon in antimicrobial peptides (AMPs) composed of canonical amino acids.
AMPCliff quantifies AMP activity by the minimum inhibitory concentration (MIC), and defines an AC as a pair of aligned peptides with a normalized BLOSUM62 similarity score of at least 0.9 and at least a two-fold change in MIC.
Our analysis reveals that these models are capable of detecting AMP AC events and the pre-trained protein language model ESM2 demonstrates superior performance across the evaluations.
- Score: 4.826446796830595
- License:
- Abstract: Since the mechanism of action of drug molecules in the human body is difficult to reproduce in an in vitro environment, it is difficult to reveal the causes of the activity cliff (AC) phenomenon of drug molecules. The AC of small molecules has been extensively investigated, but limited knowledge has accumulated about the AC phenomenon in peptides composed of canonical amino acids. Understanding the mechanism of AC in peptides of canonical amino acids might help explain AC in drug molecules. This study introduces AMPCliff, a quantitative definition and benchmarking framework for the AC phenomenon in antimicrobial peptides (AMPs) composed of canonical amino acids. A comprehensive analysis of the existing AMP dataset reveals a significant prevalence of AC within AMPs. AMPCliff quantifies the activities of AMPs by the minimum inhibitory concentration (MIC), and defines 0.9 as the minimum threshold for the normalized BLOSUM62 similarity score between a pair of aligned peptides with at least two-fold MIC changes. This study establishes a benchmark dataset of paired AMPs in Staphylococcus aureus from the publicly available AMP dataset GRAMPA, and conducts a rigorous procedure to evaluate various AMP AC prediction models, including nine machine learning algorithms, four deep learning algorithms, four masked language models, and four generative language models. Our analysis reveals that these models are capable of detecting AMP AC events and that the pre-trained protein language model ESM2 demonstrates superior performance across the evaluations. The prediction of AMP activity cliffs still leaves room for improvement, considering that ESM2 with 33 layers achieves a Spearman correlation coefficient of only 0.4669 for the regression task of the MIC values on the benchmark dataset. Source code and additional resources are available at https://www.healthinformaticslab.org/supp/ or https://github.com/Kewei2023/AMPCliff-generation.
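The AC criterion stated in the abstract (normalized BLOSUM62 similarity of at least 0.9 together with at least a two-fold MIC change) can be illustrated with a minimal Python sketch. Several details here are assumptions, not taken from the paper: the normalization S(a,b) / sqrt(S(a,a) * S(b,b)), the requirement that the two peptides are already aligned without gaps, the helper names, and the toy peptides and MIC values. Only a small excerpt of standard BLOSUM62 entries is embedded; a real pipeline would load the full matrix from a library.

```python
import math

# Excerpt of standard BLOSUM62 scores, just enough for the toy peptides below.
# Assumption: a real pipeline would load the full 20x20 matrix instead.
BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("I", "I"): 4, ("K", "K"): 5, ("L", "L"): 4,
    ("R", "R"): 5, ("I", "L"): 2, ("K", "R"): 2,
}


def pair_score(x, y, matrix):
    """Look up a symmetric substitution score for residues x and y."""
    if (x, y) in matrix:
        return matrix[(x, y)]
    return matrix[(y, x)]


def raw_score(a, b, matrix):
    """Sum substitution scores over two pre-aligned, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(pair_score(x, y, matrix) for x, y in zip(a, b))


def normalized_similarity(a, b, matrix):
    """Assumed normalization: S(a,b) / sqrt(S(a,a) * S(b,b))."""
    self_product = raw_score(a, a, matrix) * raw_score(b, b, matrix)
    return raw_score(a, b, matrix) / math.sqrt(self_product)


def is_activity_cliff(a, b, mic_a, mic_b, matrix,
                      min_similarity=0.9, min_fold=2.0):
    """Flag a pair as an AC: similar sequences, at least two-fold MIC change."""
    if mic_a <= 0 or mic_b <= 0:
        raise ValueError("MIC values must be positive")
    fold_change = max(mic_a, mic_b) / min(mic_a, mic_b)
    similarity = normalized_similarity(a, b, matrix)
    return similarity >= min_similarity and fold_change >= min_fold


# Hypothetical peptides differing by a single L -> I substitution.
peptide_a = "KLAKLAKKLAKLAK"
peptide_b = "KIAKLAKKLAKLAK"
similarity = normalized_similarity(peptide_a, peptide_b, BLOSUM62_EXCERPT)
cliff = is_activity_cliff(peptide_a, peptide_b, 4.0, 32.0, BLOSUM62_EXCERPT)
print(round(similarity, 4), cliff)
```

Both MIC values must be in the same linear units for the fold change to be meaningful; if activities are stored on a log scale, they would need to be converted back first.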
Related papers
- Regressor-free Molecule Generation to Support Drug Response Prediction [83.25894107956735]
Conditional generation based on the target IC50 score can obtain a more effective sampling space.
Regressor-free guidance combines a diffusion model's score estimation with a regression controller model's gradient based on number labels.
arXiv Detail & Related papers (2024-05-23T13:22:17Z)
- HMAMP: Hypervolume-Driven Multi-Objective Antimicrobial Peptides Design [11.891046340221735]
This paper introduces a paradigm shift by considering multiple attributes in antimicrobial peptide (AMP) design.
By synergizing reinforcement learning and a descent algorithm rooted in the hypervolume of AMP concept, HMAMP effectively expands exploration space and mitigates the issue of pattern collapse.
A detailed analysis of the helical structures and molecular dynamics simulations for ten potential candidate AMPs validates the superiority of HMAMP in the realm of multi-objective AMP design.
arXiv Detail & Related papers (2024-05-01T07:17:59Z)
- Objective-Agnostic Enhancement of Molecule Properties via Multi-Stage VAE [1.3597551064547502]
Variational autoencoder (VAE) is a popular method for drug discovery and various architectures and pipelines have been proposed to improve its performance.
VAE approaches are known to suffer from poor manifold recovery when the data lie on a low-dimensional manifold embedded in a higher dimensional ambient space.
In this paper, we explore applying a multi-stage VAE approach, that can improve manifold recovery on a synthetic dataset, to the field of drug discovery.
arXiv Detail & Related papers (2023-08-24T20:22:22Z)
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
- Accelerating Antimicrobial Peptide Discovery with Latent Structure [33.288514128470425]
We propose a latent sequence-structure model for designing AMPs (LSSAMP).
LSSAMP exploits multi-scale vector quantization in the latent space to represent secondary structures.
Experimental results show that the peptides generated by LSSAMP have a high probability of antimicrobial activity.
arXiv Detail & Related papers (2022-11-28T06:43:32Z)
- Graph-Based Active Machine Learning Method for Diverse and Novel Antimicrobial Peptides Generation and Selection [57.131117785001194]
Large-scale screening of new AMP candidates is expensive, time-consuming, and not affordable in developing countries.
We propose a novel active machine learning-based framework that statistically minimizes the number of wet-lab experiments needed to design new AMPs.
arXiv Detail & Related papers (2022-09-18T14:30:48Z)
- Robust Quantitative Susceptibility Mapping via Approximate Message Passing with Parameter Estimation [14.22930572798757]
We propose a probabilistic Bayesian approach for quantitative susceptibility mapping (QSM) with built-in parameter estimation.
On the simulated Sim2Snr1 dataset, AMP-PE achieved the lowest NRMSE, DFCM and the highest SSIM.
On the in vivo datasets, AMP-PE is robust and successfully recovers the susceptibility maps using the estimated parameters.
arXiv Detail & Related papers (2022-07-29T14:38:03Z)
- Hierarchical Semi-Supervised Contrastive Learning for Contamination-Resistant Anomaly Detection [81.07346419422605]
Anomaly detection aims at identifying deviant samples from the normal data distribution.
Contrastive learning has provided a successful way to sample representation that enables effective discrimination on anomalies.
We propose a novel hierarchical semi-supervised contrastive learning framework for contamination-resistant anomaly detection.
arXiv Detail & Related papers (2022-07-24T18:49:26Z)
- Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence Classification [109.81283748940696]
We introduce several ways to perturb SARS-CoV-2 genome sequences to mimic the error profiles of common sequencing platforms such as Illumina and PacBio.
We show that some simulation-based approaches are more robust (and accurate) than others for specific embedding methods to certain adversarial attacks to the input sequences.
arXiv Detail & Related papers (2022-07-18T19:16:56Z)
- HMD-AMP: Protein Language-Powered Hierarchical Multi-label Deep Forest for Annotating Antimicrobial Peptides [5.61222966894307]
We build a diverse and comprehensive multi-label protein sequence database by collecting and cleaning amino acids from various AMP databases.
We develop an end-to-end hierarchical multi-label deep forest framework, HMD-AMP, to annotate AMP comprehensively.
After identifying an AMP, it further predicts what targets the AMP can effectively kill from eleven available classes.
arXiv Detail & Related papers (2021-11-11T02:10:07Z)
- Accelerating Antimicrobial Discovery with Controllable Deep Generative Models and Molecular Dynamics [109.70543391923344]
CLaSS (Controlled Latent attribute Space Sampling) is an efficient computational method for attribute-controlled generation of molecules.
We screen the generated molecules for additional key attributes by using deep learning classifiers in conjunction with novel features derived from atomistic simulations.
The proposed approach is demonstrated for designing non-toxic antimicrobial peptides (AMPs) with strong broad-spectrum potency.
arXiv Detail & Related papers (2020-05-22T15:57:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.