SPLDExtraTrees: Robust machine learning approach for predicting kinase
inhibitor resistance
- URL: http://arxiv.org/abs/2111.08008v1
- Date: Mon, 15 Nov 2021 09:07:45 GMT
- Title: SPLDExtraTrees: Robust machine learning approach for predicting kinase
inhibitor resistance
- Authors: Ziyi Yang, Zhaofeng Ye, Yijia Xiao, and Changyu Hsieh
- Abstract summary: We propose a robust machine learning method, SPLDExtraTrees, which can accurately predict ligand binding affinity changes upon protein mutation.
The proposed method ranks training data following a specific scheme that starts with easy-to-learn samples.
Experiments substantiate the capability of the proposed method for predicting kinase inhibitor resistance under three scenarios.
- Score: 1.0674604700001966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Drug resistance is a major threat to global health and a significant
concern throughout the clinical treatment of diseases and drug development.
Mutations in proteins that affect drug binding are a common cause of
adaptive drug resistance. Therefore, quantitative estimates of how mutations
affect the interaction between a drug and its target protein are of
vital significance for drug development and clinical practice.
Computational methods that rely on molecular dynamics simulations, Rosetta
protocols, and machine learning have proven capable
of predicting ligand affinity changes upon protein mutation. However,
severely limited sample sizes and heavy noise lead to overfitting and
generalization issues that have impeded wide adoption of machine learning for
studying drug resistance. In this paper, we propose a robust machine learning
method, termed SPLDExtraTrees, which can accurately predict ligand binding
affinity changes upon protein mutation and identify resistance-causing
mutations. Specifically, the proposed method ranks training data following a
specific scheme that starts with easy-to-learn samples and gradually
incorporates harder and more diverse samples into the training, and then iterates
between sample weight recalculations and model updates. In addition, we
calculate additional physics-based structural features to provide the machine
learning model with valuable domain knowledge on proteins for this
data-limited predictive task. The experiments substantiate the capability of
the proposed method for predicting kinase inhibitor resistance under three
scenarios, and the method achieves predictive accuracy comparable to that of
molecular dynamics and Rosetta methods at much lower computational cost.
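The training scheme described in the abstract (start from easy samples, gradually admit harder and more diverse ones, alternate between weight recalculation and model updates) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything specific here is an assumption: the selection rule is the closed-form threshold commonly used for self-paced learning with diversity, the grouping of samples (for example, by cluster) is hypothetical, and the names `spld_select`, `self_paced_fit`, `lam`, `gamma`, and `growth` are illustrative rather than taken from the paper. The model is passed in as plain `fit` and `predict` callables, so any regressor that accepts per-sample weights could be wrapped.

```python
import math
from collections import defaultdict


def spld_select(losses, groups, lam, gamma):
    """Return 0/1 weights; 1 means the sample is used in the next update.

    Within each group, samples are ranked by loss. The sample at rank i
    (1-based) is kept if its loss is below
        lam + gamma / (sqrt(i) + sqrt(i - 1)).
    The threshold shrinks with rank, so taking many samples from one
    group is penalized, which favors diversity across groups.
    """
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)

    weights = [0] * len(losses)
    for members in by_group.values():
        members.sort(key=lambda i: losses[i])  # easiest first
        for rank, i in enumerate(members, start=1):
            threshold = lam + gamma / (math.sqrt(rank) + math.sqrt(rank - 1))
            if losses[i] < threshold:
                weights[i] = 1
    return weights


def self_paced_fit(X, y, groups, fit, predict, lam, gamma,
                   growth=1.3, rounds=5):
    """Alternate between sample-weight recalculation and model updates.

    fit(X, y, weights) -> model and predict(model, X) -> predictions are
    caller-supplied (hypothetical interface for this sketch).
    """
    weights = [1] * len(y)          # warm start on all samples
    model = fit(X, y, weights)
    for _ in range(rounds):
        preds = predict(model, X)
        losses = [(p - t) ** 2 for p, t in zip(preds, y)]
        new_weights = spld_select(losses, groups, lam, gamma)
        if sum(new_weights) > 0:    # refit only if something was selected
            weights = new_weights
            model = fit(X, y, weights)
        lam *= growth               # relax thresholds: admit harder samples
        gamma *= growth
    return model, weights
```

For example, `spld_select([0.1, 0.4, 0.5], ["a", "a", "b"], lam=0.15, gamma=0.4)` keeps the first and third samples: the harder sample from group `b` is admitted before the second sample from group `a`, which is the diversity effect. Growing `lam` and `gamma` each round is one simple pacing choice; other schedules are equally plausible.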
Related papers
- Stacked ensemble-based mutagenicity prediction model using multiple modalities with graph attention network [0.9736758288065405]
Mutagenicity is a concern due to its association with genetic mutations which can result in a variety of negative consequences.
In this work, we introduce a novel stacked ensemble based mutagenicity prediction model.
arXiv Detail & Related papers (2024-09-03T09:14:21Z)
- Learning to Predict Mutation Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning [78.38442423223832]
We develop a novel codebook pre-training task, namely masked microenvironment modeling.
We demonstrate superior performance and training efficiency over state-of-the-art pre-training-based methods in mutation effect prediction.
arXiv Detail & Related papers (2024-05-16T03:53:21Z)
- Data-Error Scaling in Machine Learning on Natural Discrete Combinatorial Mutation-prone Sets: Case Studies on Peptides and Small Molecules [0.0]
We investigate trends in the data-error scaling behavior of machine learning (ML) models trained on discrete spaces that are prone-to-mutation.
In contrast to typical data-error scaling, our results showed discontinuous monotonic phase transitions during learning.
We present an alternative strategy to normalize learning curves and the concept of mutant based shuffling.
arXiv Detail & Related papers (2024-05-08T16:04:50Z)
- Protein binding affinity prediction under multiple substitutions applying eGNNs on Residue and Atomic graphs combined with Language model information: eGRAL [1.840390797252648]
Deep learning is increasingly recognized as a powerful tool capable of bridging the gap between in-silico predictions and in-vitro observations.
We propose eGRAL, a novel graph neural network architecture designed for predicting binding affinity changes from amino acid substitutions in protein complexes.
eGRAL leverages residue, atomic and evolutionary scales, thanks to features extracted from protein large language models.
arXiv Detail & Related papers (2024-05-03T10:33:19Z)
- Physical formula enhanced multi-task learning for pharmacokinetics prediction [54.13787789006417]
A major challenge for AI-driven drug discovery is the scarcity of high-quality data.
We develop a formula enhanced multi-task learning (PEMAL) method that predicts four key parameters of pharmacokinetics simultaneously.
Our experiments reveal that PEMAL significantly lowers the data demand, compared to typical Graph Neural Networks.
arXiv Detail & Related papers (2024-04-16T07:42:55Z)
- Efficiently Predicting Protein Stability Changes Upon Single-point Mutation with Large Language Models [51.57843608615827]
The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry.
We introduce an ESM-assisted efficient approach that integrates protein sequence and structural features to predict the thermostability changes in protein upon single-point mutations.
arXiv Detail & Related papers (2023-12-07T03:25:49Z)
- Accurate and Definite Mutational Effect Prediction with Lightweight Equivariant Graph Neural Networks [2.381587712372268]
This research introduces a lightweight graph representation learning scheme that efficiently analyzes the microenvironment of wild-type proteins.
Our solution offers a wide range of benefits that make it an ideal choice for the community.
arXiv Detail & Related papers (2023-04-13T09:51:49Z)
- Reprogramming Pretrained Language Models for Protein Sequence Representation Learning [68.75392232599654]
We propose Representation Learning via Dictionary Learning (R2DL), an end-to-end representation learning framework.
R2DL reprograms a pretrained English language model to learn the embeddings of protein sequences.
Our model can attain better accuracy and significantly improve the data efficiency by up to $10^5$ times over the baselines set by pretrained and standard supervised methods.
arXiv Detail & Related papers (2023-01-05T15:55:18Z)
- Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
arXiv Detail & Related papers (2022-06-16T17:59:05Z)
- Extracting Chemical-Protein Interactions via Calibrated Deep Neural Network and Self-training [0.8376091455761261]
"Calibration" techniques have been applied to deep learning models to estimate the data uncertainty and improve the reliability.
In this study, to extract chemical--protein interactions, we propose a DNN-based approach incorporating uncertainty information and calibration techniques.
Our approach has achieved state-of-the-art performance with regard to the Biocreative VI ChemProt task, while preserving higher calibration abilities than those of previous approaches.
arXiv Detail & Related papers (2020-11-04T10:14:31Z)
- Deep Learning for Virtual Screening: Five Reasons to Use ROC Cost Functions [80.12620331438052]
Deep learning has become an important tool for rapid screening of billions of molecules in silico for potential hits containing desired chemical features.
Despite its importance, substantial challenges persist in training these models, such as severe class imbalance, high decision thresholds, and lack of ground truth labels in some datasets.
We argue in favor of directly optimizing the receiver operating characteristic (ROC) in such cases, due to its robustness to class imbalance.
arXiv Detail & Related papers (2020-06-25T08:46:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.