Protein Folding Neural Networks Are Not Robust
- URL: http://arxiv.org/abs/2109.04460v1
- Date: Thu, 9 Sep 2021 17:57:19 GMT
- Title: Protein Folding Neural Networks Are Not Robust
- Authors: Sumit Kumar Jha, Arvind Ramanathan, Rickard Ewetz, Alvaro Velasquez, Susmit Jha
- Abstract summary: Deep neural networks such as AlphaFold and RoseTTAFold predict remarkably accurate structures of proteins.
In this paper, we demonstrate that RoseTTAFold is not robust to biologically small sequence perturbations despite its high accuracy.
We use adversarial attack methods to create adversarial protein sequences, and show that the RMSD in the predicted protein structure ranges from 0.119 Å to 34.162 Å when the adversarial perturbations are bounded by 20 units in the BLOSUM62 distance.
- Score: 15.621671501134028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks such as AlphaFold and RoseTTAFold predict remarkably
accurate structures of proteins compared to other algorithmic approaches. It is
known that biologically small perturbations in the protein sequence do not lead
to drastic changes in the protein structure. In this paper, we demonstrate that
RoseTTAFold does not exhibit such robustness despite its high accuracy, and
biologically small perturbations for some input sequences result in radically
different predicted protein structures. This raises the challenge of detecting
when these predicted protein structures cannot be trusted. We define the
robustness measure for the predicted structure of a protein sequence to be the
inverse of the root-mean-square distance (RMSD) between the predicted structure and
the predicted structure of its adversarially perturbed sequence. We use adversarial
attack methods to create adversarial protein sequences, and show that the RMSD
in the predicted protein structure ranges from 0.119 Å to 34.162 Å when
the adversarial perturbations are bounded by 20 units in the BLOSUM62 distance.
This demonstrates very high variance in the robustness measure of the predicted
structures. We show that the magnitude of the correlation (0.917) between our
robustness measure and the RMSD between the predicted structure and the ground
truth is high; that is, predictions with a low robustness measure cannot be
trusted. This is the first paper demonstrating the susceptibility of
RoseTTAFold to adversarial attacks.
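The abstract defines robustness as the inverse of the RMSD between two predicted structures, but it does not say how the structures are superimposed before the RMSD is taken. The sketch below assumes each prediction is given as an (N, 3) array of matched atom coordinates (for example, Cα atoms) and uses Kabsch superposition; the function names and the `eps` guard are hypothetical illustrations, not the paper's code. A Pearson correlation helper is included as one way to relate the robustness measure to the RMSD against ground truth.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition (Kabsch)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two coordinate arrays of identical shape (N, 3)")
    # Remove translation by centring both point clouds on their centroids.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # The SVD of the 3x3 covariance matrix yields the optimal rotation.
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1), not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q (row vectors, hence R.T) and average the squared per-atom deviations.
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def robustness(pred_original, pred_perturbed, eps=1e-8):
    """Inverse RMSD between the structure predicted for a sequence and the one
    predicted for its perturbed version. `eps` is a hypothetical guard against
    division by zero when the two predictions coincide."""
    return 1.0 / max(kabsch_rmsd(pred_original, pred_perturbed), eps)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])
```

Under this definition, the reported RMSD range of 0.119 Å to 34.162 Å corresponds to robustness values from roughly 8.4 Å⁻¹ down to about 0.029 Å⁻¹. Because high robustness should go with low error against ground truth, the correlation computed this way would be expected to be negative, which fits the abstract reporting its magnitude (0.917).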
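The abstract bounds the adversarial perturbations by 20 units of BLOSUM62 distance but does not define that distance. The sketch below assumes one plausible definition: for each substituted position, the loss in BLOSUM62 score B(a, a) - B(a, b), summed over all substituted positions. The dictionary holds only a few genuine BLOSUM62 entries, enough for the demonstration; a real check would need the full 20×20 matrix. All names here are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical demo subset of genuine BLOSUM62 scores; a real check needs all 20x20 entries.
BLOSUM62_SUBSET: Dict[Tuple[str, str], int] = {
    ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4,
    ("K", "K"): 5, ("R", "R"): 5,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1, ("K", "R"): 2,
}


def blosum_score(matrix: Dict[Tuple[str, str], int], a: str, b: str) -> int:
    """Look up a symmetric substitution score, trying both key orders."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


def blosum_distance(original: str, perturbed: str, matrix: Dict[Tuple[str, str], int]) -> int:
    """Assumed distance: sum of B(a, a) - B(a, b) over positions where a != b."""
    if len(original) != len(perturbed):
        raise ValueError("only equal-length substitutions are handled here")
    total = 0
    for a, b in zip(original, perturbed):
        if a != b:
            total += blosum_score(matrix, a, a) - blosum_score(matrix, a, b)
    return total


def within_budget(original: str, perturbed: str,
                  matrix: Dict[Tuple[str, str], int], budget: int = 20) -> bool:
    """True if the perturbation stays within the distance bound (20 in the abstract)."""
    return blosum_distance(original, perturbed, matrix) <= budget
```

For example, `blosum_distance("KIL", "RVL", BLOSUM62_SUBSET)` adds 3 for K→R (5 - 2) and 1 for I→V (4 - 3), a total of 4, well inside a budget of 20. In BLOSUM62 every diagonal score exceeds every off-diagonal score in its row, so under this definition each substitution costs a positive amount.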
Related papers
- CPE-Pro: A Structure-Sensitive Deep Learning Method for Protein Representation and Origin Evaluation [7.161099050722313]
We develop a structure-sensitive supervised deep learning model, Crystal vs Predicted Evaluator for Protein Structure (CPE-Pro)
CPE-Pro learns the structural information of proteins and captures inter-structural differences to achieve accurate traceability on four data classes.
We utilize Foldseek to encode protein structures into "structure-sequences" and train a protein Structural Sequence Language Model, SSLM.
(2024-10-21T02:21:56Z)
- pLDDT-Predictor: High-speed Protein Screening Using Transformer and ESM2 [4.930667479611019]
We introduce pLDDT-Predictor, a high-speed protein screening tool that bridges the gap by leveraging pre-trained ESM2 protein embeddings and a Transformer architecture.
Our experimental results, conducted on a diverse dataset of 1.5 million protein sequences, demonstrate that pLDDT-Predictor can classify more than 90 percent of proteins with a pLDDT score above 70.
(2024-10-11T03:19:44Z)
- Structure-Informed Protein Language Model [38.019425619750265]
We integrate remote homology detection to distill structural information into protein language models.
We evaluate the impact of this structure-informed training on downstream protein function prediction tasks.
(2024-02-07T09:32:35Z)
- Protein 3D Graph Structure Learning for Robust Structure-based Protein Property Prediction [43.46012602267272]
Protein structure-based property prediction has emerged as a promising approach for various biological tasks.
Current practices, which simply employ accurately predicted structures during inference, suffer from notable degradation in prediction accuracy.
Our framework is model-agnostic and effective in improving the property prediction of both predicted structures and experimental structures.
(2023-10-14T08:43:42Z)
- Structure-informed Language Models Are Protein Designers [69.70134899296912]
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs).
We perform structural surgery on pLMs, implanting a lightweight structural adapter that endows them with structural awareness.
Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
(2023-02-03T10:49:52Z)
- On the Robustness of AlphaFold: A COVID-19 Case Study [16.564151738086434]
We demonstrate that AlphaFold is not robust to biologically small sequence perturbations despite its high accuracy.
This raises the challenge of detecting and quantifying the extent to which these predicted protein structures can be trusted.
(2023-01-10T17:31:39Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
(2022-09-30T01:46:38Z)
- Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
(2022-05-20T19:38:00Z)
- EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Compared with traditional Rosetta-based structure optimization routines, our EBM-Fold approach can efficiently produce high-quality decoys.
(2021-05-11T03:40:29Z)
- Transfer Learning for Protein Structure Classification at Low Resolution [124.5573289131546]
We show that it is possible to make accurate (≥80%) predictions of protein class and architecture from structures determined at low (≤3 Å) resolution.
We provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function.
(2020-08-11T15:01:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.