Related papers: Protein Folding Neural Networks Are Not Robust

Protein Folding Neural Networks Are Not Robust

URL: http://arxiv.org/abs/2109.04460v1
Date: Thu, 9 Sep 2021 17:57:19 GMT
Title: Protein Folding Neural Networks Are Not Robust
Authors: Sumit Kumar Jha, Arvind Ramanathan, Rickard Ewetz, Alvaro Velasquez, Susmit Jha
Abstract summary: Deep neural networks such as AlphaFold and RoseTTAFold predict remarkably accurate structures of proteins. In this paper, we demonstrate that RoseTTAFold does not exhibit such a robustness despite its high accuracy. We use adversarial attack methods to create adversarial protein sequences, and show that the RMSD in the predicted protein structure ranges from 0.119rA to 34.162rA when the adversarial perturbations are bounded by 20 units in the BLOSUM62 distance.
Score: 15.621671501134028
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep neural networks such as AlphaFold and RoseTTAFold predict remarkably accurate structures of proteins compared to other algorithmic approaches. It is known that biologically small perturbations in the protein sequence do not lead to drastic changes in the protein structure. In this paper, we demonstrate that RoseTTAFold does not exhibit such a robustness despite its high accuracy, and biologically small perturbations for some input sequences result in radically different predicted protein structures. This raises the challenge of detecting when these predicted protein structures cannot be trusted. We define the robustness measure for the predicted structure of a protein sequence to be the inverse of the root-mean-square distance (RMSD) in the predicted structure and the structure of its adversarially perturbed sequence. We use adversarial attack methods to create adversarial protein sequences, and show that the RMSD in the predicted protein structure ranges from 0.119\r{A} to 34.162\r{A} when the adversarial perturbations are bounded by 20 units in the BLOSUM62 distance. This demonstrates very high variance in the robustness measure of the predicted structures. We show that the magnitude of the correlation (0.917) between our robustness measure and the RMSD between the predicted structure and the ground truth is high, that is, the predictions with low robustness measure cannot be trusted. This is the first paper demonstrating the susceptibility of RoseTTAFold to adversarial attacks.

Related papers

DISPROTBENCH: A Disorder-Aware, Task-Rich Benchmark for Evaluating Protein Structure Prediction in Realistic Biological Contexts [76.59606029593085]
DisProtBench is a benchmark for evaluating protein structure prediction models (PSPMs) under structural disorder and complex biological conditions.<n>DisProtBench spans three key axes: data complexity, task diversity, and Interpretability.<n>Results reveal significant variability in model robustness under disorder, with low-confidence regions linked to functional prediction failures.
arXiv Detail & Related papers (2025-06-18T23:58:22Z)
Exploring zero-shot structure-based protein fitness prediction [0.5524804393257919]
We make zero-shot predictions about the fitness consequences of protein sequence changes with pre-trained machine learning models. We assess several modeling choices for structure-based models and their effects on downstream fitness prediction.
arXiv Detail & Related papers (2025-04-23T17:01:09Z)
CPE-Pro: A Structure-Sensitive Deep Learning Method for Protein Representation and Origin Evaluation [7.161099050722313]
We develop a structure-sensitive supervised deep learning model, Crystal vs Predicted Evaluator for Protein Structure (CPE-Pro) CPE-Pro learns the structural information of proteins and captures inter-structural differences to achieve accurate traceability on four data classes. We utilize Foldseek to encode protein structures into "structure-sequences" and trained a protein Structural Sequence Language Model, SSLM.
arXiv Detail & Related papers (2024-10-21T02:21:56Z)
A Protein Structure Prediction Approach Leveraging Transformer and CNN Integration [4.909112037834705]
This paper adopts a two-dimensional fusion deep neural network model, DstruCCN, which uses Convolutional Neural Networks (CCN) and a supervised Transformer protein language model for single-sequence protein structure prediction. The training features of the two are combined to predict the protein Transformer binding site matrix, and then the three-dimensional structure is reconstructed using energy minimization.
arXiv Detail & Related papers (2024-02-29T12:24:20Z)
Structure-Informed Protein Language Model [38.019425619750265]
We introduce the integration of remote homology detection to distill structural information into protein language models. We evaluate the impact of this structure-informed training on downstream protein function prediction tasks.
arXiv Detail & Related papers (2024-02-07T09:32:35Z)
Protein 3D Graph Structure Learning for Robust Structure-based Protein Property Prediction [43.46012602267272]
Protein structure-based property prediction has emerged as a promising approach for various biological tasks. Current practices, which simply employ accurately predicted structures during inference, suffer from notable degradation in prediction accuracy. Our framework is model-agnostic and effective in improving the property prediction of both predicted structures and experimental structures.
arXiv Detail & Related papers (2023-10-14T08:43:42Z)
Structure-informed Language Models Are Protein Designers [69.70134899296912]
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs) We conduct a structural surgery on pLMs, where a lightweight structural adapter is implanted into pLMs and endows it with structural awareness. Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-02-03T10:49:52Z)
On the Robustness of AlphaFold: A COVID-19 Case Study [16.564151738086434]
We demonstrate that AlphaFold does not exhibit such robustness despite its high accuracy. This raises the challenge of detecting and quantifying the extent to which these predicted protein structures can be trusted.
arXiv Detail & Related papers (2023-01-10T17:31:39Z)
State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures. Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein. Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules. Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network. Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z)
Transfer Learning for Protein Structure Classification at Low Resolution [124.5573289131546]
We show that it is possible to make accurate ($geq$80%) predictions of protein class and architecture from structures determined at low ($leq$3A) resolution. We provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function.
arXiv Detail & Related papers (2020-08-11T15:01:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.