On the Robustness of AlphaFold: A COVID-19 Case Study
- URL: http://arxiv.org/abs/2301.04093v2
- Date: Thu, 12 Jan 2023 17:34:54 GMT
- Title: On the Robustness of AlphaFold: A COVID-19 Case Study
- Authors: Ismail Alkhouri, Sumit Jha, Andre Beckus, George Atia, Alvaro
Velasquez, Rickard Ewetz, Arvind Ramanathan, Susmit Jha
- Abstract summary: We demonstrate that AlphaFold, despite its high accuracy, is not robust to biologically small perturbations of the input protein sequence.
This raises the challenge of detecting and quantifying the extent to which these predicted protein structures can be trusted.
- Score: 16.564151738086434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Protein folding neural networks (PFNNs) such as AlphaFold predict remarkably
accurate structures of proteins compared to other approaches. However, the
robustness of such networks has heretofore not been explored. This is
particularly relevant given the broad social implications of such technologies
and the fact that biologically small perturbations in the protein sequence do
not generally lead to drastic changes in the protein structure. In this paper,
we demonstrate that AlphaFold does not exhibit such robustness despite its high
accuracy. This raises the challenge of detecting and quantifying the extent to
which these predicted protein structures can be trusted. To measure the
robustness of the predicted structures, we utilize (i) the root-mean-square
deviation (RMSD) and (ii) the Global Distance Test (GDT) similarity measure
between the predicted structure of the original sequence and the structure of
its adversarially perturbed version. We prove that the problem of minimally
perturbing protein sequences to fool protein folding neural networks is
NP-complete. Based on the well-established BLOSUM62 sequence alignment scoring
matrix, we generate adversarial protein sequences and show that the RMSD
between the predicted protein structure and the structure of the original
sequence is very large when the adversarial changes are bounded by (i) 20
units in the BLOSUM62 distance, and (ii) five residues (out of hundreds or
thousands of residues) in the given protein sequence. In our experimental
evaluation, we consider 111 COVID-19 proteins in the Universal Protein Resource
(UniProt), a central resource for protein data managed by the European
Bioinformatics Institute, Swiss Institute of Bioinformatics, and the US Protein
Information Resource. These result in an overall GDT similarity test score
average of around 34%, demonstrating a substantial drop in the performance of
AlphaFold.
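The two similarity measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on simplifying assumptions: both structures are given as equal-length lists of (x, y, z) coordinates in ångströms (for example, one C-alpha atom per residue), they are already superimposed, and the GDT score is the common GDT_TS variant that averages the fraction of residues within 1, 2, 4, and 8 Å. A full GDT computation also searches over superpositions, which is omitted here. The function names `rmsd` and `gdt_ts` and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("structures must be non-empty and of equal length")
    # Sum of squared per-residue distances, averaged, then square-rooted.
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def gdt_ts(coords_a, coords_b, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS (0 to 100) for a single, fixed superposition.

    Assumption: the structures are already superimposed; the search over
    superpositions performed by a full GDT computation is not done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("structures must be non-empty and of equal length")
    distances = [math.dist(p, q) for p, q in zip(coords_a, coords_b)]
    # Fraction of residues within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy example: four residues, two of which move after a perturbation.
original = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
perturbed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 3.0, 0.0), (3.0, 9.0, 0.0)]

print(round(rmsd(original, perturbed), 3))  # 4.743
print(gdt_ts(original, perturbed))          # 62.5
```

In this toy case the per-residue displacements are 0, 0, 3, and 9 Å, so a higher RMSD and a lower GDT score both indicate that the perturbed prediction has drifted away from the original one.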
Related papers
- A PLMs based protein retrieval framework [4.110243520064533]
We propose a novel protein retrieval framework that mitigates the bias towards sequence similarity.
Our framework initiatively harnesses protein language models (PLMs) to embed protein sequences within a high-dimensional feature space.
Extensive experiments demonstrate that our framework can equally retrieve both similar and dissimilar proteins.
arXiv Detail & Related papers (2024-07-16T09:52:42Z)
- PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for Efficient and Generalizable Compound-Protein Interaction Prediction [63.50967073653953]
Compound-Protein Interaction prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery.
Existing deep learning-based methods utilize only the single modality of protein sequences or structures.
We propose a novel multi-scale Protein Sequence-structure Contrasting framework for CPI prediction.
arXiv Detail & Related papers (2024-02-13T03:51:10Z)
- Protein 3D Graph Structure Learning for Robust Structure-based Protein Property Prediction [43.46012602267272]
Protein structure-based property prediction has emerged as a promising approach for various biological tasks.
Current practices, which simply employ accurately predicted structures during inference, suffer from notable degradation in prediction accuracy.
Our framework is model-agnostic and effective in improving the property prediction of both predicted structures and experimental structures.
arXiv Detail & Related papers (2023-10-14T08:43:42Z)
- Enhancing the Protein Tertiary Structure Prediction by Multiple Sequence Alignment Generation [30.2874172276931]
We introduce MSA-Augmenter, which generates useful, novel protein sequences not currently found in databases.
Our experiments on CASP14 demonstrate that MSA-Augmenter can generate de novo sequences that retain co-evolutionary information from inferior MSAs.
arXiv Detail & Related papers (2023-06-02T14:13:50Z)
- Structure-informed Language Models Are Protein Designers [69.70134899296912]
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs).
We conduct a structural surgery on pLMs, where a lightweight structural adapter is implanted into pLMs and endows them with structural awareness.
Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-02-03T10:49:52Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
- Protein Folding Neural Networks Are Not Robust [15.621671501134028]
Deep neural networks such as AlphaFold and RoseTTAFold predict remarkably accurate structures of proteins.
In this paper, we demonstrate that RoseTTAFold does not exhibit such robustness despite its high accuracy.
We use adversarial attack methods to create adversarial protein sequences, and show that the RMSD in the predicted protein structure ranges from 0.119 Å to 34.162 Å when the adversarial perturbations are bounded by 20 units in the BLOSUM62 distance.
arXiv Detail & Related papers (2021-09-09T17:57:19Z)
- EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z)
- Transfer Learning for Protein Structure Classification at Low Resolution [124.5573289131546]
We show that it is possible to make accurate ($\geq$80%) predictions of protein class and architecture from structures determined at low ($\leq$3 Å) resolution.
We provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function.
arXiv Detail & Related papers (2020-08-11T15:01:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.