CryoBench: Diverse and challenging datasets for the heterogeneity problem in cryo-EM
- URL: http://arxiv.org/abs/2408.05526v1
- Date: Sat, 10 Aug 2024 11:48:14 GMT
- Title: CryoBench: Diverse and challenging datasets for the heterogeneity problem in cryo-EM
- Authors: Minkyu Jeon, Rishwanth Raghu, Miro Astore, Geoffrey Woollard, Ryan Feathers, Alkin Kaz, Sonya M. Hanson, Pilar Cossio, Ellen D. Zhong
- Abstract summary: Cryo-electron microscopy (cryo-EM) is a powerful technique for determining high-resolution 3D biomolecular structures from imaging data.
CryoBench is a suite of datasets, metrics, and performance benchmarks for heterogeneous reconstruction in cryo-EM.
- Score: 3.424647356090208
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cryo-electron microscopy (cryo-EM) is a powerful technique for determining high-resolution 3D biomolecular structures from imaging data. As this technique can capture dynamic biomolecular complexes, 3D reconstruction methods are increasingly being developed to resolve this intrinsic structural heterogeneity. However, the absence of standardized benchmarks with ground truth structures and validation metrics limits the advancement of the field. Here, we propose CryoBench, a suite of datasets, metrics, and performance benchmarks for heterogeneous reconstruction in cryo-EM. We propose five datasets representing different sources of heterogeneity and degrees of difficulty. These include conformational heterogeneity generated from simple motions and random configurations of antibody complexes and from tens of thousands of structures sampled from a molecular dynamics simulation. We also design datasets containing compositional heterogeneity from mixtures of ribosome assembly states and 100 common complexes present in cells. We then perform a comprehensive analysis of state-of-the-art heterogeneous reconstruction tools including neural and non-neural methods and their sensitivity to noise, and propose new metrics for quantitative comparison of methods. We hope that this benchmark will be a foundational resource for analyzing existing methods and new algorithmic development in both the cryo-EM and machine learning communities.
Related papers
- Structure Language Models for Protein Conformation Generation [66.42864253026053]
Traditional physics-based simulation methods often struggle with sampling equilibrium conformations.
Deep generative models have shown promise in generating protein conformations as a more efficient alternative.
We introduce Structure Language Modeling as a novel framework for efficient protein conformation generation.
arXiv Detail & Related papers (2024-10-24T03:38:51Z)
- Fast and Functional Structured Data Generators Rooted in Out-of-Equilibrium Physics [62.997667081978825]
We address the challenge of using energy-based models to produce high-quality, label-specific data in structured datasets.
Traditional training methods encounter difficulties due to inefficient Markov chain Monte Carlo mixing.
We use a novel training algorithm that exploits non-equilibrium effects.
arXiv Detail & Related papers (2023-07-13T15:08:44Z)
- CryoChains: Heterogeneous Reconstruction of Molecular Assembly of Semi-flexible Chains from Cryo-EM Images [3.0828074702828623]
We propose CryoChains that encodes large deformations of biomolecules via rigid body transformation of their chains.
Our data experiments on the human GABA_B and heat shock protein show that CryoChains gives a biophysically-grounded quantification of the heterogeneous conformations of biomolecules.
arXiv Detail & Related papers (2023-06-12T17:57:12Z)
- Optimizations of Autoencoders for Analysis and Classification of Microscopic In Situ Hybridization Images [68.8204255655161]
We propose a deep-learning framework to detect and classify areas of microscopic images with similar levels of gene expression.
The data we analyze requires an unsupervised learning model for which we employ a type of Artificial Neural Network - Deep Learning Autoencoders.
arXiv Detail & Related papers (2023-04-19T13:45:28Z)
- CryoFormer: Continuous Heterogeneous Cryo-EM Reconstruction using Transformer-based Neural Representations [49.49939711956354]
Cryo-electron microscopy (cryo-EM) allows for the high-resolution reconstruction of 3D structures of proteins and other biomolecules.
It is still challenging to reconstruct the continuous motions of 3D structures from noisy and randomly oriented 2D cryo-EM images.
We propose CryoFormer, a new approach for continuous heterogeneous cryo-EM reconstruction.
arXiv Detail & Related papers (2023-03-28T18:59:17Z)
- Amortized Inference for Heterogeneous Reconstruction in Cryo-EM [36.911133113707045]
Cryo-electron microscopy (cryo-EM) provides insights into the dynamics of proteins and other building blocks of life.
The algorithmic challenge of jointly estimating the poses, 3D structure, and conformational heterogeneity of a biomolecule remains unsolved.
Our method, cryoFIRE, performs ab initio heterogeneous reconstruction with unknown poses in an amortized framework.
We show that our method can provide one order of magnitude speedup on datasets containing millions of images without any loss of accuracy.
arXiv Detail & Related papers (2022-10-13T22:06:38Z)
- Heterogeneous reconstruction of deformable atomic models in Cryo-EM [30.864688165021054]
We describe a heterogeneous reconstruction method based on an atomistic representation whose deformation is reduced to a handful of collective motions.
We show for each distribution that our approach is able to recapitulate the intermediate atomic models with atomic-level accuracy.
arXiv Detail & Related papers (2022-09-29T22:35:35Z)
- SHREC 2021: Classification in cryo-electron tomograms [13.443446070180562]
Cryo-electron tomography (cryo-ET) is an imaging technique that allows three-dimensional visualization of macro-molecular assemblies.
Cryo-ET comes with a number of challenges, mainly low signal-to-noise and inability to obtain images from all angles.
We generate a novel simulated dataset to benchmark different methods of localization and classification of biological macromolecules in tomograms.
arXiv Detail & Related papers (2022-03-18T16:08:22Z)
- A deep learning driven pseudospectral PCE based FFT homogenization algorithm for complex microstructures [68.8204255655161]
It is shown that the proposed method is able to predict central moments of interest while being magnitudes faster to evaluate than traditional approaches.
arXiv Detail & Related papers (2021-10-26T07:02:14Z)
- Deep learning based mixed-dimensional GMM for characterizing variability in CryoEM [0.0]
CryoEM provides direct visualization of individual macromolecules in different conformational and compositional states.
We present a machine learning algorithm to determine a conformational landscape for proteins or complexes.
We demonstrate this method on several different biomolecular systems to explore compositional and conformational changes at a range of scales.
arXiv Detail & Related papers (2021-01-25T19:05:23Z)
- Learning Mixtures of Low-Rank Models [89.39877968115833]
We study the problem of learning mixtures of low-rank models.
We develop an algorithm that is guaranteed to recover the unknown matrices with near-optimal sample complexity.
In addition, the proposed algorithm is provably stable against random noise.
arXiv Detail & Related papers (2020-09-23T17:53:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.