Related papers: Towards Foundation Models for Cryo-ET Subtomogram Analysis


URL: http://arxiv.org/abs/2509.24311v2
Date: Sun, 05 Oct 2025 01:44:02 GMT
Title: Towards Foundation Models for Cryo-ET Subtomogram Analysis
Authors: Runmin Jiang, Wanyue Feng, Yuntian Yang, Shriya Pingulkar, Hong Wang, Xi Xiao, Xiaoyu Cao, Genpei Zhang, Xiao Wang, Xiaolong Wu, Tianyang Wang, Yang Liu, Xingjian Li, Min Xu
Abstract summary: First, we introduce CryoEngine, a large-scale synthetic data generator that produces over 904k subtomograms from 452 particle classes for pretraining. Second, we design an Adaptive Phase Tokenization-enhanced Vision Transformer (APT-ViT), which incorporates adaptive phase tokenization as an equivariance-enhancing module. Third, we introduce a Noise-Resilient Contrastive Learning (NRCL) strategy to stabilize representation learning under severe noise conditions.
Score: 36.85797849551338
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Cryo-electron tomography (cryo-ET) enables in situ visualization of macromolecular structures, where subtomogram analysis tasks such as classification, alignment, and averaging are critical for structural determination. However, effective analysis is hindered by scarce annotations, severe noise, and poor generalization. To address these challenges, we take the first step towards foundation models for cryo-ET subtomograms. First, we introduce CryoEngine, a large-scale synthetic data generator that produces over 904k subtomograms from 452 particle classes for pretraining. Second, we design an Adaptive Phase Tokenization-enhanced Vision Transformer (APT-ViT), which incorporates adaptive phase tokenization as an equivariance-enhancing module that improves robustness to both geometric and semantic variations. Third, we introduce a Noise-Resilient Contrastive Learning (NRCL) strategy to stabilize representation learning under severe noise conditions. Evaluations across 24 synthetic and real datasets demonstrate state-of-the-art (SOTA) performance on all three major subtomogram tasks and strong generalization to unseen datasets, advancing scalable and robust subtomogram analysis in cryo-ET.
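The abstract does not specify how adaptive phase tokenization works. As background only, here is a minimal sketch of the standard Fourier shift theorem, one common reason Fourier phase is associated with geometry: circularly translating a 3D volume leaves its DFT magnitude unchanged and multiplies the spectrum by a linear phase ramp. This is not the APT-ViT tokenizer; the random 16³ volume, the shift, and the helper `phase_ramp` are illustrative assumptions.

```python
import numpy as np

def phase_ramp(shape, shift):
    """Phase factor exp(-2*pi*i * f.s) that a circular shift multiplies into the DFT.

    Illustrative helper (not from the paper); f are frequencies in cycles per voxel.
    """
    freqs = np.meshgrid(*[np.fft.fftfreq(n) for n in shape], indexing="ij")
    return np.exp(-2j * np.pi * sum(f * s for f, s in zip(freqs, shift)))

rng = np.random.default_rng(0)
volume = rng.standard_normal((16, 16, 16))          # stand-in for a noisy subtomogram
shift = (3, -2, 5)
shifted = np.roll(volume, shift, axis=(0, 1, 2))    # circular 3D translation

F = np.fft.fftn(volume)
F_shifted = np.fft.fftn(shifted)

# The magnitude spectrum ignores the translation ...
magnitude_unchanged = np.allclose(np.abs(F), np.abs(F_shifted))
# ... while the phase encodes it exactly as a linear ramp over frequency.
phase_explains_shift = np.allclose(F_shifted, F * phase_ramp(volume.shape, shift))
print(magnitude_unchanged, phase_explains_shift)  # True True
```

Because a translation only changes the phase, phase-aware features can, in principle, track position while magnitude-only features discard it; how APT-ViT exploits this, if at all, is described in the paper itself.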

Related papers

SIGMA: Scalable Spectral Insights for LLM Collapse [51.863164847253366]
We introduce SIGMA (Spectral Inequalities for Gram Matrix Analysis), a unified framework for analyzing model collapse. By utilizing benchmarks that derive deterministic bounds on the Gram matrix's spectrum, SIGMA provides a mathematically grounded metric to track the contraction of the representation space. We demonstrate that SIGMA effectively captures the transition towards collapse, offering theoretical insights into its mechanics.
arXiv Detail & Related papers (2026-01-06T19:47:11Z)
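SIGMA's actual bounds are not described in this summary. As a hedged illustration of the general idea of tracking representation contraction through a Gram matrix's spectrum, the sketch computes the eigenvalues of Z Zᵀ/n for a batch of centered embeddings and summarizes them with the entropy-based effective rank, a common diagnostic swapped in here; it is not SIGMA's metric, and the helper names and toy embeddings are hypothetical.

```python
import numpy as np

def gram_spectrum(embeddings):
    """Eigenvalues of the normalized Gram matrix Z Z^T / n of centered embeddings (n x d)."""
    z = embeddings - embeddings.mean(axis=0, keepdims=True)
    gram = z @ z.T / z.shape[0]
    eigvals = np.linalg.eigvalsh(gram)        # symmetric PSD matrix -> real eigenvalues
    return np.clip(eigvals, 0.0, None)        # remove tiny negative round-off

def effective_rank(eigvals, eps=1e-12):
    """Entropy-based effective rank exp(H(p)), with p the eigenvalues normalized to sum to 1."""
    p = eigvals / max(eigvals.sum(), eps)
    p = p[p > eps]                            # zero eigenvalues contribute nothing to H(p)
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
healthy = rng.standard_normal((256, 32))                                   # spread over 32 dims
collapsed = rng.standard_normal((256, 1)) @ rng.standard_normal((1, 32))   # all on one direction

print(effective_rank(gram_spectrum(healthy)), effective_rank(gram_spectrum(collapsed)))
```

A collapsing representation concentrates the spectrum on a few eigenvalues, so the effective rank drops toward 1; the healthy batch stays close to its ambient dimension of 32.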
Physically-Grounded Manifold Projection Model for Generalizable Metal Artifact Reduction in Dental CBCT [20.637726557566793]
Metal artifacts in Dental CBCT severely obscure anatomical structures. Current deep learning approaches to Metal Artifact Reduction (MAR) face limitations. Denoising Diffusion Probabilistic Models (DDPMs) offer realism but rely on slow, iterative sampling.
arXiv Detail & Related papers (2025-12-30T14:36:26Z)
Synergizing Multigrid Algorithms with Vision Transformer: A Novel Approach to Enhance the Seismic Foundation Model [8.86328796040398]
Existing vision transformers (ViTs) with sequential tokenization ignore the intrinsic patterns of seismic data and fail to capture both high- and low-frequency seismic information efficiently and effectively. This work introduces a novel adaptive two-grid foundation model training strategy (ADATG) with Hilbert encoding specifically tailored for seismogram data.
arXiv Detail & Related papers (2025-11-17T08:37:28Z)
A Novel Data Augmentation Strategy for Robust Deep Learning Classification of Biomedical Time-Series Data: Application to ECG and EEG Analysis [2.355460994057843]
This study proposes a novel, unified deep learning framework that achieves state-of-the-art performance across different signal types. Unlike prior work, we systematically increase signal complexity through data augmentation to extend the model's capabilities, which yielded the best predictions. The architecture requires 130 MB of memory and processes each sample in 10 ms, suggesting suitability for deployment on low-end or wearable devices.
arXiv Detail & Related papers (2025-07-16T21:38:10Z)
CryoFastAR: Fast Cryo-EM Ab Initio Reconstruction Made Easy [43.706580683273955]
We introduce CryoFastAR, the first geometric foundation model that can directly predict poses from noisy cryo-EM images for fast ab initio reconstruction. By integrating multi-view features and training on large-scale simulated cryo-EM data with realistic noise and CTF modulations, CryoFastAR enhances pose estimation accuracy and generalization.
arXiv Detail & Related papers (2025-06-06T08:32:32Z)
MathPhys-Guided Coarse-to-Fine Anomaly Synthesis with SQE-Driven Bi-Level Optimization for Anomaly Detection [30.77558600436759]
We introduce a novel and lightweight pipeline that generates synthetic anomalies through Math-Phys model guidance. Our method produces realistic defect masks, which are subsequently enhanced in two phases. To validate our method, we conduct experiments on three anomaly detection benchmarks: MVTec AD, VisA, and BTAD.
arXiv Detail & Related papers (2025-04-17T14:22:27Z)
BHViT: Binarized Hybrid Vision Transformer [53.38894971164072]
Model binarization has made significant progress in enabling real-time and energy-efficient computation for convolutional neural networks (CNNs). We propose BHViT, a binarization-friendly hybrid ViT architecture, and its fully binarized model, guided by three important observations. Our proposed algorithm achieves SOTA performance among binary ViT methods.
arXiv Detail & Related papers (2025-03-04T08:35:01Z)
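BHViT's own binarization scheme is not described in this summary. For readers new to weight binarization in general, a minimal sketch of the widely used sign-plus-scale approximation popularized by XNOR-Net: each output channel's weights are approximated as αB with B = sign(W) and α the mean absolute weight of that channel. This is a generic technique, not BHViT's method; the helper name and toy shapes are hypothetical.

```python
import numpy as np

def binarize_weights(w):
    """Approximate each row of w (out_channels x in_features) as alpha * b with b in {-1, +1}.

    For fixed b = sign(w), alpha = mean(|w|) per row minimizes ||w_row - alpha * b_row||^2.
    """
    b = np.where(w >= 0, 1.0, -1.0)                 # sign, mapping 0 to +1 so entries are exactly +-1
    alpha = np.abs(w).mean(axis=1, keepdims=True)   # one scale per output channel
    return alpha, b

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 64))                    # hypothetical full-precision layer weights
alpha, b = binarize_weights(w)

x = rng.standard_normal(64)
full = w @ x
# Binary matmul (only additions/subtractions) followed by one multiply per output channel.
approx = alpha[:, 0] * (b @ x)
print(full, approx)
```

The optimality of α follows from expanding ||w − αb||² = ||w||² − 2α Σ|w| + α²n, which is minimized at α = Σ|w|/n.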
TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation [65.65530016765615]
We propose a hierarchical predictive coding framework that captures multi-scale dependencies through three complementary learning objectives. TokenUnify integrates random token prediction, next-token prediction, and next-all token prediction to create a comprehensive representational space. We also introduce a large-scale EM dataset with 1.2 billion annotated voxels, offering ideal long-sequence visual data with spatial continuity.
arXiv Detail & Related papers (2024-05-27T05:45:51Z)
CryoGEM: Physics-Informed Generative Cryo-Electron Microscopy [38.57626501108458]
We introduce physics-informed generative cryo-electron microscopy (CryoGEM). CryoGEM integrates physics-based cryo-EM simulation with generative unpaired noise translation to generate realistic noise. Experiments show that CryoGEM is capable of generating authentic cryo-EM images.
arXiv Detail & Related papers (2023-12-04T07:52:56Z)
Pre-training via Denoising for Molecular Property Prediction [53.409242538744444]
We describe a pre-training technique that utilizes large datasets of 3D molecular structures at equilibrium. Inspired by recent advances in noise regularization, our pre-training objective is based on denoising.
arXiv Detail & Related papers (2022-05-31T22:28:34Z)
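The summary states only that the pre-training objective is based on denoising. A minimal sketch of one common form of coordinate-denoising pretraining, assuming the network is trained to regress the isotropic Gaussian noise added to equilibrium 3D coordinates; the noise scale `sigma`, the 5-atom toy structure, and the helper names are hypothetical, and the network itself is omitted.

```python
import numpy as np

def make_denoising_pair(positions, sigma, rng):
    """Corrupt equilibrium coordinates (n_atoms x 3) with isotropic Gaussian noise.

    Returns the noisy coordinates (model input) and the added noise (regression target).
    """
    noise = sigma * rng.standard_normal(positions.shape)
    return positions + noise, noise

def denoising_loss(predicted_noise, noise):
    """Mean over atoms of the squared error between predicted and true noise vectors."""
    return float(np.mean(np.sum((predicted_noise - noise) ** 2, axis=-1)))

rng = np.random.default_rng(0)
positions = rng.standard_normal((5, 3))   # hypothetical 5-atom equilibrium structure
noisy, target = make_denoising_pair(positions, sigma=0.1, rng=rng)

# A model that always predicts zero noise pays about 3 * sigma**2 per atom in expectation;
# a perfect denoiser that recovers `noisy - positions` reaches zero loss.
print(denoising_loss(np.zeros_like(target), target), denoising_loss(target, target))
```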
A microstructure estimation Transformer inspired by sparse representation for diffusion MRI [11.761543033212797]
We present a Transformer-based learning framework for dMRI-based microstructure estimation with downsampled q-space data. The proposed method achieved up to an 11.25-fold acceleration in scan time and outperformed other state-of-the-art learning-based methods.
arXiv Detail & Related papers (2022-05-13T05:14:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.