Generalization of Self-Supervised Vision Transformers for Protein Localization Across Microscopy Domains
- URL: http://arxiv.org/abs/2602.05527v2
- Date: Fri, 06 Feb 2026 10:27:15 GMT
- Title: Generalization of Self-Supervised Vision Transformers for Protein Localization Across Microscopy Domains
- Authors: Ben Isselmann, Dilara Göksu, Andreas Weinmann
- Abstract summary: Self-supervised learning (SSL) can mitigate this by pretraining on large unlabeled datasets. We generate image embeddings using three DINO backbones pretrained on ImageNet-1k, the Human Protein Atlas (HPA), and OpenCell. All pretrained models transfer well, with the microscopy-specific HPA-pretrained model achieving the best performance.
- Score: 0.254890465057467
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Task-specific microscopy datasets are often too small to train deep learning models that learn robust feature representations. Self-supervised learning (SSL) can mitigate this by pretraining on large unlabeled datasets, but it remains unclear how well such representations transfer across microscopy domains with different staining protocols and channel configurations. We investigate the cross-domain transferability of DINO-pretrained Vision Transformers for protein localization on the OpenCell dataset. We generate image embeddings using three DINO backbones pretrained on ImageNet-1k, the Human Protein Atlas (HPA), and OpenCell, and evaluate them by training a supervised classification head on OpenCell labels. All pretrained models transfer well, with the microscopy-specific HPA-pretrained model achieving the best performance (mean macro $F_1$-score = 0.8221 $\pm$ 0.0062), slightly outperforming a DINO model trained directly on OpenCell (0.8057 $\pm$ 0.0090). These results highlight the value of large-scale pretraining and indicate that domain-relevant SSL representations can generalize effectively to related but distinct microscopy datasets, enabling strong downstream performance even when task-specific labeled data are limited.
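The evaluation protocol in the abstract (frozen SSL embeddings, a supervised classification head, mean macro $F_1$) can be sketched in miniature. The synthetic embeddings below are a stand-in for real DINO features, and the logistic-regression head is our assumption for the "supervised classification head"; the paper does not specify the head architecture here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Hypothetical stand-in for frozen DINO embeddings: in the paper, each image
# is encoded once by a pretrained ViT backbone and only the head is trained.
rng = np.random.default_rng(0)
n, dim, n_classes = 600, 64, 4
centers = rng.normal(size=(n_classes, dim))
labels = rng.integers(0, n_classes, size=n)
embeddings = centers[labels] + 0.5 * rng.normal(size=(n, dim))

# Supervised classification head trained on top of the frozen embeddings.
head = LogisticRegression(max_iter=1000).fit(embeddings[:400], labels[:400])
pred = head.predict(embeddings[400:])

# Macro F1 averages per-class F1 scores, weighting all classes equally --
# appropriate for imbalanced protein-localization classes.
macro_f1 = f1_score(labels[400:], pred, average="macro")
print(round(macro_f1, 3))
```

Repeating this over several random seeds and reporting mean and standard deviation would mirror the paper's "mean macro $F_1$-score $\pm$" reporting.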
Related papers
- MatSSL: Robust Self-Supervised Representation Learning for Metallographic Image Segmentation [0.2799243500184682]
MatSSL is a streamlined self-supervised learning architecture that employs Gated Feature Fusion at each stage of the backbone to integrate multi-level representations effectively. We first perform self-supervised pretraining on a small-scale, unlabeled dataset and then fine-tune the model on multiple benchmark datasets.
arXiv Detail & Related papers (2025-07-24T08:32:41Z) - CellViT++: Energy-Efficient and Adaptive Cell Segmentation and Classification Using Foundation Models [1.7674154313605157]
CellViT++ is a framework for generalized cell segmentation and classification in digital pathology. It is open source and features a user-friendly, web-based interface for visualization and annotation.
arXiv Detail & Related papers (2025-01-09T14:26:50Z) - Hierarchical Multi-Label Classification with Missing Information for Benthic Habitat Imagery [1.6492989697868894]
We show the capacity to conduct HML training in scenarios where there exist multiple levels of missing annotation information.
We find that, when using smaller one-hot image label datasets typical of local or regional scale benthic science projects, models pre-trained with self-supervision on a larger collection of in-domain benthic data outperform models pre-trained on ImageNet.
arXiv Detail & Related papers (2024-09-10T16:15:01Z) - A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification [51.35500308126506]
Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels.
We study how classification-based evaluation protocols for SSL correlate and how well they predict downstream performance on different dataset types.
arXiv Detail & Related papers (2024-07-16T23:17:36Z) - Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology [2.7280901660033643]
This work explores the scaling properties of weakly supervised classifiers and self-supervised masked autoencoders (MAEs).
Our results show that ViT-based MAEs outperform weakly supervised classifiers on a variety of tasks, achieving as much as an 11.5% relative improvement when recalling known biological relationships curated from public databases.
We develop a new channel-agnostic MAE architecture (CA-MAE) that allows for inputting images of different numbers and orders of channels at inference time.
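The core MAE pretraining step, random masking of patch tokens, can be sketched briefly. The channel-agnostic treatment below (tokenizing each channel independently so that any channel count yields a valid token sequence) is our assumption about the spirit of CA-MAE, not its exact architecture.

```python
import numpy as np

def mask_patches(patches: np.ndarray, mask_ratio: float, rng) -> tuple:
    """Randomly drop a fraction of patch tokens, as in MAE pretraining.

    `patches` has shape (num_tokens, dim); only the visible tokens are
    passed to the encoder, and the decoder reconstructs the masked ones.
    """
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])
    return patches[keep_idx], keep_idx

# Channel-agnostic sketch: tokenize each channel separately, so images with
# different channel counts simply produce token sequences of different lengths.
rng = np.random.default_rng(0)
channels = 3                  # could be 2, 4, ... at inference time
tokens_per_channel = 16
patches = rng.normal(size=(channels * tokens_per_channel, 32))
visible, keep_idx = mask_patches(patches, mask_ratio=0.75, rng=rng)
print(visible.shape)          # 25% of the 48 tokens are kept
```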
arXiv Detail & Related papers (2024-04-16T02:42:06Z) - Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z) - Spatio-Temporal Encoding of Brain Dynamics with Surface Masked Autoencoders [10.097983222759884]
Surface Masked AutoEncoder (sMAE) and video surface Masked AutoEncoder (vsMAE).
These models are trained to reconstruct cortical feature maps from masked versions of the input, learning strong latent representations of cortical development and structure-function.
Results show that (v)sMAE pre-trained models improve phenotyping prediction performance on multiple tasks by $\ge 26\%$, and offer faster convergence relative to models trained from scratch.
arXiv Detail & Related papers (2023-08-10T10:01:56Z) - Leveraging generative adversarial networks to create realistic scanning transmission electron microscopy images [2.5954872177280346]
Machine learning could revolutionize materials research through autonomous data collection and processing.
We employ a cycle generative adversarial network (CycleGAN) with a reciprocal space discriminator to augment simulated data with realistic spatial frequency information.
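A discriminator operating in reciprocal space sees the image's spatial-frequency content rather than its pixels. A minimal sketch of that input representation, assuming a log-magnitude 2D FFT (the exact discriminator architecture is not specified here):

```python
import numpy as np

def log_amplitude_spectrum(img: np.ndarray) -> np.ndarray:
    """Reciprocal-space view of an image: log-magnitude of its 2D FFT.

    A reciprocal-space discriminator judges realism on this representation,
    which exposes spatial-frequency content such as lattice periodicity.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    return np.log1p(np.abs(spectrum))

# Synthetic 'lattice' image: a 2D cosine grating with period 8 pixels,
# which produces sharp, symmetric peaks in reciprocal space.
x = np.arange(64)
img = np.cos(2 * np.pi * x[:, None] / 8) * np.cos(2 * np.pi * x[None, :] / 8)
spec = log_amplitude_spectrum(img)
print(spec.shape)
```

Feeding this spectrum to the discriminator encourages the generator to match the realistic frequency statistics of experimental STEM images, not just their real-space appearance.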
We showcase our approach by training a fully convolutional network (FCN) to identify single atom defects in a 4.5 million atom data set.
arXiv Detail & Related papers (2023-01-18T19:19:27Z) - Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset to boost the performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z) - Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel machine learning architecture, which allows us to infuse a deep neural network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z) - Global Voxel Transformer Networks for Augmented Microscopy [54.730707387866076]
We introduce global voxel transformer networks (GVTNets), an advanced deep learning tool for augmented microscopy.
GVTNets are built on global voxel transformer operators (GVTOs), which are able to aggregate global information.
We apply the proposed methods on existing datasets for three different augmented microscopy tasks under various settings.
arXiv Detail & Related papers (2020-08-05T20:11:15Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To reduce its size, we apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.