Compressing Biology: Evaluating the Stable Diffusion VAE for Phenotypic Drug Discovery
- URL: http://arxiv.org/abs/2510.19887v1
- Date: Wed, 22 Oct 2025 16:16:49 GMT
- Title: Compressing Biology: Evaluating the Stable Diffusion VAE for Phenotypic Drug Discovery
- Authors: Télio Cropsal, Rocío Mercado
- Abstract summary: We present the first systematic evaluation of Stable Diffusion's variational autoencoder (SD-VAE) for reconstructing Cell Painting images. We find that SD-VAE reconstructions preserve phenotypic signals with minimal loss, supporting its use in microscopy. Our findings offer practical guidelines for evaluating generative models on microscopy data and support the use of off-the-shelf models in phenotypic drug discovery.
- Score: 0.8594140167290097
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-throughput phenotypic screens generate vast microscopy image datasets that push the limits of generative models due to their large dimensionality. Despite the growing popularity of general-purpose models trained on natural images for microscopy data analysis, their suitability in this domain has not been quantitatively demonstrated. We present the first systematic evaluation of Stable Diffusion's variational autoencoder (SD-VAE) for reconstructing Cell Painting images, assessing performance across a large dataset with diverse molecular perturbations and cell types. We find that SD-VAE reconstructions preserve phenotypic signals with minimal loss, supporting its use in microscopy workflows. To benchmark reconstruction quality, we compare pixel-level, embedding-based, latent-space, and retrieval-based metrics for a biologically informed evaluation. We show that general-purpose feature extractors like InceptionV3 match or surpass publicly available bespoke models in retrieval tasks, simplifying future pipelines. Our findings offer practical guidelines for evaluating generative models on microscopy data and support the use of off-the-shelf models in phenotypic drug discovery.
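The abstract names pixel-level and retrieval-based metrics among the ways to judge reconstructions. As a rough illustration only, the following sketch computes one common pixel-level score (PSNR) and a toy top-1 retrieval accuracy over embeddings. The function names, the toy vectors, and the perturbation labels are all hypothetical; this is not the authors' evaluation code, and a real pipeline would obtain embeddings from a feature extractor such as InceptionV3 rather than hand-written lists.

```python
import math


def psnr(original, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flat pixel lists."""
    if len(original) != len(reconstruction):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstruction)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top1_retrieval_accuracy(queries, query_labels, gallery, gallery_labels):
    """Fraction of queries whose nearest gallery embedding shares their label."""
    hits = 0
    for query, label in zip(queries, query_labels):
        best = max(range(len(gallery)), key=lambda i: cosine(query, gallery[i]))
        hits += gallery_labels[best] == label
    return hits / len(queries)


# Toy example (assumed data): pixel intensities scaled to [0, 1].
original = [0.0, 0.5, 1.0, 0.25]
reconstruction = [0.0, 0.5, 0.9, 0.25]
score_db = psnr(original, reconstruction)

# Hypothetical embeddings: gallery from original images, queries from reconstructions.
gallery = [[1.0, 0.0], [0.0, 1.0]]
gallery_labels = ["compound_a", "compound_b"]
queries = [[0.9, 0.1], [0.2, 0.8]]
query_labels = ["compound_a", "compound_b"]
accuracy = top1_retrieval_accuracy(queries, query_labels, gallery, gallery_labels)

print(f"PSNR: {score_db:.2f} dB, top-1 retrieval accuracy: {accuracy:.2f}")
```

A high PSNR alone does not show that biological signal survives compression, which is why a retrieval-style check on embeddings of the reconstructions is a useful complement.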
Related papers
- Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks. We extract patch-level features from whole slide images (WSI) using five state-of-the-art foundation models. Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
arXiv Detail & Related papers (2026-01-29T14:06:50Z) - Diffusion-Based Synthetic Brightfield Microscopy Images for Enhanced Single Cell Detection [0.0]
We investigate the use of unconditional models to generate synthetic brightfield microscopy images. A U-Net based diffusion model was trained and used to create datasets with varying ratios of synthetic and real images. Experiments with YOLOv8, YOLOv9 and RT-DETR reveal that training with synthetic data can achieve improved detection accuracies.
arXiv Detail & Related papers (2025-11-25T08:57:23Z) - Deep Learning for Taxol Exposure Analysis: A New Cell Image Dataset and Attention-Based Baseline Model [1.755209318470883]
Monitoring the effects of the chemotherapeutic agent Taxol at the cellular level is critical for both clinical evaluation and biomedical research. Deep learning approaches have shown great promise in medical and biological image analysis. No publicly available dataset currently exists for automated morphological analysis of cellular responses to Taxol exposure.
arXiv Detail & Related papers (2025-08-20T01:41:26Z) - Disentangled representations of microscopy images [0.9849635250118911]
This work proposes a Disentangled Representation Learning (DRL) methodology to enhance model interpretability for microscopy image classification. We show how a DRL framework, based on transferring a representation learnt from synthetic data, can provide a good trade-off between accuracy and interpretability in this domain.
arXiv Detail & Related papers (2025-06-25T17:44:37Z) - Revisiting Automatic Data Curation for Vision Foundation Models in Digital Pathology [41.34847597178388]
Vision foundation models (FMs) learn to represent histological features in highly heterogeneous tiles extracted from whole-slide images. We investigate the potential of unsupervised automatic data curation at the tile-level, taking into account 350 million tiles.
arXiv Detail & Related papers (2025-03-24T14:23:48Z) - Revealing Subtle Phenotypes in Small Microscopy Datasets Using Latent Diffusion Models [0.815557531820863]
We propose a novel approach that leverages pre-trained latent diffusion models to uncover subtle phenotypic changes. Our findings reveal that our approach enables effective detection of phenotypic variations, capturing both visually apparent and imperceptible differences.
arXiv Detail & Related papers (2025-02-12T15:45:19Z) - Dataset Distillation for Histopathology Image Classification [46.04496989951066]
We introduce a novel dataset distillation algorithm tailored for histopathology image datasets (Histo-DD).
We conduct a comprehensive evaluation of the effectiveness of the proposed algorithm and the generated histopathology samples in both patch-level and slide-level classification tasks.
arXiv Detail & Related papers (2024-08-19T05:53:38Z) - Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology [2.7280901660033643]
This work explores the scaling properties of weakly supervised classifiers and self-supervised masked autoencoders (MAEs).
Our results show that ViT-based MAEs outperform weakly supervised classifiers on a variety of tasks, achieving as much as an 11.5% relative improvement when recalling known biological relationships curated from public databases.
We develop a new channel-agnostic MAE architecture (CA-MAE) that allows for inputting images of different numbers and orders of channels at inference time.
arXiv Detail & Related papers (2024-04-16T02:42:06Z) - DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology [1.3551232282678036]
We introduce DinoBloom, the first foundation model for single cell images in hematology.
Our model is built upon an extensive collection of 13 diverse, publicly available datasets of peripheral blood and bone marrow smears.
A family of four DinoBloom models can be adapted for a wide range of downstream applications.
arXiv Detail & Related papers (2024-04-07T17:25:52Z) - On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation [47.95611203419802]
Foundation models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach.
We compare the generalization performance to unseen domains of various pre-trained models after being fine-tuned on the same in-distribution dataset.
We further developed a new Bayesian uncertainty estimation for frozen models and used them as an indicator to characterize the model's performance on out-of-distribution data.
arXiv Detail & Related papers (2023-11-18T14:52:10Z) - Latent Space Energy-based Model for Fine-grained Open Set Recognition [46.0388856095674]
Fine-grained open-set recognition (FineOSR) aims to recognize images belonging to classes with subtle appearance differences while rejecting images of unknown classes.
As a type of generative model, energy-based models (EBMs) have the potential for hybrid modeling of generative and discriminative tasks.
In this paper, we explore the low-dimensional latent space with energy-based prior distribution for OSR in a fine-grained visual world.
arXiv Detail & Related papers (2023-09-19T16:00:09Z) - Machine Learning Small Molecule Properties in Drug Discovery [44.62264781248437]
We review a wide range of properties, including binding affinities, solubility, and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity).
We discuss existing popular descriptors and embeddings, such as chemical fingerprints and graph-based neural networks.
Finally, techniques to provide an understanding of model predictions, especially for critical decision-making in drug discovery, are assessed.
arXiv Detail & Related papers (2023-08-02T22:18:41Z) - GSURE-Based Diffusion Model Training with Corrupted Data [35.56267114494076]
We propose a novel training technique for generative diffusion models based only on corrupted data.
We demonstrate our technique on face images as well as Magnetic Resonance Imaging (MRI).
arXiv Detail & Related papers (2023-05-22T15:27:20Z) - Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel machine learning architecture, which allows us to infuse a deep neural network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.