CHAMMI-75: pre-training multi-channel models with heterogeneous microscopy images
- URL: http://arxiv.org/abs/2512.20833v1
- Date: Tue, 23 Dec 2025 23:15:10 GMT
- Title: CHAMMI-75: pre-training multi-channel models with heterogeneous microscopy images
- Authors: Vidit Agrawal, John Peters, Tyler N. Thompson, Mohammad Vali Sanian, Chau Pham, Nikita Moshkov, Arshad Kazi, Aditya Pillai, Jack Freeman, Byunguk Kang, Samouil L. Farhi, Ernest Fraenkel, Ron Stewart, Lassi Paavolainen, Bryan A. Plummer, Juan C. Caicedo
- Abstract summary: We present CHAMMI-75, an open access dataset of heterogeneous, multi-channel microscopy images from 75 diverse biological studies. Our experiments show that training with CHAMMI-75 can improve performance in multi-channel bioimaging tasks.
- Score: 17.19661386207827
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantifying cell morphology using images and machine learning has proven to be a powerful tool to study the response of cells to treatments. However, models used to quantify cellular morphology are typically trained with a single microscopy imaging type. This results in specialized models that cannot be reused across biological studies because the technical specifications do not match (e.g., different number of channels), or because the target experimental conditions are out of distribution. Here, we present CHAMMI-75, an open access dataset of heterogeneous, multi-channel microscopy images from 75 diverse biological studies. We curated this resource from publicly available sources to investigate cellular morphology models that are channel-adaptive and can process any microscopy image type. Our experiments show that training with CHAMMI-75 can improve performance in multi-channel bioimaging tasks primarily because of its high diversity in microscopy modalities. This work paves the way to create the next generation of cellular morphology models for biological studies.
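For readers unfamiliar with the term, "channel-adaptive" here means a single model whose weights do not depend on the number of input channels. Below is a minimal sketch of one common way to achieve this: a shared single-channel stem applied to each channel independently, with features pooled across channels. The module names and sizes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ChannelAdaptiveEncoder(nn.Module):
    """Toy channel-adaptive encoder: a shared single-channel stem is
    applied to each channel independently, and features are mean-pooled
    across channels, so any channel count works with the same weights."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # The stem sees exactly one channel at a time, so its weights
        # are independent of how many channels the input image has.
        self.stem = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W) with an arbitrary channel count.
        b, c, h, w = x.shape
        per_channel = self.stem(x.reshape(b * c, 1, h, w))  # (b*c, feat_dim)
        per_channel = per_channel.reshape(b, c, -1)
        return per_channel.mean(dim=1)  # pool over channels -> (b, feat_dim)

# The same encoder processes a 3-channel and a 5-channel image batch.
enc = ChannelAdaptiveEncoder()
print(enc(torch.randn(2, 3, 64, 64)).shape)  # torch.Size([2, 128])
print(enc(torch.randn(2, 5, 64, 64)).shape)  # torch.Size([2, 128])
```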
Related papers
- MorphGen: Controllable and Morphologically Plausible Generative Cell-Imaging [31.990445585569688]
MorphGen is a state-of-the-art diffusion-based generative model for fluorescent microscopy.
It generates the complete set of fluorescent channels jointly, preserving per-organelle structures.
MorphGen attains an FID score over 35% lower than the prior state-of-the-art MorphoDiff.
arXiv Detail & Related papers (2025-10-01T13:34:29Z)
- PixCell: A generative foundation model for digital histopathology images [49.00921097924924]
We introduce PixCell, the first diffusion-based generative foundation model for histopathology.
We train PixCell on PanCan-30M, a vast, diverse dataset derived from 69,184 H&E-stained whole slide images covering various cancer types.
arXiv Detail & Related papers (2025-06-05T15:14:32Z)
- ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy [3.432992120614117]
We present the largest foundation model for cell microscopy data to date.
Compared to a previously published ViT-L/8 MAE, our new model achieves a 60% improvement in linear separability of genetic perturbations.
arXiv Detail & Related papers (2024-11-04T20:09:51Z)
- μ-Bench: A Vision-Language Benchmark for Microscopy Understanding [43.27182445778988]
Vision-language models (VLMs) offer a promising solution for large-scale biological image analysis.
There is a lack of standardized, diverse, and large-scale vision-language benchmarks to evaluate VLMs.
μ-Bench is an expert-curated benchmark encompassing 22 biomedical tasks.
arXiv Detail & Related papers (2024-07-01T20:30:26Z)
- Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology [2.7280901660033643]
This work explores the scaling properties of weakly supervised classifiers and self-supervised masked autoencoders (MAEs).
Our results show that ViT-based MAEs outperform weakly supervised classifiers on a variety of tasks, achieving as much as an 11.5% relative improvement when recalling known biological relationships curated from public databases.
We develop a new channel-agnostic MAE architecture (CA-MAE) that allows for inputting images with different numbers and orders of channels at inference time; a toy sketch of this kind of channel-agnostic tokenization appears after this list.
arXiv Detail & Related papers (2024-04-16T02:42:06Z)
- Practical Guidelines for Cell Segmentation Models Under Optical Aberrations in Microscopy [14.042884268397058]
This study evaluates cell image segmentation models under optical aberrations from fluorescence and bright field microscopy.
We train and test several segmentation models, including the Otsu threshold method and Mask R-CNN with different network heads.
While these simpler models degrade under aberrations, Cellpose 2.0 proves effective for complex cell images under similar conditions.
arXiv Detail & Related papers (2024-04-12T15:45:26Z)
- Mixed Models with Multiple Instance Learning [51.440557223100164]
We introduce MixMIL, a framework integrating Generalized Linear Mixed Models (GLMM) and Multiple Instance Learning (MIL).
Our empirical results reveal that MixMIL outperforms existing MIL models in single-cell datasets.
arXiv Detail & Related papers (2023-11-04T16:42:42Z)
- Multi-stream Cell Segmentation with Low-level Cues for Multi-modality Images [66.79688768141814]
We develop an automatic cell classification pipeline to label microscopy images.
We then train a classification model based on the category labels.
We deploy two types of segmentation models to segment cells with roundish and irregular shapes.
arXiv Detail & Related papers (2023-10-22T08:11:08Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- Style transfer between Microscopy and Magnetic Resonance Imaging via Generative Adversarial Network in small sample size settings [45.62331048595689]
Cross-modal augmentation of Magnetic Resonance Imaging (MRI) and microscopic imaging based on the same tissue samples is promising.
We tested a method for generating microscopic histological images from MRI scans of the human corpus callosum using a conditional generative adversarial network (cGAN) architecture.
arXiv Detail & Related papers (2023-10-16T13:58:53Z)
- BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs [46.87322157229728]
We present PMC-15M, a novel dataset that is two orders of magnitude larger than existing biomedical multimodal datasets.
PMC-15M contains 15 million biomedical image-text pairs collected from 4.4 million scientific articles.
Based on PMC-15M, we have pretrained BiomedCLIP, a multimodal foundation model, with domain-specific adaptations tailored to biomedical vision-language processing.
arXiv Detail & Related papers (2023-03-02T02:20:04Z)
- Unpaired Image-to-Image Translation with Limited Data to Reveal Subtle Phenotypes [0.5076419064097732]
We present an improved CycleGAN architecture that employs self-supervised discriminators to alleviate the need for numerous images.
We also provide results obtained with small biological datasets on obvious and non-obvious cell phenotype variations.
arXiv Detail & Related papers (2023-01-21T16:25:04Z)
- Learning multi-scale functional representations of proteins from single-cell microscopy data [77.34726150561087]
We show that simple convolutional networks trained on localization classification can learn protein representations that encapsulate diverse functional information.
We also propose a robust evaluation strategy to assess the quality of protein representations across different scales of biological function.
arXiv Detail & Related papers (2022-05-24T00:00:07Z)
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a deep neural network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
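As referenced in the Masked Autoencoders entry above, here is a toy sketch of channel-agnostic tokenization in the spirit of CA-MAE: each channel is patchified with shared weights and tagged with a learned channel-type embedding, so the channel count and order can vary at inference time. The names, sizes, and channel-id vocabulary are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn as nn

class ChannelAgnosticTokenizer(nn.Module):
    """Toy ViT-style tokenizer: every channel is patchified with shared
    weights and tagged with a per-channel-type embedding, so images with
    different channel counts and orders map into one token space."""

    def __init__(self, patch: int = 16, dim: int = 256, n_channel_types: int = 16):
        super().__init__()
        # Shared patch projection applied to single-channel inputs.
        self.proj = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        # One learned embedding per channel type (e.g., DNA, actin, ...).
        self.channel_embed = nn.Embedding(n_channel_types, dim)

    def forward(self, x: torch.Tensor, channel_ids: torch.Tensor) -> torch.Tensor:
        # x: (batch, C, H, W); channel_ids: (C,) integer ids naming each channel.
        b, c, h, w = x.shape
        tokens = self.proj(x.reshape(b * c, 1, h, w))        # (b*c, dim, H/p, W/p)
        tokens = tokens.flatten(2).transpose(1, 2)           # (b*c, n_patches, dim)
        tokens = tokens.reshape(b, c, -1, tokens.shape[-1])  # (b, C, n_patches, dim)
        tokens = tokens + self.channel_embed(channel_ids)[None, :, None, :]
        # Concatenate per-channel tokens; a standard transformer encoder
        # (and an MAE decoder) can consume this variable-length sequence.
        return tokens.reshape(b, c * tokens.shape[2], -1)

tok = ChannelAgnosticTokenizer()
seq3 = tok(torch.randn(2, 3, 64, 64), torch.tensor([0, 1, 2]))
seq5 = tok(torch.randn(2, 5, 64, 64), torch.tensor([0, 1, 2, 3, 4]))
print(seq3.shape, seq5.shape)  # torch.Size([2, 48, 256]) torch.Size([2, 80, 256])
```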