Related papers: Efficient Cell Painting Image Representation Learning via Cross-Well Aligned Masked Siamese Network

URL: http://arxiv.org/abs/2509.19896v1
Date: Wed, 24 Sep 2025 08:48:29 GMT
Title: Efficient Cell Painting Image Representation Learning via Cross-Well Aligned Masked Siamese Network
Authors: Pin-Jui Huang, Yu-Hsuan Liao, SooHeon Kim, NoSeong Park, JongBae Park, DongMyung Shin
Abstract summary: We present Cross-Well Aligned Masked Siamese Network (CWA-MSN), a novel representation learning framework. CWA-MSN aligns embeddings of cells subjected to the same perturbation across different wells, enforcing semantic consistency despite batch effects. In a gene-gene relationship retrieval benchmark, CWA-MSN outperforms state-of-the-art publicly available self-supervised (OpenPhenom) and contrastive learning (CellCLIP) methods.
Score: 21.126506900168398
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Computational models that predict cellular phenotypic responses to chemical and genetic perturbations can accelerate drug discovery by prioritizing therapeutic hypotheses and reducing costly wet-lab iteration. However, extracting biologically meaningful and batch-robust cell painting representations remains challenging. Conventional self-supervised and contrastive learning approaches often require a large-scale model and/or a huge amount of carefully curated data, still struggling with batch effects. We present Cross-Well Aligned Masked Siamese Network (CWA-MSN), a novel representation learning framework that aligns embeddings of cells subjected to the same perturbation across different wells, enforcing semantic consistency despite batch effects. Integrated into a masked siamese architecture, this alignment yields features that capture fine-grained morphology while remaining data- and parameter-efficient. For instance, in a gene-gene relationship retrieval benchmark, CWA-MSN outperforms the state-of-the-art publicly available self-supervised (OpenPhenom) and contrastive learning (CellCLIP) methods, improving the benchmark scores by +29% and +9%, respectively, while training on substantially fewer data (e.g., 0.2M images for CWA-MSN vs. 2.2M images for OpenPhenom) or smaller model size (e.g., 22M parameters for CWA-MSN vs. 1.48B parameters for CellCLIP). Extensive experiments demonstrate that CWA-MSN is a simple and effective way to learn cell image representation, enabling efficient phenotype modeling even under limited data and parameter budgets.
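
Illustrative sketch (not the authors' implementation): the abstract describes aligning embeddings of cells that received the same perturbation but sit in different wells. The toy Python below shows one way such cross-well pairs could be selected and scored. The function names, the cosine-distance objective, and the toy data are all assumptions made for illustration; the paper's actual objective inside the masked siamese architecture may differ.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_well_pairs(perturbations, wells):
    """Index pairs that share a perturbation but come from different wells."""
    return [
        (i, j)
        for i, j in combinations(range(len(perturbations)), 2)
        if perturbations[i] == perturbations[j] and wells[i] != wells[j]
    ]


def cross_well_alignment_loss(embeddings, perturbations, wells):
    """Mean (1 - cosine similarity) over cross-well, same-perturbation pairs.

    Hypothetical stand-in for an alignment term: it is zero when all
    such pairs have identical embedding directions.
    """
    pairs = cross_well_pairs(perturbations, wells)
    if not pairs:
        return 0.0
    total = sum(
        1.0 - cosine_similarity(embeddings[i], embeddings[j]) for i, j in pairs
    )
    return total / len(pairs)


# Toy data (hypothetical): four cell embeddings with perturbation and well labels.
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
perturbations = ["geneA", "geneA", "geneB", "geneA"]
wells = ["plate1_A01", "plate2_B03", "plate1_C05", "plate1_A01"]

pairs = cross_well_pairs(perturbations, wells)
loss = cross_well_alignment_loss(embeddings, perturbations, wells)
print(pairs, loss)
```

In this toy setup, samples 0 and 3 share a perturbation but also share a well, so they are not paired; only pairs that cross wells contribute, which is the property the abstract associates with robustness to batch effects.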

Related papers

Reducing Domain Gap with Diffusion-Based Domain Adaptation for Cell Counting [1.9766522384767224]
In this work, we adapt the Inversion-Based Style Transfer framework to generate synthetic microscopy images. Our method combines latent-space Adaptive Instance Normalization with inversion in a diffusion model to transfer the style from real microscopy images to synthetic ones. Models trained with our InST-synthesized images achieve up to 37% lower Mean Absolute Error (MAE) compared to models trained on hard-coded synthetic data.
arXiv Detail & Related papers (2025-12-12T18:19:41Z)
MSDM: Generating Task-Specific Pathology Images with a Multimodal Conditioned Diffusion Model for Cell and Nuclei Segmentation [0.3650448386461648]
We introduce a Multimodal Semantic Diffusion Model for generating pixel-precise image-mask pairs for cell and nuclei segmentation. By conditioning the generative process with cellular/nuclear morphologies, MSDM generates datasets with desired morphological properties. We highlight the effectiveness of multimodal diffusion-based augmentation for advancing the robustness and generalizability of cell and nuclei segmentation models.
arXiv Detail & Related papers (2025-10-10T08:23:14Z)
SkinDualGen: Prompt-Driven Diffusion for Simultaneous Image-Mask Generation in Skin Lesions [0.0]
We propose a novel method that leverages the pretrained Stable Diffusion-2.0 model to generate high-quality synthetic skin lesion images. A hybrid dataset combining real and synthetic data markedly enhances the performance of classification and segmentation models.
arXiv Detail & Related papers (2025-07-26T15:00:37Z)
CellCLIP -- Learning Perturbation Effects in Cell Painting via Text-Guided Contrastive Learning [23.521800791670938]
We introduce CellCLIP, a cross-modal contrastive learning framework for HCS data. Our framework outperforms current open-source models, demonstrating the best performance in both cross-modal retrieval and biologically meaningful downstream tasks.
arXiv Detail & Related papers (2025-05-16T23:07:51Z)
scMEDAL for the interpretable analysis of single-cell transcriptomics data with batch effect visualization using a deep mixed effects autoencoder [6.596656267996196]
scMEDAL is a framework for single-cell Mixed Effects Deep Autoencoder Learning. scMEDAL suppresses batch effects while modeling batch-specific variation. It enables more accurate predictions of disease status, donor group, and cell type.
arXiv Detail & Related papers (2024-11-11T00:10:48Z)
Mew: Multiplexed Immunofluorescence Image Analysis through an Efficient Multiplex Network [84.88767228835928]
We introduce Mew, a novel framework designed to efficiently process mIF images through the lens of a multiplex network. Mew innovatively constructs a multiplex network comprising two distinct layers: a Voronoi network for geometric information and a Cell-type network for capturing cell-wise homogeneity. This framework equips a scalable and efficient Graph Neural Network (GNN), capable of processing the entire graph during training.
arXiv Detail & Related papers (2024-07-25T08:22:30Z)
DiffBoost: Enhancing Medical Image Segmentation via Text-Guided Diffusion Model [3.890243179348094]
Large-scale, big-variant, high-quality data are crucial for developing robust and successful deep-learning models for medical applications. This paper proposes a novel approach by developing controllable diffusion models for medical image synthesis, called DiffBoost. We leverage recent diffusion probabilistic models to generate realistic and diverse synthetic medical image data.
arXiv Detail & Related papers (2023-10-19T16:18:02Z)
Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning [53.00683059396803]
Mask image model (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images. We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy. Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z)
Optimizations of Autoencoders for Analysis and Classification of Microscopic In Situ Hybridization Images [68.8204255655161]
We propose a deep-learning framework to detect and classify areas of microscopic images with similar levels of gene expression. The data we analyze requires an unsupervised learning model for which we employ a type of Artificial Neural Network - Deep Learning Autoencoders.
arXiv Detail & Related papers (2023-04-19T13:45:28Z)
AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images. AMIGO uses the cellular graph within the tissue to provide a single representation for a patient. We show that our model is strongly robust to missing information to an extent that it can achieve the same performance with as low as 20% of the data.
arXiv Detail & Related papers (2023-03-01T23:37:45Z)
Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation. GradABM can quickly simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z)
Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a neural deep network with human-powered abstraction on the level of data. Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.