CellVTA: Enhancing Vision Foundation Models for Accurate Cell Segmentation and Classification
- URL: http://arxiv.org/abs/2504.00784v1
- Date: Tue, 01 Apr 2025 13:36:46 GMT
- Title: CellVTA: Enhancing Vision Foundation Models for Accurate Cell Segmentation and Classification
- Authors: Yang Yang, Xijie Xu, Yixun Zhou, Jie Zheng,
- Abstract summary: We propose CellVTA (Cell Vision Transformer with Adapter), a novel method that improves the performance of vision foundation models for cell instance segmentation.<n>Our method preserves the core architecture of ViT, ensuring seamless integration with pretrained foundation models.<n>Our experiments show that CellVTA achieves 0.538 mPQ on the CoNIC dataset and 0.506 mPQ on the PanNuke dataset.
- Score: 4.307361438627596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cell instance segmentation is a fundamental task in digital pathology with broad clinical applications. Recently, vision foundation models, which are predominantly based on Vision Transformers (ViTs), have achieved remarkable success in pathology image analysis. However, their improvements in cell instance segmentation remain limited. A key challenge arises from the tokenization process in ViTs, which substantially reduces the spatial resolution of input images, leading to suboptimal segmentation quality, especially for small and densely packed cells. To address this problem, we propose CellVTA (Cell Vision Transformer with Adapter), a novel method that improves the performance of vision foundation models for cell instance segmentation by incorporating a CNN-based adapter module. This adapter extracts high-resolution spatial information from input images and injects it into the ViT through a cross-attention mechanism. Our method preserves the core architecture of ViT, ensuring seamless integration with pretrained foundation models. Extensive experiments show that CellVTA achieves 0.538 mPQ on the CoNIC dataset and 0.506 mPQ on the PanNuke dataset, which significantly outperforms the state-of-the-art cell segmentation methods. Ablation studies confirm the superiority of our approach over other fine-tuning strategies, including decoder-only fine-tuning and full fine-tuning. Our code and models are publicly available at https://github.com/JieZheng-ShanghaiTech/CellVTA.
Related papers
- CellSeg1: Robust Cell Segmentation with One Training Image [37.60000299559688]
We introduce CellSeg1, a solution for segmenting cells of arbitrary morphology and modality with a few dozen cell annotations in 1 image.<n>Tested on 19 diverse cell datasets, CellSeg1 trained on 1 image achieved 0.81 average mAP at 0.5 IoU, performing comparably to existing models trained on over 500 images.
arXiv Detail & Related papers (2024-12-02T11:55:22Z) - Practical Guidelines for Cell Segmentation Models Under Optical Aberrations in Microscopy [14.042884268397058]
This study evaluates cell image segmentation models under optical aberrations from fluorescence and bright field microscopy.
We train and test several segmentation models, including the Otsu threshold method and Mask R-CNN with different network heads.
In contrast, Cellpose 2.0 proves effective for complex cell images under similar conditions.
arXiv Detail & Related papers (2024-04-12T15:45:26Z) - Group Multi-View Transformer for 3D Shape Analysis with Spatial Encoding [81.1943823985213]
In recent years, the results of view-based 3D shape recognition methods have saturated, and models with excellent performance cannot be deployed on memory-limited devices.
We introduce a compression method based on knowledge distillation for this field, which largely reduces the number of parameters while preserving model performance as much as possible.
Specifically, to enhance the capabilities of smaller models, we design a high-performing large model called Group Multi-view Vision Transformer (GMViT)
The large model GMViT achieves excellent 3D classification and retrieval results on the benchmark datasets ModelNet, ShapeNetCore55, and MCB.
arXiv Detail & Related papers (2023-12-27T08:52:41Z) - Multi-stream Cell Segmentation with Low-level Cues for Multi-modality
Images [66.79688768141814]
We develop an automatic cell classification pipeline to label microscopy images.
We then train a classification model based on the category labels.
We deploy two types of segmentation models to segment cells with roundish and irregular shapes.
arXiv Detail & Related papers (2023-10-22T08:11:08Z) - Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z) - CellViT: Vision Transformers for Precise Cell Segmentation and
Classification [3.6000652088960785]
We introduce a new method for automated instance segmentation of cell nuclei in digitized tissue samples using a deep learning architecture based on Vision Transformer called CellViT.
We demonstrate the superiority of large-scale in-domain and out-of-domain pre-trained Vision Transformers by leveraging the recently published Segment Anything Model and a ViT-encoder pre-trained on 104 million histological image patches.
arXiv Detail & Related papers (2023-06-27T10:03:15Z) - AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context
Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images.
AMIGO uses the celluar graph within the tissue to provide a single representation for a patient.
We show that our model is strongly robust to missing information to an extent that it can achieve the same performance with as low as 20% of the data.
arXiv Detail & Related papers (2023-03-01T23:37:45Z) - Global Context Vision Transformers [78.5346173956383]
We propose global context vision transformer (GC ViT), a novel architecture that enhances parameter and compute utilization for computer vision.
We address the lack of the inductive bias in ViTs, and propose to leverage a modified fused inverted residual blocks in our architecture.
Our proposed GC ViT achieves state-of-the-art results across image classification, object detection and semantic segmentation tasks.
arXiv Detail & Related papers (2022-06-20T18:42:44Z) - EfficientCellSeg: Efficient Volumetric Cell Segmentation Using Context
Aware Pseudocoloring [4.555723508665994]
We introduce a small convolutional neural network (CNN) for volumetric cell segmentation.
Our model is efficient and has an asymmetric encoder-decoder structure with very few parameters in the decoder.
Our method achieves top-ranking results, while our CNN model has an up to 25x lower number of parameters than other top-ranking methods.
arXiv Detail & Related papers (2022-04-06T18:02:15Z) - Enforcing Morphological Information in Fully Convolutional Networks to
Improve Cell Instance Segmentation in Fluorescence Microscopy Images [1.408123603417833]
We propose a novel cell instance segmentation approach based on the well-known U-Net architecture.
To enforce the learning of morphological information per pixel, a deep distance transformer (DDT) acts as a back-bone model.
The obtained results suggest a performance boost over traditional U-Net architectures.
arXiv Detail & Related papers (2021-06-10T15:54:38Z) - Split and Expand: An inference-time improvement for Weakly Supervised
Cell Instance Segmentation [71.50526869670716]
We propose a two-step post-processing procedure, Split and Expand, to improve the conversion of segmentation maps to instances.
In the Split step, we split clumps of cells from the segmentation map into individual cell instances with the guidance of cell-center predictions.
In the Expand step, we find missing small cells using the cell-center predictions.
arXiv Detail & Related papers (2020-07-21T14:05:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.