Autoregressive Medical Image Segmentation via Next-Scale Mask Prediction
- URL: http://arxiv.org/abs/2502.20784v1
- Date: Fri, 28 Feb 2025 07:05:58 GMT
- Title: Autoregressive Medical Image Segmentation via Next-Scale Mask Prediction
- Authors: Tao Chen, Chenhui Wang, Zhihao Chen, Hongming Shan,
- Abstract summary: We propose the AutoRegressive framework via next-scale mask prediction, termed AR-Seg.<n>AR-Seg progressively predicts the next-scale mask by explicitly modeling dependencies across all previous scales.<n>We show that AR-Seg outperforms state-of-the-art methods while explicitly visualizing the intermediate coarse-to-fine segmentation process.
- Score: 16.026171689438637
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: While deep learning has significantly advanced medical image segmentation, most existing methods still struggle with handling complex anatomical regions. Cascaded or deep supervision-based approaches attempt to address this challenge through multi-scale feature learning but fail to establish sufficient inter-scale dependencies, as each scale relies solely on the features of the immediate predecessor. To this end, we propose the AutoRegressive Segmentation framework via next-scale mask prediction, termed AR-Seg, which progressively predicts the next-scale mask by explicitly modeling dependencies across all previous scales within a unified architecture. AR-Seg introduces three innovations: (1) a multi-scale mask autoencoder that quantizes the mask into multi-scale token maps to capture hierarchical anatomical structures, (2) a next-scale autoregressive mechanism that progressively predicts next-scale masks to enable sufficient inter-scale dependencies, and (3) a consensus-aggregation strategy that combines multiple sampled results to generate a more accurate mask, further improving segmentation robustness. Extensive experimental results on two benchmark datasets with different modalities demonstrate that AR-Seg outperforms state-of-the-art methods while explicitly visualizing the intermediate coarse-to-fine segmentation process.
Related papers
- DiffAtlas: GenAI-fying Atlas Segmentation via Image-Mask Diffusion [47.43502676919903]
We introduce DiffAtlas, a novel generative framework that models both images and masks through diffusion during training.
During testing, the model is guided to generate a specific target image-mask pair, from which the corresponding mask is obtained.
Our approach outperforms existing methods, particularly in limited-data and zero-shot modality segmentation.
arXiv Detail & Related papers (2025-03-09T20:06:40Z) - MaskUno: Switch-Split Block For Enhancing Instance Segmentation [0.0]
We propose replacing mask prediction with a Switch-Split block that processes refined ROIs, classifies them, and assigns them to specialized mask predictors.
An increase in the mean Average Precision (mAP) of 2.03% was observed for the high-performing DetectoRS when trained on 80 classes.
arXiv Detail & Related papers (2024-07-31T10:12:14Z) - Generalizable Entity Grounding via Assistance of Large Language Model [77.07759442298666]
We propose a novel approach to densely ground visual entities from a long caption.
We leverage a large multimodal model to extract semantic nouns, a class-a segmentation model to generate entity-level segmentation, and a multi-modal feature fusion module to associate each semantic noun with its corresponding segmentation mask.
arXiv Detail & Related papers (2024-02-04T16:06:05Z) - A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting [2.7563282688229664]
This work builds upon Stable Diffusion and proposes a latent diffusion approach for panoptic segmentation.
Our training consists of two steps: (1) training a shallow autoencoder to project the segmentation masks to latent space; (2) training a diffusion model to allow image-conditioned sampling in latent space.
arXiv Detail & Related papers (2024-01-18T18:59:19Z) - Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image (DEC-Seg)
arXiv Detail & Related papers (2023-12-26T12:56:31Z) - Morphologically-Aware Consensus Computation via Heuristics-based
IterATive Optimization (MACCHIatO) [1.8749305679160362]
We propose a new method to construct a binary or a probabilistic consensus segmentation based on the Fr'echet means of carefully chosen distances.
We show that it leads to binary consensus masks of intermediate size between Majority Voting and STAPLE and to different posterior probabilities than Mask Averaging and STAPLE methods.
arXiv Detail & Related papers (2023-09-14T23:28:58Z) - Denoising Diffusion Semantic Segmentation with Mask Prior Modeling [61.73352242029671]
We propose to ameliorate the semantic segmentation quality of existing discriminative approaches with a mask prior modeled by a denoising diffusion generative model.
We evaluate the proposed prior modeling with several off-the-shelf segmentors, and our experimental results on ADE20K and Cityscapes demonstrate that our approach could achieve competitively quantitative performance.
arXiv Detail & Related papers (2023-06-02T17:47:01Z) - GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds [72.60362979456035]
Masked Autoencoders (MAE) are challenging to explore in large-scale 3D point clouds.
We propose a textbfGenerative textbfDecoder for MAE (GD-MAE) to automatically merges the surrounding context.
We demonstrate the efficacy of the proposed method on several large-scale benchmarks: KITTI, and ONCE.
arXiv Detail & Related papers (2022-12-06T14:32:55Z) - SODAR: Segmenting Objects by DynamicallyAggregating Neighboring Mask
Representations [90.8752454643737]
Recent state-of-the-art one-stage instance segmentation model SOLO divides the input image into a grid and directly predicts per grid cell object masks with fully-convolutional networks.
We observe SOLO generates similar masks for an object at nearby grid cells, and these neighboring predictions can complement each other as some may better segment certain object part.
Motivated by the observed gap, we develop a novel learning-based aggregation method that improves upon SOLO by leveraging the rich neighboring information.
arXiv Detail & Related papers (2022-02-15T13:53:03Z) - A Three-Stage Self-Training Framework for Semi-Supervised Semantic
Segmentation [0.9786690381850356]
We propose a holistic solution framed as a three-stage self-training framework for semantic segmentation.
The key idea of our technique is the extraction of the pseudo-masks statistical information.
We then decrease the uncertainty of the pseudo-masks using a multi-task model that enforces consistency.
arXiv Detail & Related papers (2020-12-01T21:00:27Z) - Multi-Margin based Decorrelation Learning for Heterogeneous Face
Recognition [90.26023388850771]
This paper presents a deep neural network approach to extract decorrelation representations in a hyperspherical space for cross-domain face images.
The proposed framework can be divided into two components: heterogeneous representation network and decorrelation representation learning.
Experimental results on two challenging heterogeneous face databases show that our approach achieves superior performance on both verification and recognition tasks.
arXiv Detail & Related papers (2020-05-25T07:01:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.