UniSeg: A Prompt-driven Universal Segmentation Model as well as A Strong Representation Learner
- URL: http://arxiv.org/abs/2304.03493v1
- Date: Fri, 7 Apr 2023 06:28:51 GMT
- Title: UniSeg: A Prompt-driven Universal Segmentation Model as well as A Strong Representation Learner
- Authors: Yiwen Ye, Yutong Xie, Jianpeng Zhang, Ziyang Chen, Yong Xia
- Abstract summary: We propose a prompt-driven universal model (UniSeg) for multi-task medical image segmentation.
We make the model 'aware' of the ongoing task early and boost the task-specific training of the whole decoder.
Our results indicate that the proposed UniSeg outperforms other universal models and single-task models on 11 upstream tasks.
- Score: 32.698493660851035
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The universal model has emerged as a promising trend for medical image
segmentation, paving the way toward building a medical imaging large model (MILM).
One popular strategy for building universal models is to encode each task as a
one-hot vector and generate dynamic convolutional layers at the end of the
decoder to extract the target of interest. Although successful, this strategy
ignores the correlations among tasks and makes the model 'aware' of the ongoing
task too late. To address both issues, we propose a prompt-driven Universal
Segmentation model (UniSeg) for multi-task medical image segmentation using
diverse modalities and domains. We first devise a learnable universal prompt to
describe the correlations among all tasks and then convert this prompt and
image features into a task-specific prompt, which is fed to the decoder as a
part of its input. Thus, we make the model 'aware' of the ongoing task early
and boost the task-specific training of the whole decoder. Our results indicate
that the proposed UniSeg outperforms other universal models and single-task
models on 11 upstream tasks. Moreover, UniSeg beats other pre-trained
models on two downstream datasets, providing the community with a high-quality
pre-trained model for 3D medical image segmentation. Code and model are
available at https://github.com/yeerwen/UniSeg.
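Below is a minimal sketch of the prompt mechanism the abstract describes: a learnable universal prompt is fused with the encoder's image features to produce task-specific prompts, and the prompt of the ongoing task is concatenated to the features handed to the decoder. The module name, tensor shapes, and the convolutional fusion operator are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
# Sketch of a prompt-driven bottleneck for universal segmentation (assumed design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptDrivenBottleneck(nn.Module):
    def __init__(self, feat_ch: int, prompt_ch: int, num_tasks: int):
        super().__init__()
        # Learnable universal prompt shared by all tasks (models task correlations).
        self.universal_prompt = nn.Parameter(torch.randn(1, prompt_ch, 4, 4, 4))
        # Fuses the universal prompt with image features into num_tasks task prompts.
        self.fuse = nn.Conv3d(feat_ch + prompt_ch, num_tasks * prompt_ch, kernel_size=1)
        self.num_tasks = num_tasks
        self.prompt_ch = prompt_ch

    def forward(self, feats: torch.Tensor, task_id: int) -> torch.Tensor:
        # feats: encoder bottleneck features, shape (B, feat_ch, D, H, W).
        b, _, d, h, w = feats.shape
        prompt = F.interpolate(
            self.universal_prompt.expand(b, -1, -1, -1, -1),
            size=(d, h, w), mode="trilinear", align_corners=False)
        task_prompts = self.fuse(torch.cat([feats, prompt], dim=1))
        task_prompts = task_prompts.view(b, self.num_tasks, self.prompt_ch, d, h, w)
        # Pick the prompt of the ongoing task and hand it to the decoder together
        # with the image features, so the whole decoder is task-aware from the start.
        task_prompt = task_prompts[:, task_id]
        return torch.cat([feats, task_prompt], dim=1)

# Usage (hypothetical shapes):
# decoder_in = PromptDrivenBottleneck(320, 64, num_tasks=11)(bottleneck_feats, task_id=3)
```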
Related papers
- KA$^2$ER: Knowledge Adaptive Amalgamation of ExpeRts for Medical Images Segmentation [5.807887214293438]
We propose an adaptive knowledge-amalgamation framework that aims to train a versatile foundation model to handle the joint goals of multiple expert models.
In particular, we first train an nnUNet-based expert model for each task and reuse the pre-trained SwinUNETR as the target foundation model.
Hierarchical attention mechanisms in the hidden layers adaptively merge the hidden-layer feature knowledge of all experts into the target model.
arXiv Detail & Related papers (2024-10-28T14:49:17Z)
- MedUniSeg: 2D and 3D Medical Image Segmentation via a Prompt-driven Universal Model [27.58715707047272]
We introduce MedUniSeg, a prompt-driven universal segmentation model for 2D and 3D multi-task segmentation.
MedUniSeg employs multiple modal-specific prompts alongside a universal task prompt to accurately characterize the modalities and tasks.
We evaluate MedUniSeg on a comprehensive multi-modal upstream dataset consisting of 17 sub-datasets.
arXiv Detail & Related papers (2024-10-08T11:04:01Z)
- Universal Medical Image Representation Learning with Compositional Decoders [36.36271760800241]
We develop a decomposed-composed universal medical imaging paradigm (UniMed) that supports tasks at all levels.
We propose a decomposed decoder that can predict two types of outputs -- pixel and semantic -- based on a defined input queue.
We introduce a composed decoder that unifies the input and output spaces and standardizes task annotations across different levels into a discrete token format.
arXiv Detail & Related papers (2024-09-30T02:39:42Z)
- Comprehensive Generative Replay for Task-Incremental Segmentation with Concurrent Appearance and Semantic Forgetting [49.87694319431288]
Generalist segmentation models are increasingly favored for diverse tasks involving various objects from different image sources.
We propose a Comprehensive Generative Replay (CGR) framework that restores appearance and semantic knowledge by synthesizing image-mask pairs.
Experiments on incremental tasks (cardiac, fundus and prostate segmentation) show its clear advantage for alleviating concurrent appearance and semantic forgetting.
arXiv Detail & Related papers (2024-06-28T10:05:58Z)
- Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts [68.86537322287474]
Low-latency, high-quality interactive segmentation with diverse prompts is challenging for both specialist and generalist models.
We propose SegNext, a next-generation interactive segmentation approach offering low latency, high quality, and diverse prompt support.
Our method outperforms current state-of-the-art methods on HQSeg-44K and DAVIS, both quantitatively and qualitatively.
arXiv Detail & Related papers (2024-03-31T17:02:24Z)
- OMG-Seg: Is One Model Good Enough For All Segmentation? [83.17068644513144]
OMG-Seg is a transformer-based encoder-decoder architecture with task-specific queries and outputs.
We show that OMG-Seg can support over ten distinct segmentation tasks while significantly reducing computational and parameter overhead; a rough sketch of this task-specific-query design appears after this list.
arXiv Detail & Related papers (2024-01-18T18:59:34Z)
- UnIVAL: Unified Model for Image, Video, Audio and Language Tasks [105.77733287326308]
The UnIVAL model goes beyond two modalities and unifies text, images, video, and audio in a single model.
Our model is efficiently pretrained on many tasks, based on task balancing and multimodal curriculum learning.
Thanks to the unified model, we propose a novel study on multimodal model merging via weight interpolation.
arXiv Detail & Related papers (2023-07-30T09:48:36Z)
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
- Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks [86.66733026149892]
We propose Uni-Perceiver v2, the first generalist model capable of handling major large-scale vision and vision-language tasks.
Specifically, images are encoded as general region proposals, while texts are encoded via a Transformer-based language model.
Uni-Perceiver v2 achieves competitive performance on a broad range of vision and vision-language tasks.
arXiv Detail & Related papers (2022-11-17T18:59:52Z)
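As referenced in the OMG-Seg entry above, here is a rough sketch of the task-specific-query decoder idea that several of these universal models share: each task owns a set of learnable queries that cross-attend to shared image features, and the attended queries are turned into per-position mask logits. All names, shapes, and the single-layer decoder are assumptions for illustration, not any paper's actual implementation.

```python
# Sketch of a decoder with task-specific queries over shared image tokens (assumed design).
import torch
import torch.nn as nn

class TaskQueryDecoder(nn.Module):
    def __init__(self, embed_dim: int, queries_per_task: int, num_tasks: int):
        super().__init__()
        # One bank of learnable queries per task.
        self.task_queries = nn.Parameter(
            torch.randn(num_tasks, queries_per_task, embed_dim))
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads=8, batch_first=True)
        self.mask_head = nn.Linear(embed_dim, embed_dim)

    def forward(self, image_tokens: torch.Tensor, task_id: int) -> torch.Tensor:
        # image_tokens: (B, N, embed_dim) flattened features from a shared encoder.
        b = image_tokens.size(0)
        queries = self.task_queries[task_id].unsqueeze(0).expand(b, -1, -1)
        attended, _ = self.cross_attn(queries, image_tokens, image_tokens)
        # Dot product between attended query embeddings and image tokens gives
        # per-query mask logits over the N spatial positions.
        return torch.einsum("bqc,bnc->bqn", self.mask_head(attended), image_tokens)

# Usage (hypothetical shapes):
# logits = TaskQueryDecoder(256, queries_per_task=20, num_tasks=10)(tokens, task_id=2)
```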
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.