MedUniSeg: 2D and 3D Medical Image Segmentation via a Prompt-driven Universal Model
- URL: http://arxiv.org/abs/2410.05905v1
- Date: Tue, 8 Oct 2024 11:04:01 GMT
- Title: MedUniSeg: 2D and 3D Medical Image Segmentation via a Prompt-driven Universal Model
- Authors: Yiwen Ye, Ziyang Chen, Jianpeng Zhang, Yutong Xie, Yong Xia
- Abstract summary: We introduce MedUniSeg, a prompt-driven universal segmentation model for 2D and 3D multi-task segmentation.
MedUniSeg employs multiple modal-specific prompts alongside a universal task prompt to accurately characterize the modalities and tasks.
We evaluate MedUniSeg on a comprehensive multi-modal upstream dataset consisting of 17 sub-datasets.
- Score: 27.58715707047272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Universal segmentation models offer significant potential in addressing a wide range of tasks by effectively leveraging discrete annotations. As the scope of tasks and modalities expands, it becomes increasingly important to generate and strategically position task- and modal-specific priors within the universal model. However, existing universal models often overlook the correlations between different priors, and the optimal placement and frequency of these priors remain underexplored. In this paper, we introduce MedUniSeg, a prompt-driven universal segmentation model designed for 2D and 3D multi-task segmentation across diverse modalities and domains. MedUniSeg employs multiple modal-specific prompts alongside a universal task prompt to accurately characterize the modalities and tasks. To generate the related priors, we propose the modal map (MMap) and the fusion and selection (FUSE) modules, which transform modal and task prompts into corresponding priors. These modal and task priors are systematically introduced at the start and end of the encoding process. We evaluate MedUniSeg on a comprehensive multi-modal upstream dataset consisting of 17 sub-datasets. The results demonstrate that MedUniSeg achieves superior multi-task segmentation performance, attaining a 1.2% improvement in the mean Dice score across the 17 upstream tasks compared to nnUNet baselines, while using less than 1/10 of the parameters. For tasks that underperform during the initial multi-task joint training, we freeze MedUniSeg and introduce new modules to re-learn these tasks. This approach yields an enhanced version, MedUniSeg*, which consistently outperforms MedUniSeg across all tasks. Moreover, MedUniSeg surpasses advanced self-supervised and supervised pre-trained models on six downstream tasks, establishing itself as a high-quality, highly generalizable pre-trained segmentation model.
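The abstract above sketches the key mechanism: learnable modal-specific prompts and a universal task prompt are converted into priors by the MMap and FUSE modules and injected at the start and end of encoding. Below is a minimal sketch of that idea, assuming a PyTorch implementation; the module internals, prompt shapes, the 2D-only toy backbone, and class names such as ModalMap, Fuse, and PromptDrivenEncoder are illustrative stand-ins, not the authors' actual architecture.

```python
import torch
import torch.nn as nn


class ModalMap(nn.Module):
    """Project a learnable modal prompt into a prior added to the input features
    (a simplified stand-in for the paper's MMap module)."""

    def __init__(self, prompt_dim: int, in_ch: int):
        super().__init__()
        self.proj = nn.Linear(prompt_dim, in_ch)

    def forward(self, x: torch.Tensor, modal_prompt: torch.Tensor) -> torch.Tensor:
        # Broadcast the projected prompt over the spatial dimensions: a modal
        # prior introduced at the start of encoding.
        prior = self.proj(modal_prompt).view(1, -1, *([1] * (x.dim() - 2)))
        return x + prior


class Fuse(nn.Module):
    """Fuse a universal task prompt with encoder features, then select the prior
    of the ongoing task (a simplified stand-in for the paper's FUSE module)."""

    def __init__(self, feat_ch: int, prompt_ch: int, num_tasks: int):
        super().__init__()
        self.universal_prompt = nn.Parameter(torch.randn(1, prompt_ch, 1, 1))
        self.fuse = nn.Conv2d(feat_ch + prompt_ch, num_tasks * prompt_ch, kernel_size=1)
        self.num_tasks, self.prompt_ch = num_tasks, prompt_ch

    def forward(self, feats: torch.Tensor, task_id: int) -> torch.Tensor:
        b, _, h, w = feats.shape
        prompt = self.universal_prompt.expand(b, -1, h, w)
        priors = self.fuse(torch.cat([feats, prompt], dim=1))          # fusion
        priors = priors.view(b, self.num_tasks, self.prompt_ch, h, w)
        task_prior = priors[:, task_id]                                # selection
        return torch.cat([feats, task_prior], dim=1)                   # -> decoder


class PromptDrivenEncoder(nn.Module):
    """Toy 2D encoder marking where the modal prior (start) and task prior (end) enter."""

    def __init__(self, in_ch=1, feat_ch=32, prompt_dim=16, num_modalities=4, num_tasks=17):
        super().__init__()
        self.modal_prompts = nn.Parameter(torch.randn(num_modalities, prompt_dim))
        self.mmap = ModalMap(prompt_dim, in_ch)
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )
        self.fuse = Fuse(feat_ch, prompt_ch=prompt_dim, num_tasks=num_tasks)

    def forward(self, x: torch.Tensor, modal_id: int, task_id: int) -> torch.Tensor:
        x = self.mmap(x, self.modal_prompts[modal_id])  # modal prior at the start
        feats = self.encoder(x)
        return self.fuse(feats, task_id)                # task prior at the end


if __name__ == "__main__":
    model = PromptDrivenEncoder()
    out = model(torch.randn(2, 1, 64, 64), modal_id=0, task_id=3)
    print(out.shape)  # torch.Size([2, 48, 64, 64]); these features would feed a decoder
```

In the paper the same idea covers both 2D and 3D inputs and the selected task prior conditions the whole decoder; the sketch only marks where the two priors enter the network.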
Related papers
- Towards a Generalist and Blind RGB-X Tracker [91.36268768952755]
We develop a single model tracker that can remain blind to any modality X during inference time.
Our training process is extremely simple, integrating multi-label classification loss with a routing function.
Our generalist and blind tracker can achieve competitive performance compared to well-established modal-specific models.
arXiv Detail & Related papers (2024-05-28T03:00:58Z) - Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models [41.64717254672843]
Visual grounding occupies a pivotal position in multi-modality vision-language models.
We propose ViLaM, a large multi-modality model, that supports multi-tasks of VG.
ViLaM extends a wide range of instructions, thereby significantly enhancing its generalization and interaction potentials.
arXiv Detail & Related papers (2023-11-21T03:40:09Z) - AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z) - An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z) - UniSeg: A Prompt-driven Universal Segmentation Model as well as A Strong Representation Learner [32.698493660851035]
We propose a prompt-driven universal model (UniSeg) for multi-task medical image segmentation.
We make the model 'aware' of the ongoing task early and boost the task-specific training of the whole decoder.
Our results indicate that the proposed UniSeg outperforms other universal models and single-task models on 11 upstream tasks.
arXiv Detail & Related papers (2023-04-07T06:28:51Z) - OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models [72.8156832931841]
Generalist models are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model.
We release a generalist model learning system, OFASys, built on top of a declarative task interface named multi-modal instruction.
arXiv Detail & Related papers (2022-12-08T17:07:09Z) - Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks [86.66733026149892]
We propose Uni-Perceiver v2, which is the first generalist model capable of handling major large-scale vision and vision-language tasks.
Specifically, images are encoded as general region proposals, while texts are encoded via a Transformer-based language model.
Uni-Perceiver v2 achieves competitive performance on a broad range of vision and vision-language tasks.
arXiv Detail & Related papers (2022-11-17T18:59:52Z) - Deep Multimodal Fusion for Generalizable Person Re-identification [15.250738959921872]
DMF is a Deep Multimodal Fusion network for generalizable person re-identification.
Rich semantic knowledge is introduced to assist in feature representation learning during the pre-training stage.
A realistic dataset is adopted to fine-tune the pre-trained model for distribution alignment with real-world scenarios.
arXiv Detail & Related papers (2022-11-02T07:42:48Z) - Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks [73.63892022944198]
We present a generic perception architecture named Uni-Perceiver.
It processes a variety of modalities and tasks with unified modeling and shared parameters.
Results show that our pre-trained model without any tuning can achieve reasonable performance even on novel tasks.
arXiv Detail & Related papers (2021-12-02T18:59:50Z)