Universal Medical Image Representation Learning with Compositional Decoders
- URL: http://arxiv.org/abs/2409.19890v2
- Date: Mon, 7 Oct 2024 09:35:44 GMT
- Title: Universal Medical Image Representation Learning with Compositional Decoders
- Authors: Kaini Wang, Ling Yang, Siping Zhou, Guangquan Zhou, Wentao Zhang, Bin Cui, Shuo Li
- Abstract summary: We develop a decomposed-composed universal medical imaging paradigm (UniMed) that supports tasks at all levels.
We propose a decomposed decoder that can predict two types of outputs -- pixel and semantic -- based on a defined input queue.
We introduce a composed decoder that unifies the input and output spaces and standardizes task annotations across different levels into a discrete token format.
- Score: 36.36271760800241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual-language models have advanced the development of universal models, yet their application in medical imaging remains constrained by specific functional requirements and limited data. Current general-purpose models are typically designed with task-specific branches and heads, which restricts the shared feature space and the flexibility of the model. To address these challenges, we have developed a decomposed-composed universal medical imaging paradigm (UniMed) that supports tasks at all levels. To this end, we first propose a decomposed decoder that can predict two types of outputs -- pixel and semantic -- based on a defined input queue. Additionally, we introduce a composed decoder that unifies the input and output spaces and standardizes task annotations across different levels into a discrete token format. The coupled design of these two components enables the model to flexibly combine tasks and realize mutual benefits between them. Moreover, our joint representation learning strategy skillfully leverages large amounts of unlabeled data and an unsupervised loss, achieving efficient one-stage pretraining for more robust performance. Experimental results show that UniMed achieves state-of-the-art performance on eight datasets across all three tasks and exhibits strong zero-shot and 100-shot transferability. We will release the code and trained models upon the paper's acceptance.
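To make the two decoder roles concrete, below is a minimal PyTorch sketch of how a single set of input queries might be decoded into both pixel masks and discrete semantic tokens, in the spirit of the decomposed decoder described in the abstract. Every name and dimension here (DecomposedDecoder, num_queries, vocab_size, the dot-product mask head) is an illustrative assumption, not the authors' released code.

```python
# A minimal sketch of the decomposed-decoder idea: shared queries decoded
# into pixel outputs (masks) and semantic outputs (discrete tokens).
# All module names and dimensions are assumptions for illustration.
import torch
import torch.nn as nn

class DecomposedDecoder(nn.Module):
    def __init__(self, dim=256, num_queries=100, vocab_size=512):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        self.pixel_head = nn.Linear(dim, dim)            # projects queries for mask dot-products
        self.semantic_head = nn.Linear(dim, vocab_size)  # logits over a discrete token vocabulary

    def forward(self, image_feats, pixel_embed):
        # image_feats: (B, HW, dim) encoder tokens; pixel_embed: (B, dim, H, W)
        q = self.queries.unsqueeze(0).expand(image_feats.size(0), -1, -1)
        q = self.decoder(q, image_feats)
        # Pixel output: per-query masks via dot-product with the pixel embedding.
        masks = torch.einsum("bqc,bchw->bqhw", self.pixel_head(q), pixel_embed)
        # Semantic output: per-query tokens in a shared discrete format.
        tokens = self.semantic_head(q)
        return masks, tokens

decoder = DecomposedDecoder()
masks, tokens = decoder(torch.randn(2, 196, 256), torch.randn(2, 256, 14, 14))
print(masks.shape, tokens.shape)  # torch.Size([2, 100, 14, 14]) torch.Size([2, 100, 512])
```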
Related papers
- KA$^2$ER: Knowledge Adaptive Amalgamation of ExpeRts for Medical Images Segmentation [5.807887214293438]
We propose an adaptive amalgamation knowledge framework that aims to train a versatile foundation model to handle the joint goals of multiple expert models.
In particular, we first train an nnUNet-based expert model for each task, and reuse the pre-trained SwinUNETR as the target foundation model.
Within the hidden layer, the hierarchical attention mechanisms are designed to achieve adaptive merging of the target model to the hidden layer feature knowledge of all experts.
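One way to picture the adaptive merging described above: a hedged, single-layer sketch in which the target model's hidden tokens attend over the pooled hidden tokens of frozen experts and absorb the mixture residually. Shapes and the module name AmalgamationAttention are assumptions; the paper's hierarchical mechanism spans multiple layers.

```python
# A single-layer sketch of attention-based feature amalgamation: the target
# feature queries the frozen experts' features and merges the result.
import torch
import torch.nn as nn

class AmalgamationAttention(nn.Module):
    def __init__(self, dim=384):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, target_feat, expert_feats):
        # target_feat: (B, N, dim); expert_feats: list of (B, N, dim) from frozen experts
        experts = torch.cat(expert_feats, dim=1)   # pool expert tokens as keys/values
        merged, _ = self.attn(target_feat, experts, experts)
        return target_feat + merged                # residual adaptive merge

layer = AmalgamationAttention()
out = layer(torch.randn(2, 64, 384), [torch.randn(2, 64, 384) for _ in range(3)])
print(out.shape)  # torch.Size([2, 64, 384])
```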
arXiv Detail & Related papers (2024-10-28T14:49:17Z)
- Towards a Generalist and Blind RGB-X Tracker [91.36268768952755]
We develop a single model tracker that can remain blind to any modality X during inference time.
Our training process is extremely simple, integrating multi-label classification loss with a routing function.
Our generalist and blind tracker can achieve competitive performance compared to well-established modal-specific models.
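A minimal sketch of the routing idea, under stated assumptions: a learned router scores a small set of shared experts from the incoming features alone, so no modality label is needed at inference, and the routing logits can be trained with a multi-label classification loss. The expert and target construction here is illustrative, not the paper's exact design.

```python
# Illustrative routing over shared experts for a modality-blind model; the
# router is trained with a multi-label (sigmoid) classification loss.
import torch
import torch.nn as nn

class RoutedBackbone(nn.Module):
    def __init__(self, dim=256, num_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.router = nn.Linear(dim, num_experts)

    def forward(self, x):
        # x: (B, dim) pooled features from any modality X
        logits = self.router(x)     # per-expert routing scores
        weights = logits.sigmoid()  # multi-label: experts can co-fire
        out = sum(w.unsqueeze(-1) * e(x) for w, e in zip(weights.unbind(-1), self.experts))
        return out, logits

model = RoutedBackbone()
out, logits = model(torch.randn(8, 256))
# Multi-label classification loss against (assumed) per-expert targets.
targets = torch.randint(0, 2, (8, 4)).float()
loss = nn.functional.binary_cross_entropy_with_logits(logits, targets)
print(out.shape, loss.item())
```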
arXiv Detail & Related papers (2024-05-28T03:00:58Z)
- UniSeg: A Prompt-driven Universal Segmentation Model as well as A Strong Representation Learner [32.698493660851035]
We propose a prompt-driven universal model (UniSeg) for multi-task medical image segmentation.
We make the model 'aware' of the ongoing task early and boost the task-specific training of the whole decoder.
Our results indicate that the proposed UniSeg outperforms other universal models and single-task models on 11 upstream tasks.
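As an illustration of that early task awareness, the sketch below fuses one learnable prompt per task with the encoder features before decoding, so the whole decoder is conditioned on the ongoing task. The fusion by simple addition and all names and dimensions are assumptions, not UniSeg's actual implementation.

```python
# A hedged sketch of prompt-driven decoding: a per-task prompt is added to
# encoder features so the decoder is task-aware from its first layer.
import torch
import torch.nn as nn

class PromptedDecoder(nn.Module):
    def __init__(self, dim=320, num_tasks=11, num_classes=2):
        super().__init__()
        self.task_prompts = nn.Embedding(num_tasks, dim)  # one learnable prompt per task
        self.decode = nn.Sequential(nn.Conv3d(dim, dim, 3, padding=1), nn.ReLU(),
                                    nn.Conv3d(dim, num_classes, 1))

    def forward(self, feats, task_id):
        # feats: (B, dim, D, H, W) encoder output; task_id: (B,) task index
        prompt = self.task_prompts(task_id)[:, :, None, None, None]
        return self.decode(feats + prompt)  # task-aware decoding

dec = PromptedDecoder()
out = dec(torch.randn(2, 320, 4, 8, 8), torch.tensor([0, 5]))
print(out.shape)  # torch.Size([2, 2, 4, 8, 8])
```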
arXiv Detail & Related papers (2023-04-07T06:28:51Z)
- Composer: Creative and Controllable Image Synthesis with Composable Conditions [57.78533372393828]
Recent large-scale generative models learned on big data are capable of synthesizing incredible images yet suffer from limited controllability.
This work offers a new generation paradigm that allows flexible control of the output image, such as spatial layout and palette, while maintaining the synthesis quality and model creativity.
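A rough sketch of composable conditioning, under stated assumptions: each condition (here a spatial layout vector and a palette vector, both hypothetical encodings) gets its own projection into a shared conditioning vector, and any subset can be supplied at inference, which is what makes the control flexible.

```python
# Illustrative composable conditioning: independent condition encoders whose
# outputs are summed into one vector that would guide a generator.
import torch
import torch.nn as nn

class ComposedCondition(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.layout_proj = nn.Linear(64, dim)   # e.g., a flattened spatial-layout encoding
        self.palette_proj = nn.Linear(15, dim)  # e.g., five RGB palette colors

    def forward(self, layout=None, palette=None):
        cond = 0.0
        # Any subset of conditions may be supplied; absent ones are simply omitted.
        if layout is not None:
            cond = cond + self.layout_proj(layout)
        if palette is not None:
            cond = cond + self.palette_proj(palette)
        return cond

cond = ComposedCondition()
c = cond(layout=torch.randn(2, 64), palette=torch.randn(2, 15))
print(c.shape)  # torch.Size([2, 512])
```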
arXiv Detail & Related papers (2023-02-20T05:48:41Z)
- Generalized Decoding for Pixel, Image, and Language [197.85760901840177]
We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly.
X-Decoder is the first work that provides a unified way to support all types of image segmentation and a variety of vision-language (VL) tasks.
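The sketch below illustrates the generalized-decoding idea with two query types in one decoder: latent queries that yield pixel-level masks and text queries that yield language-token logits. Head designs and dimensions are assumptions rather than X-Decoder's actual implementation.

```python
# A condensed sketch of generalized decoding: one decoder, two query types,
# two output spaces (masks and language tokens). Dimensions are assumed.
import torch
import torch.nn as nn

class GeneralizedDecoder(nn.Module):
    def __init__(self, dim=256, num_latent=100, vocab=30522):
        super().__init__()
        self.latent_q = nn.Parameter(torch.randn(num_latent, dim))
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.mask_head = nn.Linear(dim, dim)
        self.token_head = nn.Linear(dim, vocab)

    def forward(self, visual_feats, text_q, pixel_embed):
        # visual_feats: (B, HW, dim); text_q: (B, T, dim); pixel_embed: (B, dim, H, W)
        q = torch.cat([self.latent_q.expand(visual_feats.size(0), -1, -1), text_q], dim=1)
        q = self.decoder(q, visual_feats)
        latent, text = q[:, : self.latent_q.size(0)], q[:, self.latent_q.size(0):]
        masks = torch.einsum("bqc,bchw->bqhw", self.mask_head(latent), pixel_embed)
        return masks, self.token_head(text)

dec = GeneralizedDecoder()
masks, tokens = dec(torch.randn(2, 196, 256), torch.randn(2, 12, 256), torch.randn(2, 256, 14, 14))
print(masks.shape, tokens.shape)  # torch.Size([2, 100, 14, 14]) torch.Size([2, 12, 30522])
```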
arXiv Detail & Related papers (2022-12-21T18:58:41Z)
- Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks [86.66733026149892]
We propose Uni-Perceiver v2, which is the first generalist model capable of handling major large-scale vision and vision-language tasks.
Specifically, images are encoded as general region proposals, while texts are encoded via a Transformer-based language model.
Uni-Perceiver v2 achieves competitive performance on a broad range of vision and vision-language tasks.
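A simplified sketch of the encoding split described above: region-proposal features and text tokens are projected into one shared space, where a task reduces to scoring candidates. The feature sizes and pooling are placeholders, not Uni-Perceiver v2's real components.

```python
# Illustrative shared-space encoders: image regions and text tokens embedded
# so tasks can be posed as similarity scoring in one space.
import torch
import torch.nn as nn

class GeneralistEncoder(nn.Module):
    def __init__(self, dim=512, vocab=30522):
        super().__init__()
        self.region_proj = nn.Linear(1024, dim)  # region-proposal features -> shared space
        self.tok_embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.text_encoder = nn.TransformerEncoder(layer, num_layers=2)

    def encode_image(self, region_feats):
        # region_feats: (B, num_regions, 1024) assumed region-proposal embeddings
        return self.region_proj(region_feats).mean(dim=1)

    def encode_text(self, token_ids):
        return self.text_encoder(self.tok_embed(token_ids)).mean(dim=1)

enc = GeneralistEncoder()
img = enc.encode_image(torch.randn(2, 32, 1024))
txt = enc.encode_text(torch.randint(0, 30522, (2, 16)))
scores = img @ txt.t()  # retrieval-style scoring in the shared space
print(scores.shape)     # torch.Size([2, 2])
```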
arXiv Detail & Related papers (2022-11-17T18:59:52Z)
- Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks [73.63892022944198]
We present a generic perception architecture named Uni-Perceiver.
It processes a variety of modalities and tasks with unified modeling and shared parameters.
Results show that our pre-trained model without any tuning can achieve reasonable performance even on novel tasks.
arXiv Detail & Related papers (2021-12-02T18:59:50Z)
- HydraSum -- Disentangling Stylistic Features in Text Summarization using Multi-Decoder Models [12.070474521259776]
We introduce HydraSum, a new summarization architecture that extends the single decoder framework of current models.
Our proposed model encourages each expert, i.e. decoder, to learn and generate stylistically distinct summaries.
A guided version of the training process can explicitly govern how summary styles are partitioned between decoders.
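The multi-decoder mixture can be sketched as below: several decoder "experts" each produce a next-token distribution and a gate mixes them; supplying the gate weights during training (assumed here via an override argument) is one way to govern how styles are partitioned across experts. Names and shapes are assumptions.

```python
# A small sketch of a gated mixture of decoders for style-controllable
# generation; forcing the gate steers output toward one expert's style.
import torch
import torch.nn as nn

class MultiDecoderLM(nn.Module):
    def __init__(self, dim=256, vocab=10000, num_experts=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, vocab) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, hidden, gate_override=None):
        # hidden: (B, T, dim) decoder states; gate_override: optional (B, num_experts)
        g = self.gate(hidden.mean(dim=1)).softmax(-1) if gate_override is None else gate_override
        dists = torch.stack([e(hidden).softmax(-1) for e in self.experts], dim=-1)
        return (dists * g[:, None, None, :]).sum(-1)  # mixed next-token distribution

lm = MultiDecoderLM()
h = torch.randn(2, 8, 256)
p = lm(h)                                             # learned gate
p_style0 = lm(h, torch.tensor([[1., 0.], [1., 0.]]))  # force expert-0 style
print(p.shape)  # torch.Size([2, 8, 10000])
```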
arXiv Detail & Related papers (2021-10-08T22:49:49Z)