Autoregressive Sequence Modeling for 3D Medical Image Representation
- URL: http://arxiv.org/abs/2409.08691v1
- Date: Fri, 13 Sep 2024 10:19:10 GMT
- Title: Autoregressive Sequence Modeling for 3D Medical Image Representation
- Authors: Siwen Wang, Churan Wang, Fei Gao, Lixian Su, Fandong Zhang, Yizhou Wang, Yizhou Yu
- Abstract summary: We introduce a pioneering method for learning 3D medical image representations through an autoregressive sequence pre-training framework.
Our approach sequences various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence.
- Score: 48.706230961589924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Three-dimensional (3D) medical images, such as Computed Tomography (CT) and Magnetic Resonance Imaging (MRI), are essential for clinical applications. However, the need for diverse and comprehensive representations is particularly pronounced when considering the variability across different organs, diagnostic tasks, and imaging modalities. How to effectively interpret the intricate contextual information and extract meaningful insights from these images remains an open challenge to the community. While current self-supervised learning methods have shown potential, they often consider an image as a whole thereby overlooking the extensive, complex relationships among local regions from one or multiple images. In this work, we introduce a pioneering method for learning 3D medical image representations through an autoregressive pre-training framework. Our approach sequences various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence. By employing an autoregressive sequence modeling task, we predict the next visual token in the sequence, which allows our model to deeply understand and integrate the contextual information inherent in 3D medical images. Additionally, we implement a random startup strategy to avoid overestimating token relationships and to enhance the robustness of learning. The effectiveness of our approach is demonstrated by the superior performance over others on nine downstream tasks in public datasets.
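The next-token objective described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. The names `next_token_pairs`, `mean_predictor`, and `autoregressive_loss` are hypothetical, the averaging predictor merely stands in for a real network, and the abstract does not specify how the random startup strategy works, so it is interpreted here (as an assumption) as starting the sequence at a randomly chosen token.

```python
import random


def next_token_pairs(tokens, rng, random_start=True):
    """Build (context, target) pairs for next-token prediction.

    tokens: list of visual tokens, each a list of floats.
    ASSUMPTION: "random startup" is modeled as dropping a random-length
    prefix so that prediction begins at a random position.
    """
    if len(tokens) < 2:
        raise ValueError("need at least two tokens")
    start = rng.randrange(len(tokens) - 1) if random_start else 0
    window = tokens[start:]
    # Each target is predicted from all tokens that precede it in the window.
    return [(window[:i], window[i]) for i in range(1, len(window))]


def mean_predictor(context):
    """Placeholder predictor (hypothetical): average of the context tokens."""
    dim = len(context[0])
    return [sum(tok[d] for tok in context) / len(context) for d in range(dim)]


def autoregressive_loss(tokens, predictor, rng, random_start=True):
    """Mean squared error of the predictor over all next-token pairs."""
    pairs = next_token_pairs(tokens, rng, random_start)
    total = 0.0
    for context, target in pairs:
        pred = predictor(context)
        total += sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    return total / len(pairs)


if __name__ == "__main__":
    rng = random.Random(0)
    sequence = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    print(autoregressive_loss(sequence, mean_predictor, rng, random_start=False))
```

In practice the predictor would be a causal transformer over patch embeddings; the sketch only shows how contexts and targets line up in an autoregressive sequence.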
Related papers
- QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge [93.61262892578067]
Uncertainty in medical image segmentation tasks, especially inter-rater variability, presents a significant challenge.
This variability directly impacts the development and evaluation of automated segmentation algorithms.
We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ).
arXiv Detail & Related papers (2024-03-19T17:57:24Z)
- Unified Medical Image Pre-training in Language-Guided Common Semantic Space [39.61770813855078]
We propose a Unified Medical Image Pre-training framework, namely UniMedI.
UniMedI uses diagnostic reports as common semantic space to create unified representations for diverse modalities of medical images.
We evaluate its performance on both 2D and 3D images across 10 different datasets.
arXiv Detail & Related papers (2023-11-24T22:01:12Z)
- Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z)
- Multi-View Vertebra Localization and Identification from CT Images [57.56509107412658]
We propose a multi-view method for vertebra localization and identification from CT images.
We convert the 3D problem into a 2D localization and identification task on different views.
Our method can learn the multi-view global information naturally.
arXiv Detail & Related papers (2023-07-24T14:43:07Z)
- Graph Self-Supervised Learning for Endoscopic Image Matching [1.8275108630751844]
We propose a novel self-supervised approach that combines Convolutional Neural Networks for capturing local visual appearance and attention-based Graph Neural Networks for modeling spatial relationships between key-points.
Our approach is trained in a fully self-supervised scheme without the need for labeled data.
Our approach outperforms state-of-the-art handcrafted and deep learning-based methods, demonstrating exceptional performance in terms of precision rate (1) and matching score (99.3%).
arXiv Detail & Related papers (2023-06-19T19:53:41Z)
- Generative Text-Guided 3D Vision-Language Pretraining for Unified Medical Image Segmentation [37.93699188912036]
We present Generative Text-Guided 3D Vision-Language Pretraining for Unified Medical Image Segmentation (GTGM).
GTGM generates medical-style text from 3D medical images without relying on paired descriptions.
A negative-free contrastive learning objective is introduced to cultivate consistent visual representations between augmented 3D medical image patches.
arXiv Detail & Related papers (2023-06-07T22:20:51Z)
- Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstructing the informative patches according to the gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z)
- Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training [62.215025958347105]
We propose a self-supervised learning paradigm with multi-modal masked autoencoders.
We learn cross-modal domain knowledge by reconstructing missing pixels and tokens from randomly masked images and texts.
arXiv Detail & Related papers (2022-09-15T07:26:43Z)
- Masked Image Modeling Advances 3D Medical Image Analysis [0.41674286453548476]
Masked image modeling (MIM) has gained considerable attention due to its capacity to learn from vast amounts of unlabeled data.
This paper shows that MIM can advance 3D medical image analysis in addition to natural image analysis.
arXiv Detail & Related papers (2022-04-25T15:16:08Z)
- Imbalance-Aware Self-Supervised Learning for 3D Radiomic Representations [5.750111443935516]
We show how to learn image representations in a self-supervised fashion using a 3D Siamese network.
We show significant improvement in brain tumor classification and lung cancer staging tasks covering MRI and CT imaging modalities.
arXiv Detail & Related papers (2021-03-06T18:17:03Z)