Advancing Volumetric Medical Image Segmentation via Global-Local Masked
Autoencoder
- URL: http://arxiv.org/abs/2306.08913v2
- Date: Wed, 23 Aug 2023 16:07:52 GMT
- Title: Advancing Volumetric Medical Image Segmentation via Global-Local Masked
Autoencoder
- Authors: Jia-Xin Zhuang, Luyang Luo, Hao Chen
- Abstract summary: Masked autoencoder (MAE) is a promising self-supervised pre-training technique.
GL-MAE is a simple yet effective self-supervised pre-training strategy.
- Score: 7.098796546778199
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Masked autoencoder (MAE) is a promising self-supervised pre-training
technique that can improve the representation learning of a neural network
without human intervention. However, applying MAE directly to volumetric
medical images poses two challenges: (i) a lack of global information that is
crucial for understanding the clinical context of the holistic data, (ii) no
guarantee of stabilizing the representations learned from randomly masked
inputs. To address these limitations, we propose the
Global-Local Masked AutoEncoder
(GL-MAE), a simple yet effective self-supervised pre-training strategy. In
addition to reconstructing masked local views, as in previous methods, GL-MAE
incorporates global context learning by reconstructing masked global views.
Furthermore, a complete global view is integrated as an anchor to guide the
reconstruction and stabilize the learning process through global-to-global
consistency learning and global-to-local consistency learning. Finetuning
results on multiple datasets demonstrate the superiority of our method over
other state-of-the-art self-supervised algorithms, highlighting its
effectiveness on versatile volumetric medical image segmentation tasks, even
when annotations are scarce. Our codes and models will be released upon
acceptance.
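The mechanics described in the abstract — MAE-style random patch masking of a 3D volume, plus a consistency objective that compares a masked view's embedding against an anchor embedding from the complete global view — can be illustrated with a minimal numpy sketch. Everything here is a hypothetical stand-in: `toy_encoder` replaces the actual ViT encoder, and the function names and masking ratio are illustrative, not from the released GL-MAE code.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(volume, p=4):
    """Split a cubic volume (D, H, W) into non-overlapping p^3 patches."""
    D, H, W = volume.shape
    patches = volume.reshape(D // p, p, H // p, p, W // p, p)
    patches = patches.transpose(0, 2, 4, 1, 3, 5).reshape(-1, p ** 3)
    return patches  # shape: (num_patches, p^3)

def random_mask(patches, ratio=0.75):
    """MAE-style masking: keep a random (1 - ratio) subset of patches."""
    n = patches.shape[0]
    keep = rng.permutation(n)[: int(n * (1 - ratio))]
    mask = np.zeros(n, dtype=bool)
    mask[keep] = True
    return patches[mask], mask

def toy_encoder(patches):
    """Hypothetical stand-in for the encoder: mean-pool patches."""
    return patches.mean(axis=0)

def consistency_loss(anchor, view):
    """One minus cosine similarity between anchor and view embeddings."""
    a = anchor / (np.linalg.norm(anchor) + 1e-8)
    v = view / (np.linalg.norm(view) + 1e-8)
    return 1.0 - float(a @ v)

volume = rng.standard_normal((32, 32, 32))
all_patches = patchify(volume)
anchor_emb = toy_encoder(all_patches)           # complete global view (anchor)
visible, mask = random_mask(all_patches, 0.75)  # masked view, 75% hidden
view_emb = toy_encoder(visible)
loss = consistency_loss(anchor_emb, view_emb)
```

In the paper's terms, the same loss would be applied between the anchor and masked global views (global-to-global consistency) and between the anchor and masked local crops (global-to-local consistency), alongside the usual voxel-reconstruction objective.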
Related papers
- UGMAE: A Unified Framework for Graph Masked Autoencoders [67.75493040186859]
We propose UGMAE, a unified framework for graph masked autoencoders.
We first develop an adaptive feature mask generator to account for the unique significance of nodes.
We then design a ranking-based structure reconstruction objective joint with feature reconstruction to capture holistic graph information.
arXiv Detail & Related papers (2024-02-12T19:39:26Z)
- Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning [53.00683059396803]
Mask image model (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z)
- Understanding Masked Autoencoders From a Local Contrastive Perspective [80.57196495601826]
Masked AutoEncoder (MAE) has revolutionized the field of self-supervised learning with its simple yet effective masking and reconstruction strategies.
We introduce a new empirical framework, called Local Contrastive MAE, to analyze both reconstructive and contrastive aspects of MAE.
arXiv Detail & Related papers (2023-10-03T12:08:15Z)
- Global and Local Semantic Completion Learning for Vision-Language Pre-training [34.740507502215536]
Cross-modal alignment plays a crucial role in vision-language pre-training models.
We propose a novel Global and Local Semantic Completion Learning (GLSCL) task to facilitate global-local alignment and local-local alignment simultaneously.
arXiv Detail & Related papers (2023-06-12T13:20:29Z)
- MA2CL: Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning [128.19212716007794]
We propose an effective framework called Multi-Agent Masked Attentive Contrastive Learning (MA2CL).
MA2CL encourages learning representation to be both temporal and agent-level predictive by reconstructing the masked agent observation in latent space.
Our method significantly improves the performance and sample efficiency of different MARL algorithms and outperforms other methods in various vision-based and state-based scenarios.
arXiv Detail & Related papers (2023-06-03T05:32:19Z)
- Multi-Level Global Context Cross Consistency Model for Semi-Supervised Ultrasound Image Segmentation with Diffusion Model [0.0]
We propose a framework that uses images generated by a Latent Diffusion Model (LDM) as unlabeled images for semi-supervised learning.
Our approach enables the effective transfer of probability distribution knowledge to the segmentation network, resulting in improved segmentation accuracy.
arXiv Detail & Related papers (2023-05-16T14:08:24Z)
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval [115.28586222748478]
Image-Text Retrieval (ITR) aims at searching for the target instances that are semantically relevant to the given query from the other modality.
Existing approaches typically suffer from two major limitations.
arXiv Detail & Related papers (2023-01-17T12:42:58Z)
- Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning [22.464424641734652]
Cross-modal alignment is essential for vision-language pre-training models.
We propose a novel Semantic Completion Learning task to facilitate global-to-local alignment.
We also present a flexible vision encoder, which enables our model to perform image-text and video-text multimodal tasks simultaneously.
arXiv Detail & Related papers (2022-11-24T06:39:16Z)
- Is Attention Better Than Matrix Decomposition? [58.813382406412195]
We show that self-attention is not better than the matrix decomposition model for encoding long-distance dependencies.
We propose a series of Hamburgers, in which we employ the optimization algorithms for solving MDs to factorize the input representations into sub-matrices and reconstruct a low-rank embedding.
Comprehensive experiments are conducted in the vision tasks where it is crucial to learn the global context.
arXiv Detail & Related papers (2021-09-09T20:40:19Z)
- Momentum Contrastive Voxel-wise Representation Learning for Semi-supervised Volumetric Medical Image Segmentation [2.3322477552758234]
We present a novel Contrastive Voxel-wise Representation (CVRL) method with geometric constraints to learn global-local visual representations for medical image segmentation.
Our framework can effectively learn global and local features by capturing 3D spatial context and rich anatomical information.
arXiv Detail & Related papers (2021-05-14T20:27:23Z)
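Several of the entries above (CVRL, Local Contrastive MAE, MA2CL) rest on a contrastive objective between a global representation and local views. A common formulation is the InfoNCE loss, sketched below with numpy; the function name, temperature, and toy embeddings are illustrative assumptions, not taken from any of the listed papers.

```python
import numpy as np

def info_nce(global_emb, local_embs, pos_idx, temperature=0.1):
    """InfoNCE loss: pull the positive local embedding toward the global
    embedding while pushing the remaining (negative) locals away."""
    g = global_emb / np.linalg.norm(global_emb)
    L = local_embs / np.linalg.norm(local_embs, axis=1, keepdims=True)
    logits = L @ g / temperature          # cosine similarity to the global view
    log_probs = logits - np.log(np.exp(logits).sum())
    return -float(log_probs[pos_idx])     # cross-entropy against the positive

rng = np.random.default_rng(1)
g = rng.standard_normal(16)               # global-view embedding
locals_ = rng.standard_normal((8, 16))    # candidate local-view embeddings
locals_[3] = g + 0.05 * rng.standard_normal(16)  # positive: crop of same volume
loss = info_nce(g, locals_, pos_idx=3)
```

With the temperature lowered, the loss concentrates on the hardest negatives; the consistency losses in GL-MAE can be read as a non-contrastive (negative-free) relative of this objective.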
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.