FreMIM: Fourier Transform Meets Masked Image Modeling for Medical Image
Segmentation
- URL: http://arxiv.org/abs/2304.10864v3
- Date: Thu, 30 Nov 2023 07:07:48 GMT
- Title: FreMIM: Fourier Transform Meets Masked Image Modeling for Medical Image
Segmentation
- Authors: Wenxuan Wang, Jing Wang, Chen Chen, Jianbo Jiao, Yuanxiu Cai, Shanshan
Song, Jiangyun Li
- Abstract summary: We present a new MIM-based framework named FreMIM for self-supervised pre-training to better accomplish medical image segmentation tasks.
Our FreMIM consistently brings considerable improvements to model performance.
- Score: 37.465246717967595
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The research community has witnessed the powerful potential of
self-supervised Masked Image Modeling (MIM), which enables models to learn
visual representations from unlabeled data. In this paper, to
incorporate both the crucial global structural information and local details
for dense prediction tasks, we alter the perspective to the frequency domain
and present a new MIM-based framework named FreMIM for self-supervised
pre-training to better accomplish medical image segmentation tasks. Based on
the observations that the detailed structural information mainly lies in the
high-frequency components and the high-level semantics are abundant in the
low-frequency counterparts, we further incorporate multi-stage supervision to
guide the representation learning during the pre-training phase. Extensive
experiments on three benchmark datasets show the superior advantage of our
FreMIM over previous state-of-the-art MIM methods. Compared with various
baselines trained from scratch, our FreMIM consistently brings considerable
improvements to model performance. The code will be publicly
available at https://github.com/Rubics-Xuan/FreMIM.
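The frequency-domain observation the abstract builds on (global structure in the low frequencies, fine detail in the high frequencies) can be sketched with a plain FFT band split. The circular mask and the cutoff `radius` below are illustrative choices, not values from the paper:

```python
import numpy as np

def frequency_split(image: np.ndarray, radius: int = 8):
    """Split a 2-D image into low- and high-frequency components via the FFT.

    The low band keeps components within `radius` of the spectrum centre
    (global structure); the high band keeps the rest (fine detail).
    """
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # centre the zero frequency

    # Circular low-pass mask around the spectrum centre.
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    low_mask = dist <= radius

    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high

img = np.random.rand(64, 64)
low, high = frequency_split(img)
# The two bands sum back to the original image up to float error.
assert np.allclose(low + high, img, atol=1e-8)
```

Because the two masks partition the spectrum, the bands are an exact decomposition; a pre-training objective can then supervise each band separately, in the spirit of FreMIM's multi-stage supervision.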
Related papers
- PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation [51.509573838103854]
We propose a semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation.
Our PMT generates high-fidelity pseudo labels by learning robust and diverse features in the training process.
Experimental results on two datasets with different modalities, i.e., CT and MRI, demonstrate that our method outperforms the state-of-the-art medical image segmentation approaches.
arXiv Detail & Related papers (2024-09-08T15:02:25Z)
- Masked Image Modeling: A Survey [73.21154550957898]
Masked image modeling emerged as a powerful self-supervised learning technique in computer vision.
We construct a taxonomy and review the most prominent papers in recent years.
We aggregate the performance results of various masked image modeling methods on the most popular datasets.
arXiv Detail & Related papers (2024-08-13T07:27:02Z)
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training [103.72844619581811]
We build performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices.
We demonstrate that for large-scale multimodal pre-training, a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art few-shot results.
arXiv Detail & Related papers (2024-03-14T17:51:32Z)
- MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations [16.885965702357314]
MIM-Refiner is a contrastive learning boost for pre-trained MIM models.
We refine the features of MIM models from subpar to state-of-the-art off-the-shelf features.
arXiv Detail & Related papers (2024-02-15T16:46:16Z)
- MIMIC: Mask Image Pre-training with Mix Contrastive Fine-tuning for Facial Expression Recognition [11.820043444385432]
We introduce a novel FER training paradigm named Mask Image pre-training with MIx Contrastive fine-tuning (MIMIC).
In the initial phase, we pre-train the ViT via masked image reconstruction on general images.
In the fine-tuning stage, we introduce a mix-supervised contrastive learning process, which enhances the model with a more extensive range of positive samples.
arXiv Detail & Related papers (2024-01-14T10:30:32Z)
- PROMPT-IML: Image Manipulation Localization with Pre-trained Foundation Models Through Prompt Tuning [35.39822183728463]
We present a novel Prompt-IML framework for detecting tampered images.
Humans tend to discern the authenticity of an image based on semantic and high-frequency information.
Our model can achieve better performance on eight typical fake image datasets.
arXiv Detail & Related papers (2024-01-01T03:45:07Z)
- Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning [53.00683059396803]
Masked image modeling (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z)
- MIS-FM: 3D Medical Image Segmentation using Foundation Models Pretrained on a Large-Scale Unannotated Dataset [14.823114726604853]
We propose a novel self-supervised learning strategy named Volume Fusion (VF) for pretraining 3D segmentation models.
VF forces the model to predict the fusion coefficient of each voxel, which is formulated as a self-supervised segmentation task without manual annotations.
Experiments with different downstream segmentation targets, including head and neck organs and thoracic/abdominal organs, showed that our pretrained model largely outperformed training from scratch.
arXiv Detail & Related papers (2023-06-29T13:22:13Z)
- HybridMIM: A Hybrid Masked Image Modeling Framework for 3D Medical Image Segmentation [29.15746532186427]
HybridMIM is a novel hybrid self-supervised learning method based on masked image modeling for 3D medical image segmentation.
We learn the semantic information of medical images at three levels, including: 1) partial region prediction to reconstruct key contents of the 3D image, which largely reduces the pre-training time burden.
The proposed framework is versatile, supporting both CNN and transformer encoder backbones, and also enables pre-training decoders for image segmentation.
arXiv Detail & Related papers (2023-03-18T04:43:12Z)
- Masked Frequency Modeling for Self-Supervised Visual Pre-Training [102.89756957704138]
We present Masked Frequency Modeling (MFM), a unified frequency-domain-based approach for self-supervised pre-training of visual models.
MFM first masks out a portion of frequency components of the input image and then predicts the missing frequencies on the frequency spectrum.
For the first time, MFM demonstrates that, for both ViT and CNN, a simple non-Siamese framework can learn meaningful representations even using none of the following: (i) extra data, (ii) extra model, (iii) mask token.
arXiv Detail & Related papers (2022-06-15T17:58:30Z)
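The pre-training pair MFM describes (mask part of the spectrum, predict the missing frequencies) can be sketched as follows; the uniform-random mask and the `mask_ratio` value are illustrative stand-ins for the paper's actual low-/high-pass masking strategy:

```python
import numpy as np

rng = np.random.default_rng(0)

def mfm_pair(image: np.ndarray, mask_ratio: float = 0.5):
    """Build one (input, target) pair in the spirit of MFM.

    A random subset of frequency components is zeroed out; the corrupted
    image is the model input, and the full spectrum is the regression
    target for the masked positions.
    """
    spectrum = np.fft.fft2(image)
    mask = rng.random(spectrum.shape) < mask_ratio  # True = masked out
    corrupted = np.fft.ifft2(np.where(mask, 0, spectrum)).real
    return corrupted, spectrum, mask

img = np.random.rand(32, 32)
corrupted, target, mask = mfm_pair(img)
```

A model trained on such pairs would take `corrupted` as input and be penalized on the spectrum values at the masked positions, which is what lets a non-Siamese setup learn representations without extra data, extra models, or mask tokens.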
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.