Improving Representation of High-frequency Components for Medical Visual Foundation Models
- URL: http://arxiv.org/abs/2407.14651v3
- Date: Mon, 03 Mar 2025 09:31:01 GMT
- Title: Improving Representation of High-frequency Components for Medical Visual Foundation Models
- Authors: Yuetan Chu, Yilan Zhang, Zhongyi Han, Changchun Yang, Longxi Zhou, Gongning Luo, Chao Huang, Xin Gao
- Abstract summary: We propose a novel pretraining strategy, named Frequency-advanced Representation Autoencoder (Frepa). Frepa encourages the encoder to effectively represent and preserve high-frequency components in the image embeddings. We develop Frepa across nine medical modalities and validate it on 32 downstream tasks for both 2D images and 3D volume data.
- Score: 16.39492793639237
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Foundation models have recently attracted significant attention for their impressive generalizability across diverse downstream tasks. However, these models have been shown to have significant limitations in representing high-frequency components and fine-grained details. In many medical imaging tasks, the precise representation of such information is crucial due to the inherently intricate anatomical structures, sub-visual features, and complex boundaries involved. Consequently, the limited representational capacity of prevalent foundation models can result in significant performance degradation or even failure in these tasks. To address these challenges, we propose a novel pretraining strategy, named Frequency-advanced Representation Autoencoder (Frepa). Through high-frequency masking and low-frequency perturbation combined with adversarial learning, Frepa encourages the encoder to effectively represent and preserve high-frequency components in the image embeddings. Additionally, we introduce an innovative histogram-equalized image masking strategy, extending the Masked Autoencoder approach beyond ViT to other architectures such as Swin Transformer and convolutional networks. We develop Frepa across nine medical modalities and validate it on 32 downstream tasks for both 2D images and 3D volume data. Without fine-tuning, Frepa can outperform other self-supervised pretraining methods and, in some cases, even surpass task-specific models. The improvement is particularly significant for tasks involving fine-grained details, with gains of up to +15% in DSC (Dice similarity coefficient) for retinal vessel segmentation and +7% in IoU (intersection over union) for lung nodule detection. Further experiments quantitatively show that Frepa better represents and preserves high-frequency components in its embeddings, underscoring its potential for developing more generalized and universal medical image foundation models.
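The abstract names two frequency-domain corruptions, high-frequency masking and low-frequency perturbation, without defining them. The following is a minimal sketch under stated assumptions, not the authors' implementation: it separates bands with a centred circular cutoff in the 2D FFT spectrum (the `radius` parameter is hypothetical), builds the masked view by discarding the high band, and builds the perturbed view by applying random multiplicative jitter (the hypothetical `scale` parameter) to low-band coefficients only. Adversarial training and histogram-equalized masking are not covered.

```python
import numpy as np


def low_pass_mask(shape, radius):
    """True inside a centred disc of the given radius in an fftshift-ed spectrum."""
    h, w = shape
    yy, xx = np.ogrid[:h, :w]
    cy, cx = h // 2, w // 2  # index of the zero frequency after fftshift
    return (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2


def split_frequencies(image, radius):
    """Return (low, high) real images whose sum reconstructs the input."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    mask = low_pass_mask(image.shape, radius)
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~mask)).real
    return low, high


def mask_high_frequencies(image, radius):
    """High-frequency masking (assumed form): keep only the low band."""
    low, _ = split_frequencies(image, radius)
    return low


def perturb_low_frequencies(image, radius, scale, rng):
    """Low-frequency perturbation (assumed form): jitter low-band coefficients only."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    mask = low_pass_mask(image.shape, radius)
    jitter = 1.0 + scale * rng.standard_normal(image.shape)
    spectrum = np.where(mask, spectrum * jitter, spectrum)
    # Taking the real part keeps the output a real image; because the disc is
    # symmetric about the zero frequency, the high band stays unchanged.
    return np.fft.ifft2(np.fft.ifftshift(spectrum)).real


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 64))  # stand-in for a normalized 2D medical slice
    masked_view = mask_high_frequencies(image, radius=8)
    perturbed_view = perturb_low_frequencies(image, radius=8, scale=0.3, rng=rng)
    print(masked_view.shape, perturbed_view.shape)
```

In a reconstruction-style pretraining setup, an encoder that must recover the original image from views like these cannot rely on the input's fine detail being intact, which is the intuition the abstract describes; how Frepa actually combines the views is not specified here.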
Related papers
- Self-Bootstrapping for Versatile Test-Time Adaptation [29.616417768209114]
We develop a versatile test-time adaptation (TTA) objective for a variety of tasks.
We achieve this through a self-bootstrapping scheme that optimizes prediction consistency between the test image (as target) and its deteriorated view.
Experiments show that, either independently or as a plug-and-play module, our method achieves superior results across classification, segmentation, and 3D monocular detection tasks.
Date: 2025-04-10T05:45:07Z
- FE-UNet: Frequency Domain Enhanced U-Net with Segment Anything Capability for Versatile Image Segmentation [50.9040167152168]
We experimentally quantify the contrast sensitivity function of CNNs and compare it with that of the human visual system.
We propose the Wavelet-Guided Spectral Pooling Module (WSPM) to enhance and balance image features across the frequency domain.
To further emulate the human visual system, we introduce the Frequency Domain Enhanced Receptive Field Block (FE-RFB).
We develop FE-UNet, a model that utilizes SAM2 as its backbone and incorporates Hiera-Large as a pre-trained block.
Date: 2025-02-06T07:24:34Z
- Physics-informed DeepCT: Sinogram Wavelet Decomposition Meets Masked Diffusion [9.126628956920904]
Diffusion models show remarkable potential for sparse-view computed tomography (SVCT) reconstruction.
We propose a Sinogram-based Wavelet random decomposition And Random mask diffusion Model (SWARM) for SVCT reconstruction.
Date: 2025-01-17T03:16:15Z
- FCDM: Sparse-view Sinogram Inpainting with Frequency Domain Convolution Enhanced Diffusion Models [14.043383277622874]
We introduce a novel diffusion-based inpainting framework tailored for sinogram data.
FCDM significantly outperforms existing methods, achieving SSIM over 0.95 and PSNR above 30 dB, with improvements of up to 33% in SSIM and 29% in PSNR compared to baselines.
Date: 2024-08-26T12:31:38Z
- Discriminative Hamiltonian Variational Autoencoder for Accurate Tumor Segmentation in Data-Scarce Regimes [2.8498944632323755]
We propose an end-to-end hybrid architecture for medical image segmentation.
We use Hamiltonian Variational Autoencoders (HVAE) and a discriminative regularization to improve the quality of generated images.
Our architecture operates on a slice-by-slice basis to segment 3D volumes, capitalizing on the richly augmented dataset.
Date: 2024-06-17T15:42:08Z
- NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation [55.51412454263856]
This paper proposes to directly modulate the generation process of diffusion models using fMRI signals.
By training with about 67,000 fMRI-image pairs from various individuals, our model enjoys superior fMRI-to-image decoding capacity.
Date: 2024-03-27T02:42:52Z
- Enhancing Weakly Supervised 3D Medical Image Segmentation through Probabilistic-aware Learning [52.249748801637196]
3D medical image segmentation is a challenging task with crucial implications for disease diagnosis and treatment planning.
Recent advances in deep learning have significantly enhanced fully supervised medical image segmentation.
We propose a novel probabilistic-aware weakly supervised learning pipeline, specifically designed for 3D medical imaging.
Date: 2024-03-05T00:46:53Z
- Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
Date: 2023-07-31T17:59:42Z
- Preservation of High Frequency Content for Deep Learning-Based Medical Image Classification [74.84221280249876]
An efficient analysis of large amounts of chest radiographs can aid physicians and radiologists.
We propose a novel Discrete Wavelet Transform (DWT)-based method for the efficient identification and encoding of visual information.
Date: 2022-05-08T15:29:54Z
- Multimodal-Boost: Multimodal Medical Image Super-Resolution using Multi-Attention Network with Wavelet Transform [5.416279158834623]
Loss of image resolution degrades the overall performance of medical image diagnosis.
Deep-learning-based single-image super-resolution (SISR) algorithms have revolutionized the overall diagnosis framework.
This work proposes a generative adversarial network (GAN) with deep multi-attention modules to learn high-frequency information from low-frequency data.
Date: 2021-10-22T10:13:46Z
- FREA-Unet: Frequency-aware U-net for Modality Transfer [9.084926957557842]
We propose a new frequency-aware attention U-net for generating synthetic PET images from MRI data.
Our attention U-net computes attention scores for feature maps in low- and high-frequency layers and uses them to help the model focus on the most important regions.
Date: 2020-12-31T01:58:44Z
- Hierarchical Amortized Training for Memory-efficient High Resolution 3D GAN [52.851990439671475]
We propose a novel end-to-end GAN architecture that can generate high-resolution 3D images.
We achieve this goal by using different configurations between training and inference.
Experiments on 3D thorax CT and brain MRI demonstrate that our approach outperforms state of the art in image generation.
Date: 2020-08-05T02:33:04Z
- Inf-Net: Automatic COVID-19 Lung Infection Segmentation from CT Images [152.34988415258988]
Automated detection of lung infections from computed tomography (CT) images offers great potential to augment the traditional healthcare strategy for tackling COVID-19.
Segmenting infected regions from CT slices faces several challenges, including high variation in infection characteristics and low intensity contrast between infections and normal tissues.
To address these challenges, a novel COVID-19 Deep Lung Infection Network (Inf-Net) is proposed to automatically identify infected regions from chest CT slices.
Date: 2020-04-22T07:30:56Z
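Several of the related papers above (FE-UNet's Wavelet-Guided Spectral Pooling Module, SWARM's sinogram wavelet decomposition, the DWT-based chest-radiograph method, and Multimodal-Boost) use wavelets to separate low- and high-frequency content. As a minimal illustration only, assuming a single-level orthonormal Haar wavelet and images with even height and width, the following splits an image into one approximation subband and three detail subbands and inverts the split; it is not taken from any of these papers, and the function names are hypothetical.

```python
import numpy as np


def haar_dwt2(image):
    """Single-level orthonormal 2D Haar DWT; returns (ll, lh, hl, hh)."""
    x = np.asarray(image, dtype=float)
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("image height and width must be even")
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2  # scaled block average: low-frequency approximation
    lh = (a - b + c - d) / 2  # left minus right: horizontal intensity change
    hl = (a + b - c - d) / 2  # top minus bottom: vertical intensity change
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    out = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    image = rng.random((8, 8))
    ll, lh, hl, hh = haar_dwt2(image)
    print(ll.shape, np.allclose(haar_idwt2(ll, lh, hl, hh), image))
```

Because the transform is orthonormal, total energy is preserved across the four subbands, so the detail subbands give a direct measure of how much high-frequency content an image (or a reconstruction) carries.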
This list is automatically generated from the titles and abstracts of the papers in this site.