Related papers: Focus on Texture: Rethinking Pre-training in Masked Autoencoders for Medical Image Classification

URL: http://arxiv.org/abs/2507.10869v1
Date: Tue, 15 Jul 2025 00:12:26 GMT
Title: Focus on Texture: Rethinking Pre-training in Masked Autoencoders for Medical Image Classification
Authors: Chetan Madan, Aarjav Satia, Soumen Basu, Pankaj Gupta, Usha Dutta, Chetan Arora
Abstract summary: Masked Autoencoders (MAEs) have emerged as a dominant strategy for self-supervised representation learning in natural images. We propose a novel MAE-based pre-training framework, GLCM-MAE, using reconstruction loss based on matching GLCM. GLCM-MAE outperforms the current state-of-the-art across four tasks.
Score: 6.641920678512381
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Masked Autoencoders (MAEs) have emerged as a dominant strategy for self-supervised representation learning in natural images, where models are pre-trained to reconstruct masked patches with a pixel-wise mean squared error (MSE) between original and reconstructed RGB values as the loss. We observe that MSE encourages blurred image reconstruction, but still works for natural images as it preserves dominant edges. However, in medical imaging, where texture cues are more important for classifying a visual abnormality, this strategy fails. Taking inspiration from the Gray Level Co-occurrence Matrix (GLCM) features used in radiomics studies, we propose a novel MAE-based pre-training framework, GLCM-MAE, using a reconstruction loss based on matching GLCMs. The GLCM captures intensity and spatial relationships in an image, hence the proposed loss helps preserve morphological features. Further, we propose a novel formulation to convert the matching of GLCMs into a differentiable loss function. We demonstrate that unsupervised pre-training on medical images with the proposed GLCM loss improves representations for downstream tasks. GLCM-MAE outperforms the current state-of-the-art across four tasks - gallbladder cancer detection from ultrasound images by 2.1%, breast cancer detection from ultrasound by 3.1%, pneumonia detection from x-rays by 0.5%, and COVID detection from CT by 0.6%. Source code and pre-trained models are available at: https://github.com/ChetanMadan/GLCM-MAE.
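
Illustration (not from the paper): to make the GLCM idea in the abstract concrete, the sketch below computes a plain, hard-count GLCM with NumPy and compares two GLCMs with an L1 distance. The function names `glcm` and `glcm_l1_distance`, the choice of 4 gray levels, the single horizontal offset, and the L1 comparison are all assumptions made for this example. Hard counting is not differentiable, so this is not the loss the authors propose; their differentiable formulation is in the linked repository.

```python
import numpy as np

def glcm(image, levels=4, offset=(0, 1), normalize=True):
    """Hard-count Gray Level Co-occurrence Matrix for one pixel offset (dy, dx).

    Entry [i, j] counts how often a pixel with gray level i has a neighbour
    with gray level j at the given offset. Illustrative helper, not the
    paper's differentiable formulation.
    """
    img = np.asarray(image, dtype=np.intp)
    if img.min() < 0 or img.max() >= levels:
        raise ValueError("pixel values must lie in [0, levels)")
    dy, dx = offset
    h, w = img.shape
    # Reference pixels and their neighbours, restricted to positions where
    # both fall inside the image.
    ref = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    nbr = img[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    counts = np.zeros((levels, levels), dtype=np.float64)
    # np.add.at accumulates correctly when the same (i, j) pair repeats.
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)
    if normalize:
        counts /= counts.sum()
    return counts

def glcm_l1_distance(a, b, levels=4, offset=(0, 1)):
    """Sum of absolute differences between two normalized GLCMs (assumed metric)."""
    return float(np.abs(glcm(a, levels, offset) - glcm(b, levels, offset)).sum())

# Small 4-level test image with a horizontal neighbour offset.
example = np.array([[0, 0, 1, 1],
                    [0, 0, 1, 1],
                    [0, 2, 2, 2],
                    [2, 2, 3, 3]])
counts = glcm(example, levels=4, normalize=False)

# A checkerboard texture versus a flat image at its mean gray level.
textured = np.array([[0, 2, 0, 2],
                     [2, 0, 2, 0],
                     [0, 2, 0, 2],
                     [2, 0, 2, 0]])
flat = np.ones((4, 4), dtype=int)
texture_gap = glcm_l1_distance(textured, flat)
```

In this toy case the flat image is the constant prediction that minimizes pixel-wise MSE against the checkerboard, yet its GLCM shares no mass with the checkerboard's, so `texture_gap` takes its maximum value of 2.0. This is one way to see why a GLCM-matching term can penalize texture loss that a pixel-wise error tolerates.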

Related papers

Generative AI: A Pix2pix-GAN-Based Machine Learning Approach for Robust and Efficient Lung Segmentation [0.7614628596146602]
This study develops a deep learning framework using a Pix2pix Generative Adversarial Network (GAN) to segment pulmonary abnormalities from CXR images. The framework's image preprocessing and augmentation techniques were properly incorporated with a U-Net-inspired generator-discriminator architecture.
arXiv Detail & Related papers (2024-12-14T13:12:09Z)
Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection. Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels. Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images. We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations. The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z)
Cross-modulated Few-shot Image Generation for Colorectal Tissue Classification [58.147396879490124]
Our few-shot generation method, named XM-GAN, takes one base and a pair of reference tissue images as input and generates high-quality yet diverse images. To the best of our knowledge, we are the first to investigate few-shot generation in colorectal tissue images.
arXiv Detail & Related papers (2023-04-04T17:50:30Z)
One Sample Diffusion Model in Projection Domain for Low-Dose CT Imaging [10.797632196651731]
Low-dose computed tomography (CT) plays a significant role in reducing the radiation risk in clinical applications. The rapid development and wide application of deep learning have opened new directions for low-dose CT imaging algorithms. We propose a fully unsupervised one sample diffusion model (OSDM) in the projection domain for low-dose CT reconstruction. The results show that OSDM is a practical and effective model for reducing artifacts and preserving image quality.
arXiv Detail & Related papers (2022-12-07T13:39:23Z)
Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks. In the pre-training stage, the proposed auto-encoder pays more attention to reconstructing the informative patches, as identified by gradient metrics. Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z)
Negligible effect of brain MRI data preprocessing for tumor segmentation [36.89606202543839]
We conduct experiments on three publicly available datasets and evaluate the effect of different preprocessing steps in deep neural networks. Our results demonstrate that most popular standardization steps add no value to the network performance. We suggest that image intensity normalization approaches do not contribute to model accuracy because of the reduction of signal variance with image standardization.
arXiv Detail & Related papers (2022-04-11T17:29:36Z)
Intelligent Masking: Deep Q-Learning for Context Encoding in Medical Image Analysis [48.02011627390706]
We develop a novel self-supervised approach that occludes targeted regions to improve the pre-training procedure. We show that training the agent against the prediction model can significantly improve the semantic features extracted for downstream classification tasks.
arXiv Detail & Related papers (2022-03-25T19:05:06Z)
Zero-Shot Domain Adaptation in CT Segmentation by Filtered Back Projection Augmentation [0.1197985185770095]
Domain shift is one of the most salient challenges in medical computer vision. We address variability in computed tomography (CT) images caused by different convolution kernels used in the reconstruction process. We propose Filtered Back-Projection Augmentation (FBPAug), a simple and surprisingly efficient approach to augment CT images in sinogram space emulating reconstruction with different kernels.
arXiv Detail & Related papers (2021-07-18T21:46:49Z)
Improved Slice-wise Tumour Detection in Brain MRIs by Computing Dissimilarities between Latent Representations [68.8204255655161]
Anomaly detection for Magnetic Resonance Images (MRIs) can be solved with unsupervised methods. We have proposed a slice-wise semi-supervised method for tumour detection based on the computation of a dissimilarity function in the latent space of a Variational AutoEncoder. We show that by training the models on higher resolution images and by improving the quality of the reconstructions, we obtain results which are comparable with different baselines.
arXiv Detail & Related papers (2020-07-24T14:02:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.