MedDChest: A Content-Aware Multimodal Foundational Vision Model for Thoracic Imaging
- URL: http://arxiv.org/abs/2511.04016v1
- Date: Thu, 06 Nov 2025 03:28:56 GMT
- Title: MedDChest: A Content-Aware Multimodal Foundational Vision Model for Thoracic Imaging
- Authors: Mahmoud Soliman, Islam Osman, Mohamed S. Shehata, Rasika Rajapakshe,
- Abstract summary: We propose MedDChest, a new foundational Vision Transformer (ViT) model optimized specifically for thoracic imaging.<n>We pre-trained MedDChest from scratch on a massive, curated, multimodal dataset of over 1.2 million images.<n>We validate our model's effectiveness by fine-tuning it on a diverse set of downstream diagnostic tasks.
- Score: 3.0332210076508326
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The performance of vision models in medical imaging is often hindered by the prevailing paradigm of fine-tuning backbones pre-trained on out-of-domain natural images. To address this fundamental domain gap, we propose MedDChest, a new foundational Vision Transformer (ViT) model optimized specifically for thoracic imaging. We pre-trained MedDChest from scratch on a massive, curated, multimodal dataset of over 1.2 million images, encompassing different modalities including Chest X-ray and Computed Tomography (CT) compiled from 10 public sources. A core technical contribution of our work is Guided Random Resized Crops, a novel content-aware data augmentation strategy that biases sampling towards anatomically relevant regions, overcoming the inefficiency of standard cropping techniques on medical scans. We validate our model's effectiveness by fine-tuning it on a diverse set of downstream diagnostic tasks. Comprehensive experiments empirically demonstrate that MedDChest significantly outperforms strong, publicly available ImageNet-pretrained models. By establishing the superiority of large-scale, in-domain pre-training combined with domain-specific data augmentation, MedDChest provides a powerful and robust feature extractor that serves as a significantly better starting point for a wide array of thoracic diagnostic tasks. The model weights will be made publicly available to foster future research and applications.
Related papers
- MedVAR: Towards Scalable and Efficient Medical Image Generation via Next-scale Autoregressive Prediction [27.388534881299393]
A scalable generative backbone for medical imaging requires architectural efficiency, sufficient multi-organ data, and principled evaluation.<n>We introduce Med VAR, the first autoregressive-based foundation model that adopts the next-scale prediction paradigm to enable fast and scale-up-friendly medical image synthesis.<n>To support hierarchical generation, we curate a harmonized dataset of around 440,000 CT and MRI images spanning six anatomical regions.
arXiv Detail & Related papers (2026-02-16T06:48:48Z) - MM-DINOv2: Adapting Foundation Models for Multi-Modal Medical Image Analysis [19.063517827476826]
We introduce MM-DINOv2, a novel framework that adapts the pre-trained vision foundation model DINOv2 for multi-modal medical imaging.<n>Our approach incorporates multi-modal patch embeddings, enabling vision foundation models to effectively process multi-modal imaging data.<n>Our method achieves a Matthews Correlation Coefficient (MCC) of 0.6 on an external test set, surpassing state-of-the-art supervised approaches by +11.1%.
arXiv Detail & Related papers (2025-09-08T12:34:15Z) - Does DINOv3 Set a New Medical Vision Standard? [67.33543059306938]
This report investigates whether DINOv3 can serve as a powerful unified encoder for medical vision tasks without domain-specific pre-training.<n>We benchmark DINOv3 across common medical vision tasks, including 2D/3D classification and segmentation.<n>Remarkably, it can even outperform medical-specific foundation models like BiomedCLIP and CT-Net on several tasks.
arXiv Detail & Related papers (2025-09-08T09:28:57Z) - CC-DCNet: Dynamic Convolutional Neural Network with Contrastive Constraints for Identifying Lung Cancer Subtypes on Multi-modality Images [13.655407979403945]
We propose a novel deep learning network designed to accurately classify lung cancer subtype with multi-dimensional and multi-modality images.
The strength of the proposed model lies in its ability to dynamically process both paired CT-pathological image sets and independent CT image sets.
We also develop a contrastive constraint module, which quantitatively maps the cross-modality associations through network training.
arXiv Detail & Related papers (2024-07-18T01:42:00Z) - Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
Training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LlaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - DiCoM -- Diverse Concept Modeling towards Enhancing Generalizability in Chest X-Ray Studies [6.83819481805979]
Chest X-Ray (CXR) is a widely used clinical imaging modality.
Self-supervised pre-training has proven to outperform supervised pre-training in numerous downstream vision tasks.
We introduce Diverse Concept Modeling (DiCoM), a novel self-supervised training paradigm.
arXiv Detail & Related papers (2024-02-22T20:51:37Z) - From CNN to Transformer: A Review of Medical Image Segmentation Models [7.3150850275578145]
Deep learning for medical image segmentation has become a prevalent trend.
In this paper, we conduct a survey of the most representative four medical image segmentation models in recent years.
We theoretically analyze the characteristics of these models and quantitatively evaluate their performance on two benchmark datasets.
arXiv Detail & Related papers (2023-08-10T02:48:57Z) - LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical
Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z) - Domain Generalization for Mammographic Image Analysis with Contrastive
Learning [62.25104935889111]
The training of an efficacious deep learning model requires large data with diverse styles and qualities.
A novel contrastive learning is developed to equip the deep learning models with better style generalization capability.
The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets.
arXiv Detail & Related papers (2023-04-20T11:40:21Z) - MedSegDiff-V2: Diffusion based Medical Image Segmentation with
Transformer [53.575573940055335]
We propose a novel Transformer-based Diffusion framework, called MedSegDiff-V2.
We verify its effectiveness on 20 medical image segmentation tasks with different image modalities.
arXiv Detail & Related papers (2023-01-19T03:42:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.