Triad: Vision Foundation Model for 3D Magnetic Resonance Imaging
- URL: http://arxiv.org/abs/2502.14064v2
- Date: Sun, 23 Feb 2025 03:13:01 GMT
- Title: Triad: Vision Foundation Model for 3D Magnetic Resonance Imaging
- Authors: Shansong Wang, Mojtaba Safari, Qiang Li, Chih-Wei Chang, Richard LJ Qiu, Justin Roper, David S. Yu, Xiaofeng Yang
- Abstract summary: We propose Triad, a vision foundation model for 3D MRI. Triad adopts a widely used autoencoder architecture to learn robust representations from 131,170 3D MRI volumes. We evaluate Triad across three tasks, namely, organ/tumor segmentation, organ/cancer classification, and medical image registration.
- Score: 3.7942449131350413
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision foundation models (VFMs) are pre-trained on extensive image datasets to learn general representations for diverse types of data. These models can subsequently be fine-tuned for specific downstream tasks, significantly boosting performance across a broad range of applications. However, existing vision foundation models that claim to be applicable to various clinical tasks are mostly pre-trained on 3D computed tomography (CT), which benefits from the availability of extensive 3D CT databases. Significant differences between CT and magnetic resonance imaging (MRI) in imaging principles, signal characteristics, and data distribution may hinder their practical performance and versatility in MRI-specific applications. Here, we propose Triad, a vision foundation model for 3D MRI. Triad adopts a widely used autoencoder architecture to learn robust representations from 131,170 3D MRI volumes and uses organ-independent imaging descriptions to constrain the semantic distribution of the visual modality. The above pre-training dataset is called Triad-131K, which is currently the largest 3D MRI pre-training dataset. We evaluate Triad across three tasks, namely, organ/tumor segmentation, organ/cancer classification, and medical image registration, under two data settings (within-domain and out-of-domain) using 25 downstream datasets. By initializing models with Triad's pre-trained weights, nnUNet-Triad improves segmentation performance by 2.51% compared to nnUNet-Scratch across 17 datasets. Swin-B-Triad achieves a 3.97% improvement over Swin-B-Scratch in classification tasks across five datasets. SwinUNETR-Triad improves by 4.00% compared to SwinUNETR-Scratch in registration tasks across two datasets. Our study demonstrates that pre-training can improve performance when the data modalities and organs of upstream and downstream tasks are consistent.
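As an illustration of how such pre-trained weights are typically consumed downstream, the sketch below initializes a SwinUNETR-style 3D network from a foundation-model checkpoint before fine-tuning. The checkpoint filename, the MONAI constructor arguments, and the partial-loading logic are assumptions for illustration, not the authors' released interface.

```python
import torch
from monai.networks.nets import SwinUNETR

# Build the downstream segmentation network (constructor arguments vary
# across MONAI versions; these values are illustrative only).
model = SwinUNETR(
    img_size=(96, 96, 96),  # 3D patch size fed to the network
    in_channels=1,          # single-channel MRI volume
    out_channels=2,         # e.g. background + target organ
    feature_size=48,
)

# Load a pre-trained checkpoint ("triad_pretrained.pt" is a hypothetical
# filename) and copy only the tensors that match by name and shape, so any
# task-specific head stays randomly initialized.
state = torch.load("triad_pretrained.pt", map_location="cpu")
model_state = model.state_dict()
matched = {k: v for k, v in state.items()
           if k in model_state and v.shape == model_state[k].shape}
model_state.update(matched)
model.load_state_dict(model_state)
print(f"initialized {len(matched)}/{len(model_state)} tensors from pre-training")
```

Loading only the name- and shape-matched tensors mirrors the common encoder-only transfer setup, in which the randomly initialized decoder or head is then trained on the downstream task.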
Related papers
- 3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models [51.855377054763345]
This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model for generating radiology reports from 3D CT scans.
Experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality.
arXiv Detail & Related papers (2024-09-28T12:31:07Z)
- MinD-3D++: Advancing fMRI-Based 3D Reconstruction with High-Quality Textured Mesh Generation and a Comprehensive Dataset [50.534007259536715]
Reconstructing 3D visuals from functional Magnetic Resonance Imaging (fMRI) data is of significant interest to cognitive neuroscience and computer vision. We present the fMRI-3D dataset, which includes data from 15 participants and showcases a total of 4,768 3D objects. We propose MinD-3D++, a novel framework for decoding textured 3D visual information from fMRI signals.
arXiv Detail & Related papers (2024-09-17T16:13:59Z)
- Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval [0.37478492878307323]
Content-based medical image retrieval (CBMIR) depends on image features, which can be extracted automatically or semi-automatically.
In this study, we evaluated several feature extractors drawn from well-known pre-trained convolutional neural networks (CNNs) and pre-trained foundation models.
Our results show that, overall, for the 2D datasets, foundation models deliver superior performance by a large margin compared to CNNs.
Our findings confirm that while using larger image sizes (especially for 2D datasets) yields slightly better performance, competitive CBMIR performance can still be achieved even with smaller image sizes.
arXiv Detail & Related papers (2024-09-14T13:07:30Z)
- 2D and 3D Deep Learning Models for MRI-based Parkinson's Disease Classification: A Comparative Analysis of Convolutional Kolmogorov-Arnold Networks, Convolutional Neural Networks, and Graph Convolutional Networks [0.0]
This study applies Convolutional Kolmogorov-Arnold Networks (ConvKANs) to Parkinson's Disease diagnosis.
ConvKANs integrate learnable activation functions into convolutional layers and are applied here to PD classification using structural MRI.
The first 3D implementation of ConvKANs for medical imaging is presented, comparing their performance to Convolutional Neural Networks (CNNs) and Graph Convolutional Networks (GCNs).
These findings highlight ConvKANs' potential for PD detection, emphasize the importance of 3D analysis in capturing subtle brain changes, and underscore cross-dataset generalization challenges.
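For intuition about the layer type, a minimal ConvKAN-style block might look like the following sketch, in which the fixed nonlinearity after a 3D convolution is replaced by a per-channel learnable activation built from Gaussian basis functions. This is an illustrative simplification, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class LearnableActivation(nn.Module):
    """Per-channel activation as a learned mix of Gaussian basis functions."""
    def __init__(self, channels: int, n_basis: int = 8, span: float = 3.0):
        super().__init__()
        # Fixed basis centers over [-span, span]; only the per-channel
        # mixing coefficients are learned.
        self.register_buffer("centers", torch.linspace(-span, span, n_basis))
        self.coef = nn.Parameter(torch.randn(channels, n_basis) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, D, H, W) -> evaluate every basis function at each voxel.
        basis = torch.exp(-(x.unsqueeze(-1) - self.centers) ** 2)
        return torch.einsum("ncdhwb,cb->ncdhw", basis, self.coef)

class ConvKAN3d(nn.Module):
    """3D convolution followed by a learnable (KAN-style) activation."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
        self.act = LearnableActivation(out_ch)

    def forward(self, x):
        return self.act(self.conv(x))

x = torch.randn(2, 1, 16, 16, 16)   # toy single-channel 3D volume
y = ConvKAN3d(1, 8)(x)
print(y.shape)  # torch.Size([2, 8, 16, 16, 16])
```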
arXiv Detail & Related papers (2024-07-24T16:04:18Z)
- Diff3Dformer: Leveraging Slice Sequence Diffusion for Enhanced 3D CT Classification with Transformer Networks [5.806035963947936]
We propose a Diffusion-based 3D Vision Transformer (Diff3Dformer) to aggregate repetitive information within 3D CT scans.
Our method exhibits improved performance on two small datasets of 3D lung CT scans at different scales.
arXiv Detail & Related papers (2024-06-24T23:23:18Z)
- Argus: Benchmarking and Enhancing Vision-Language Models for 3D Radiology Report Generation [15.897686345011731]
There has been no comprehensive benchmark for 3D radiology report generation (3DRRG).
We curate CT-3DRRG, the largest publicly available 3D CT-report dataset, establishing a robust and diverse benchmark for evaluating VLM performance on 3DRRG.
We propose a comprehensive training recipe for building high-performing VLMs for 3DRRG, exploring key factors such as vision encoder pretraining strategies, visual token compression, and the impact of data & model scale.
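As a toy illustration of one of those factors, visual token compression, the sketch below pools a long sequence of 3D-patch tokens down to a fixed budget before they reach the language model. The paper's actual compressor may differ; the shapes and function name here are assumptions.

```python
import torch
import torch.nn.functional as F

def compress_tokens(tokens: torch.Tensor, target_len: int) -> torch.Tensor:
    """Reduce (batch, seq_len, dim) visual tokens to (batch, target_len, dim)."""
    # adaptive_avg_pool1d expects (N, C, L), so pool along the sequence axis.
    pooled = F.adaptive_avg_pool1d(tokens.transpose(1, 2), target_len)
    return pooled.transpose(1, 2)

tokens = torch.randn(1, 4096, 768)          # e.g. tokens from a 3D CT encoder
print(compress_tokens(tokens, 256).shape)   # torch.Size([1, 256, 768])
```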
arXiv Detail & Related papers (2024-06-11T10:45:59Z)
- SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion Classification Using 3D Multi-Phase Imaging [59.78761085714715]
This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework for liver lesion classification.
The proposed framework has been validated through comprehensive experiments on two clinical datasets.
To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public.
arXiv Detail & Related papers (2024-02-27T06:32:56Z)
- 3DSAM-adapter: Holistic adaptation of SAM from 2D to 3D for promptable tumor segmentation [52.699139151447945]
We propose a novel adaptation method for transferring the segment anything model (SAM) from 2D to 3D for promptable medical image segmentation.
Our model can outperform domain state-of-the-art medical image segmentation models on 3 out of 4 tasks, improving by 8.25%, 29.87%, and 10.11% for kidney tumor, pancreas tumor, and colon cancer segmentation, respectively, while achieving similar performance for liver tumor segmentation.
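The paper's holistic adaptation scheme is its own contribution; as generic background on moving 2D pre-trained weights into a 3D network, one common trick is kernel inflation, sketched below purely for illustration and not as the paper's method.

```python
import torch
import torch.nn as nn

def inflate_conv2d_to_3d(conv2d: nn.Conv2d, depth: int = 3) -> nn.Conv3d:
    """Replicate a 2D kernel along a new depth axis to seed a 3D layer."""
    conv3d = nn.Conv3d(
        conv2d.in_channels, conv2d.out_channels,
        kernel_size=(depth, *conv2d.kernel_size),
        padding=(depth // 2, *conv2d.padding),
        bias=conv2d.bias is not None,
    )
    with torch.no_grad():
        # Copy the 2D kernel into every depth slice, dividing by depth so the
        # response to a depth-constant input matches the original 2D layer.
        w3d = conv2d.weight.unsqueeze(2).repeat(1, 1, depth, 1, 1) / depth
        conv3d.weight.copy_(w3d)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d

conv2d = nn.Conv2d(3, 16, kernel_size=3, padding=1)
conv3d = inflate_conv2d_to_3d(conv2d)
print(conv3d.weight.shape)  # torch.Size([16, 3, 3, 3, 3])
```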
arXiv Detail & Related papers (2023-06-23T12:09:52Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Segmenting white matter hyperintensities on isotropic three-dimensional Fluid Attenuated Inversion Recovery magnetic resonance images: Assessing deep learning tools on Norwegian imaging database [0.0]
White matter hyperintensities (WMHs) are a hallmark of cerebral small vessel disease and Alzheimer's disease (AD).
The current study details the deployment of deep learning tools to enable automated WMH segmentation and characterization from 3D FLAIR-weighted images.
arXiv Detail & Related papers (2022-07-18T09:36:44Z)
- Advancing 3D Medical Image Analysis with Variable Dimension Transform based Supervised 3D Pre-training [45.90045513731704]
This paper revisits an innovative yet simple fully-supervised 3D network pre-training framework.
With a redesigned 3D network architecture, reformulated natural images are used to address the problem of data scarcity.
Comprehensive experiments on four benchmark datasets demonstrate that the proposed pre-trained models can effectively accelerate convergence.
arXiv Detail & Related papers (2022-01-05T03:11:21Z)
- Automated Model Design and Benchmarking of 3D Deep Learning Models for COVID-19 Detection with Chest CT Scans [72.04652116817238]
We propose a differentiable neural architecture search (DNAS) framework to automatically search for 3D DL models for 3D chest CT scan classification.
We also exploit the Class Activation Mapping (CAM) technique on our models to provide the interpretability of the results.
arXiv Detail & Related papers (2021-01-14T03:45:01Z)
- Fader Networks for domain adaptation on fMRI: ABIDE-II study [68.5481471934606]
We use 3D convolutional autoencoders to build a domain-irrelevant latent-space image representation and demonstrate that this method outperforms existing approaches on ABIDE data.
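A minimal example of this architecture class, a 3D convolutional autoencoder whose latent code serves as the shared representation, is sketched below; the actual Fader-network model adds adversarial constraints on the latent space and is more involved.

```python
import torch
import torch.nn as nn

class AE3D(nn.Module):
    """Tiny 3D conv autoencoder: encoder -> latent code -> decoder."""
    def __init__(self, ch: int = 16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv3d(1, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(ch, 2 * ch, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose3d(2 * ch, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(ch, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        z = self.enc(x)           # latent representation used downstream
        return self.dec(z), z

x = torch.randn(1, 1, 32, 32, 32)
recon, z = AE3D()(x)
print(recon.shape, z.shape)  # (1, 1, 32, 32, 32) and (1, 32, 8, 8, 8)
```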
arXiv Detail & Related papers (2020-10-14T16:50:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.