Related papers: OCTCube: A 3D foundation model for optical coherence tomography that improves cross-dataset, cross-disease, cross-device and cross-modality analysis

OCTCube: A 3D foundation model for optical coherence tomography that improves cross-dataset, cross-disease, cross-device and cross-modality analysis

URL: http://arxiv.org/abs/2408.11227v1
Date: Tue, 20 Aug 2024 22:55:19 GMT
Title: OCTCube: A 3D foundation model for optical coherence tomography that improves cross-dataset, cross-disease, cross-device and cross-modality analysis
Authors: Zixuan Liu, Hanwen Xu, Addie Woicik, Linda G. Shapiro, Marian Blazes, Yue Wu, Cecilia S. Lee, Aaron Y. Lee, Sheng Wang,
Abstract summary: OCTCube is a 3D foundation model pre-trained on 26,605 3D OCT volumes encompassing 1.62 million 2D OCT images. It outperforms 2D models when predicting 8 retinal diseases in both inductive and cross-dataset settings. It also shows superior performance on cross-device prediction and when predicting systemic diseases, such as diabetes and hypertension.
Score: 11.346324975034051
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Optical coherence tomography (OCT) has become critical for diagnosing retinal diseases as it enables 3D images of the retina and optic nerve. OCT acquisition is fast, non-invasive, affordable, and scalable. Due to its broad applicability, massive numbers of OCT images have been accumulated in routine exams, making it possible to train large-scale foundation models that can generalize to various diagnostic tasks using OCT images. Nevertheless, existing foundation models for OCT only consider 2D image slices, overlooking the rich 3D structure. Here, we present OCTCube, a 3D foundation model pre-trained on 26,605 3D OCT volumes encompassing 1.62 million 2D OCT images. OCTCube is developed based on 3D masked autoencoders and exploits FlashAttention to reduce the larger GPU memory usage caused by modeling 3D volumes. OCTCube outperforms 2D models when predicting 8 retinal diseases in both inductive and cross-dataset settings, indicating that utilizing the 3D structure in the model instead of 2D data results in significant improvement. OCTCube further shows superior performance on cross-device prediction and when predicting systemic diseases, such as diabetes and hypertension, further demonstrating its strong generalizability. Finally, we propose a contrastive-self-supervised-learning-based OCT-IR pre-training framework (COIP) for cross-modality analysis on OCT and infrared retinal (IR) images, where the OCT volumes are embedded using OCTCube. We demonstrate that COIP enables accurate alignment between OCT and IR en face images. Collectively, OCTCube, a 3D OCT foundation model, demonstrates significantly better performance against 2D models on 27 out of 29 tasks and comparable performance on the other two tasks, paving the way for AI-based retinal disease diagnosis.

Related papers

TSUBF-Net: Trans-Spatial UNet-like Network with Bi-direction Fusion for Segmentation of Adenoid Hypertrophy in CT [15.491823587869087]
Adenoid hypertrophy is a common cause of obstructive sleep apnea-hypopnea syndrome in children. We introduce TSUBF-Net, a 3D medical image segmentation framework. TSUBF-Net is superior to the state-of-the-art methods with the lowest HD95: 7.03, IoU:85.63, and DSC: 92.26 on our own dataset.
arXiv Detail & Related papers (2024-12-01T12:21:23Z)
Abnormality-Driven Representation Learning for Radiology Imaging [0.8321462983924758]
We introduce lesion-enhanced contrastive learning (LeCL), a novel approach to obtain visual representations driven by abnormalities in 2D axial slices across different locations of the CT scans. We evaluate our approach across three clinical tasks: tumor lesion location, lung disease detection, and patient staging, benchmarking against four state-of-the-art foundation models.
arXiv Detail & Related papers (2024-11-25T13:53:26Z)
Deep Convolutional Neural Networks on Multiclass Classification of Three-Dimensional Brain Images for Parkinson's Disease Stage Prediction [2.931680194227131]
We developed a model capable of accurately predicting Parkinson's disease stages. We used the entire three-dimensional (3D) brain images as input. We incorporated an attention mechanism to account for the varying importance of different slices in the prediction process.
arXiv Detail & Related papers (2024-10-31T05:40:08Z)
3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models [51.855377054763345]
This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model for generating radiology reports from 3D CT scans. Experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality.
arXiv Detail & Related papers (2024-09-28T12:31:07Z)
The Quest for Early Detection of Retinal Disease: 3D CycleGAN-based Translation of Optical Coherence Tomography into Confocal Microscopy [11.321411104729002]
We propose a novel framework based on unsupervised 3D CycleGAN for translating unpaired in vivo OCT to ex vivo confocal microscopy images. This marks the first attempt to exploit the inherent 3D information of OCT and translate it into the rich, detailed color domain of confocal microscopy.
arXiv Detail & Related papers (2024-08-07T21:13:49Z)
CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios [53.94122089629544]
We introduce CT-GLIP (Grounded Language-Image Pretraining with CT scans), a novel method that constructs organ-level image-text pairs to enhance multimodal contrastive learning. Our method, trained on a multimodal CT dataset comprising 44,011 organ-level vision-text pairs from 17,702 patients across 104 organs, demonstrates it can identify organs and abnormalities in a zero-shot manner using natural languages.
arXiv Detail & Related papers (2024-04-23T17:59:01Z)
Simultaneous Alignment and Surface Regression Using Hybrid 2D-3D Networks for 3D Coherent Layer Segmentation of Retinal OCT Images with Full and Sparse Annotations [32.69359482975795]
This work presents a novel framework based on hybrid 2D-3D convolutional neural networks (CNNs) to obtain continuous 3D retinal layer surfaces from OCT volumes. Experiments on a synthetic dataset and three public clinical datasets show that our framework can effectively align the B-scans for potential motion correction.
arXiv Detail & Related papers (2023-12-04T08:32:31Z)
LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
Deep learning network to correct axial and coronal eye motion in 3D OCT retinal imaging [65.47834983591957]
We propose deep learning based neural networks to correct axial and coronal motion artifacts in OCT based on a single scan. The experimental result shows that the proposed method can effectively correct motion artifacts and achieve smaller error than other methods.
arXiv Detail & Related papers (2023-05-27T03:55:19Z)
Simultaneous Alignment and Surface Regression Using Hybrid 2D-3D Networks for 3D Coherent Layer Segmentation of Retina OCT Images [33.99874168018807]
In this study, a novel framework based on hybrid 2D-3D convolutional neural networks (CNNs) is proposed to obtain continuous 3D retinal layer surfaces from OCT. Our framework achieves superior results to state-of-the-art 2D methods in terms of both layer segmentation accuracy and cross-B-scan 3D continuity.
arXiv Detail & Related papers (2022-03-04T15:55:09Z)
Incremental Cross-view Mutual Distillation for Self-supervised Medical CT Synthesis [88.39466012709205]
This paper builds a novel medical slice to increase the between-slice resolution. Considering that the ground-truth intermediate medical slices are always absent in clinical practice, we introduce the incremental cross-view mutual distillation strategy. Our method outperforms state-of-the-art algorithms by clear margins.
arXiv Detail & Related papers (2021-12-20T03:38:37Z)
Multi-Scale Convolutional Neural Network for Automated AMD Classification using Retinal OCT Images [1.299941371793082]
Age-related macular degeneration (AMD) is the most common cause of blindness in developed countries, especially in people over 60 years of age. Recent developments in deep learning have provided a unique opportunity for the development of fully automated diagnosis frameworks. We propose a multi-scale convolutional neural network (CNN) capable of distinguishing pathologies using receptive fields with various sizes.
arXiv Detail & Related papers (2021-10-06T18:20:58Z)
Automated Model Design and Benchmarking of 3D Deep Learning Models for COVID-19 Detection with Chest CT Scans [72.04652116817238]
We propose a differentiable neural architecture search (DNAS) framework to automatically search for the 3D DL models for 3D chest CT scans classification. We also exploit the Class Activation Mapping (CAM) technique on our models to provide the interpretability of the results.
arXiv Detail & Related papers (2021-01-14T03:45:01Z)
Revisiting 3D Context Modeling with Supervised Pre-training for Universal Lesion Detection in CT Slices [48.85784310158493]
We propose a Modified Pseudo-3D Feature Pyramid Network (MP3D FPN) to efficiently extract 3D context enhanced 2D features for universal lesion detection in CT slices. With the novel pre-training method, the proposed MP3D FPN achieves state-of-the-art detection performance on the DeepLesion dataset. The proposed 3D pre-trained weights can potentially be used to boost the performance of other 3D medical image analysis tasks.
arXiv Detail & Related papers (2020-12-16T07:11:16Z)
Modelling the Distribution of 3D Brain MRI using a 2D Slice VAE [66.63629641650572]
We propose a method to model 3D MR brain volumes distribution by combining a 2D slice VAE with a Gaussian model that captures the relationships between slices. We also introduce a novel evaluation method for generated volumes that quantifies how well their segmentations match those of true brain anatomy.
arXiv Detail & Related papers (2020-07-09T13:23:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.