Leveraging Multimodal Fusion for Enhanced Diagnosis of Multiple Retinal
Diseases in Ultra-wide OCTA
- URL: http://arxiv.org/abs/2311.10331v1
- Date: Fri, 17 Nov 2023 05:23:57 GMT
- Title: Leveraging Multimodal Fusion for Enhanced Diagnosis of Multiple Retinal
Diseases in Ultra-wide OCTA
- Authors: Hao Wei, Peilun Shi, Guitao Bai, Minqing Zhang, Shuangle Li and Wu
Yuan
- Abstract summary: We have curated the pioneering M3OCTA dataset, which is the first multimodal, multi-disease, and widest field-of-view UW-OCTA dataset.
We propose the first cross-modal fusion framework that leverages multi-modal information for diagnosing multiple diseases.
The construction of the M3OCTA dataset aims to advance research in the ophthalmic image analysis community.
- Score: 4.741967726600469
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ultra-wide optical coherence tomography angiography (UW-OCTA) is an emerging
imaging technique that offers significant advantages over traditional OCTA by
providing an exceptionally wide scanning range of up to $24 \times 20~\mathrm{mm}^{2}$,
covering both the anterior and posterior regions of the retina. However, the
currently accessible UW-OCTA datasets suffer from limited comprehensive
hierarchical information and corresponding disease annotations. To address this
limitation, we have curated the pioneering M3OCTA dataset, which is the first
multimodal (i.e., multilayer), multi-disease, and widest field-of-view UW-OCTA
dataset. Furthermore, the effective utilization of multi-layer ultra-wide
ocular vasculature information from UW-OCTA remains underdeveloped. To tackle
this challenge, we propose the first cross-modal fusion framework that
leverages multi-modal information for diagnosing multiple diseases. Through
extensive experiments conducted on our openly available M3OCTA dataset, we
demonstrate the effectiveness and superior performance of our method, both in
fixed and varying modalities settings. The construction of the M3OCTA dataset,
the first multimodal OCTA dataset encompassing multiple diseases, aims to
advance research in the ophthalmic image analysis community.
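The paper's code is not reproduced here, but the described setup (several en-face OCTA layers treated as modalities, fused by cross-modal attention into a multi-disease classifier that still works when some layers are missing) can be sketched as follows. This is a minimal illustration rather than the authors' implementation; the ResNet-18 backbone, embedding size, number of layers/diseases, and the key-padding masking scheme for the "varying modalities" setting are all assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18


class CrossModalOCTAClassifier(nn.Module):
    """Sketch of a cross-modal fusion classifier for multi-layer UW-OCTA.

    Each en-face layer (modality) is encoded separately, tagged with a learned
    modality embedding, and fused by a small transformer encoder. A key-padding
    mask lets the model run with any subset of modalities.
    """

    def __init__(self, num_modalities: int = 4, num_diseases: int = 5, dim: int = 256):
        super().__init__()
        # Shared CNN backbone producing one token per modality (an assumption;
        # separate per-modality encoders would also fit the described framework).
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()  # expose the 512-d pooled features
        self.backbone = backbone
        self.proj = nn.Linear(512, dim)
        self.modality_embed = nn.Parameter(torch.zeros(num_modalities, dim))
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, dim_feedforward=4 * dim, batch_first=True)
        self.fusion = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(dim, num_diseases)  # multi-label logits

    def forward(self, images: torch.Tensor, present: torch.Tensor) -> torch.Tensor:
        """images: (B, M, 3, H, W) stacked en-face layers; present: (B, M) bool mask."""
        b, m = images.shape[:2]
        feats = self.backbone(images.flatten(0, 1))            # (B*M, 512)
        tokens = self.proj(feats).view(b, m, -1)               # (B, M, dim)
        tokens = tokens + self.modality_embed.unsqueeze(0)     # tag each layer
        cls = self.cls_token.expand(b, -1, -1)
        tokens = torch.cat([cls, tokens], dim=1)               # (B, 1+M, dim)
        # The CLS token is always kept; missing modalities are masked out of attention.
        pad_mask = torch.cat(
            [torch.zeros(b, 1, dtype=torch.bool, device=images.device),
             (~present).to(images.device)], dim=1)
        fused = self.fusion(tokens, src_key_padding_mask=pad_mask)
        return self.head(fused[:, 0])                          # (B, num_diseases)


if __name__ == "__main__":
    model = CrossModalOCTAClassifier()
    x = torch.randn(2, 4, 3, 224, 224)                   # two cases, four en-face layers each
    present = torch.tensor([[True, True, True, True],
                            [True, True, False, False]])  # second case is missing two layers
    print(model(x, present).shape)                        # torch.Size([2, 5])
```

Training such a model with a multi-label loss (e.g. `BCEWithLogitsLoss`) and randomly dropping modalities during training is one plausible way to obtain the robustness to fixed and varying modality settings that the abstract reports.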
Related papers
- Multi-View and Multi-Scale Alignment for Contrastive Language-Image
Pre-training in Mammography [4.500815515502233]
Contrastive Language-Image Pre-training shows promise in medical image analysis but requires substantial data and computational resources.
Here, we propose the first adaptation of the full CLIP model to mammography.
arXiv Detail & Related papers (2024-09-26T17:56:59Z) - ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations and reports.
Based on this dataset, we focus on the challenging task of unsupervised pretraining.
We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z) - Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification [2.5091334993691206]
Development of a robust deep-learning model for retinal disease diagnosis requires a substantial dataset for training.
The capacity to generalize effectively on smaller datasets remains a persistent challenge.
We've combined a wide range of data sources to improve performance and generalization to new data.
arXiv Detail & Related papers (2024-09-17T17:22:35Z) - MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine [53.01393667775077]
This paper introduces MedTrinity-25M, a comprehensive, large-scale multimodal dataset for medicine.
It covers over 25 million images across 10 modalities, with multigranular annotations for more than 65 diseases.
Unlike existing approaches, which are limited by the availability of image-text pairs, we have developed the first automated pipeline.
arXiv Detail & Related papers (2024-08-06T02:09:35Z) - SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion
Classification Using 3D Multi-Phase Imaging [59.78761085714715]
This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework for liver lesion classification.
The proposed framework has been validated through comprehensive experiments on two clinical datasets.
To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public.
arXiv Detail & Related papers (2024-02-27T06:32:56Z) - Three-Dimensional Medical Image Fusion with Deformable Cross-Attention [10.26573411162757]
Multimodal medical image fusion plays an instrumental role in several areas of medical image processing.
Traditional fusion methods tend to process each modality independently before combining the features and reconstructing the fusion image.
In this study, we introduce an innovative unsupervised feature mutual learning fusion network designed to rectify these limitations.
arXiv Detail & Related papers (2023-10-10T04:10:56Z) - C^2M-DoT: Cross-modal consistent multi-view medical report generation
with domain transfer network [67.97926983664676]
We propose a cross-modal consistent multi-view medical report generation framework with a domain transfer network (C2M-DoT).
C2M-DoT substantially outperforms state-of-the-art baselines in all metrics.
arXiv Detail & Related papers (2023-10-09T02:31:36Z) - LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical
Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z) - Multimodal Information Fusion for Glaucoma and DR Classification [1.5616442980374279]
Multimodal information is frequently available in medical tasks. By combining information from multiple sources, clinicians are able to make more accurate judgments.
Our paper investigates three multimodal information fusion strategies based on deep learning to solve retinal analysis tasks.
arXiv Detail & Related papers (2022-09-02T12:19:03Z) - Multi-Modal Multi-Instance Learning for Retinal Disease Recognition [10.294738095942812]
We aim to build a deep neural network that recognizes multiple vision-threatening diseases for the given case.
As both data acquisition and manual labeling are extremely expensive in the medical domain, the network has to be relatively lightweight.
arXiv Detail & Related papers (2021-09-25T08:16:47Z) - FetReg: Placental Vessel Segmentation and Registration in Fetoscopy
Challenge Dataset [57.30136148318641]
Fetoscopy laser photocoagulation is a widely used procedure for the treatment of Twin-to-Twin Transfusion Syndrome (TTTS).
The fetoscope's limited field of view may lead to increased procedural time and incomplete ablation, resulting in persistent TTTS.
Computer-assisted intervention may help overcome these challenges by expanding the fetoscopic field of view through video mosaicking and providing better visualization of the vessel network.
We present a large-scale multi-centre dataset for the development of generalized and robust semantic segmentation and video mosaicking algorithms for the fetal environment with a focus on creating drift-free mosaics from long duration fetoscopy videos.
arXiv Detail & Related papers (2021-06-10T17:14:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.