Multi-Modal Multi-Instance Learning for Retinal Disease Recognition
- URL: http://arxiv.org/abs/2109.12307v1
- Date: Sat, 25 Sep 2021 08:16:47 GMT
- Title: Multi-Modal Multi-Instance Learning for Retinal Disease Recognition
- Authors: Xirong Li and Yang Zhou and Jie Wang and Hailan Lin and Jianchun Zhao
and Dayong Ding and Weihong Yu and Youxin Chen
- Abstract summary: We aim to build a deep neural network that recognizes multiple vision-threatening diseases for the given case.
As both data acquisition and manual labeling are extremely expensive in the medical domain, the network has to be relatively lightweight.
- Score: 10.294738095942812
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper attacks an emerging challenge of multi-modal retinal disease
recognition. Given a multi-modal case consisting of a color fundus photo (CFP)
and an array of OCT B-scan images acquired during an eye examination, we aim to
build a deep neural network that recognizes multiple vision-threatening
diseases for the given case. As the diagnostic efficacy of CFP and OCT is
disease-dependent, the network's ability to be both selective and
interpretable is important. Moreover, as both data acquisition and manual
labeling are extremely expensive in the medical domain, the network has to be
relatively lightweight for learning from a limited set of labeled multi-modal
samples. Prior art on retinal disease recognition focuses either on a single
disease or on a single modality, leaving multi-modal fusion largely
underexplored. We propose in this paper Multi-Modal Multi-Instance Learning
(MM-MIL) for selectively fusing CFP and OCT modalities. Its lightweight
architecture (as compared to current multi-head attention modules) makes it
suited for learning from relatively small-sized datasets. For an effective use
of MM-MIL, we propose to generate a pseudo sequence of CFPs by oversampling a
given CFP. The benefits of this tactic include balancing the number of instances
across modalities, increasing the effective resolution of the CFP input, and
identifying the regions of the CFP most relevant to the final diagnosis. Extensive
experiments on a real-world dataset consisting of 1,206 multi-modal cases from
1,193 eyes of 836 subjects demonstrate the viability of the proposed model.
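To make the two ideas in the abstract concrete (oversampling one CFP into a pseudo sequence of crops, then selectively fusing CFP and OCT instances with a lightweight attention pool), here is a minimal PyTorch sketch. The crop size and count, feature dimension, and gated-attention form are illustrative assumptions, not the paper's exact MM-MIL architecture.

```python
import torch
import torch.nn as nn


def make_cfp_pseudo_sequence(cfp: torch.Tensor, crop: int = 224, n_crops: int = 8) -> torch.Tensor:
    """Oversample one high-resolution CFP into a pseudo sequence of random crops.

    cfp: (3, H, W) tensor with H, W >= crop; returns (n_crops, 3, crop, crop).
    """
    _, h, w = cfp.shape
    crops = []
    for _ in range(n_crops):
        top = torch.randint(0, h - crop + 1, (1,)).item()
        left = torch.randint(0, w - crop + 1, (1,)).item()
        crops.append(cfp[:, top:top + crop, left:left + crop])
    return torch.stack(crops)


class MILAttentionFusion(nn.Module):
    """Gated-attention pooling over CFP and OCT instance embeddings (one bag per case)."""

    def __init__(self, feat_dim: int = 512, hidden: int = 128, n_classes: int = 2):
        super().__init__()
        self.attn_v = nn.Linear(feat_dim, hidden)
        self.attn_u = nn.Linear(feat_dim, hidden)
        self.attn_w = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, cfp_feats: torch.Tensor, oct_feats: torch.Tensor):
        bag = torch.cat([cfp_feats, oct_feats], dim=0)             # (N, feat_dim)
        gate = torch.tanh(self.attn_v(bag)) * torch.sigmoid(self.attn_u(bag))
        weights = torch.softmax(self.attn_w(gate), dim=0)          # (N, 1), sums to 1
        case_feat = (weights * bag).sum(dim=0)                     # weighted case-level embedding
        return self.classifier(case_feat), weights                 # logits + per-instance weights


# Per-instance features would come from a shared CNN backbone (assumed, not shown here).
cfp = torch.rand(3, 1024, 1024)
cfp_crops = make_cfp_pseudo_sequence(cfp)           # (8, 3, 224, 224)
cfp_feats = torch.randn(cfp_crops.shape[0], 512)    # stand-in for backbone outputs on crops
oct_feats = torch.randn(12, 512)                    # stand-in embeddings for 12 OCT B-scans
logits, attn = MILAttentionFusion()(cfp_feats, oct_feats)
```

Inspecting the per-instance attention weights is what would make such a fusion selective and interpretable: they indicate which CFP regions and which OCT B-scans drove the case-level prediction.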
Related papers
- Cross-Fundus Transformer for Multi-modal Diabetic Retinopathy Grading with Cataract [17.77175890577782]
Diabetic retinopathy (DR) is a leading cause of blindness worldwide and a common complication of diabetes.
This study explores a novel multi-modal deep learning framework that fuses information from color fundus photography (CFP) and infrared fundus photography (IFP) for more accurate DR grading.
arXiv Detail & Related papers (2024-11-01T16:38:49Z)
- ETSCL: An Evidence Theory-Based Supervised Contrastive Learning Framework for Multi-modal Glaucoma Grading [7.188153974946432]
Glaucoma is one of the leading causes of vision impairment.
It remains challenging to extract reliable features due to the high similarity of medical images and the unbalanced multi-modal data distribution.
We propose a novel framework, namely ETSCL, which consists of a contrastive feature extraction stage and a decision-level fusion stage.
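The ETSCL summary mentions an evidence-theory-based, decision-level fusion stage without describing it; the sketch below shows a generic Dempster's rule of combination for per-modality belief masses over glaucoma grades. The two-modality setup, the grade labels, and the mass values are illustrative assumptions, not ETSCL's actual implementation.

```python
from itertools import product


def dempster_combine(m1: dict, m2: dict) -> dict:
    """Combine two mass functions (frozenset -> mass) with Dempster's rule."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb                      # mass assigned to contradictory pairs
    return {k: v / (1.0 - conflict) for k, v in combined.items()}  # normalise by 1 - K


# Illustrative masses from two modalities; the residual mass on the full frame
# encodes each modality's own uncertainty.
theta = frozenset({"normal", "early", "advanced"})
m_fundus = {frozenset({"early"}): 0.6, frozenset({"advanced"}): 0.1, theta: 0.3}
m_oct = {frozenset({"early"}): 0.5, frozenset({"normal"}): 0.2, theta: 0.3}
fused = dempster_combine(m_fundus, m_oct)
print(fused)  # mass concentrates on "early", reflecting agreement between modalities
```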
arXiv Detail & Related papers (2024-07-19T11:57:56Z)
- C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network [67.97926983664676]
We propose a cross-modal consistent multi-view medical report generation framework with a domain transfer network (C^2M-DoT).
C2M-DoT substantially outperforms state-of-the-art baselines in all metrics.
arXiv Detail & Related papers (2023-10-09T02:31:36Z)
- Edge-aware Multi-task Network for Integrating Quantification, Segmentation and Uncertainty Prediction of Liver Tumor on Multi-modality Non-contrast MRI [21.57865822575582]
This paper proposes a unified framework, namely edge-aware multi-task network (EaMtNet) to associate multi-index quantification, segmentation, and uncertainty of liver tumors.
The proposed model outperforms the state of the art by a large margin, achieving a Dice similarity coefficient of 90.01±1.23 and a mean absolute error of 2.72±0.58 mm for MD.
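The summary names joint quantification, segmentation, and uncertainty estimation but not how the objectives are combined; the snippet below shows one common, generic way to couple a segmentation loss and a quantification regression loss with learned homoscedastic-uncertainty weights. It is an illustrative stand-in, not EaMtNet's actual loss or its uncertainty model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class UncertaintyWeightedLoss(nn.Module):
    """Weight a segmentation loss and a quantification loss by learned log-variances."""

    def __init__(self):
        super().__init__()
        self.log_var_seg = nn.Parameter(torch.zeros(1))
        self.log_var_quant = nn.Parameter(torch.zeros(1))

    def forward(self, seg_logits, seg_mask, quant_pred, quant_target):
        seg_loss = F.binary_cross_entropy_with_logits(seg_logits, seg_mask)
        quant_loss = F.l1_loss(quant_pred, quant_target)   # e.g. MAE on indices such as MD
        return (torch.exp(-self.log_var_seg) * seg_loss + self.log_var_seg
                + torch.exp(-self.log_var_quant) * quant_loss + self.log_var_quant).squeeze()


# Toy usage with random tensors standing in for network outputs and labels.
crit = UncertaintyWeightedLoss()
loss = crit(torch.randn(2, 1, 64, 64), torch.randint(0, 2, (2, 1, 64, 64)).float(),
            torch.randn(2, 3), torch.randn(2, 3))
loss.backward()
```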
arXiv Detail & Related papers (2023-07-04T16:08:18Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Affinity Feature Strengthening for Accurate, Complete and Robust Vessel Segmentation [48.638327652506284]
Vessel segmentation is crucial in many medical image applications, such as detecting coronary stenoses, retinal vessel diseases and brain aneurysms.
We present a novel approach, the affinity feature strengthening network (AFN), which jointly models geometry and refines pixel-wise segmentation features using a contrast-insensitive, multiscale affinity approach.
arXiv Detail & Related papers (2022-11-12T05:39:17Z)
- Multi-objective optimization determines when, which and how to fuse deep networks: an application to predict COVID-19 outcomes [1.8351254916713304]
We present a novel approach to optimize the setup of a multimodal end-to-end model.
We test our method on the AIforCOVID dataset, attaining state-of-the-art results.
arXiv Detail & Related papers (2022-04-07T23:07:33Z)
- Cross-Modality Deep Feature Learning for Brain Tumor Segmentation [158.8192041981564]
This paper proposes a novel cross-modality deep feature learning framework to segment brain tumors from the multi-modality MRI data.
The core idea is to mine rich patterns across the multi-modality data to make up for the insufficient data scale.
Comprehensive experiments are conducted on the BraTS benchmarks, which show that the proposed cross-modality deep feature learning framework can effectively improve the brain tumor segmentation performance.
arXiv Detail & Related papers (2022-01-07T07:46:01Z)
- Modality Completion via Gaussian Process Prior Variational Autoencoders for Multi-Modal Glioma Segmentation [75.58395328700821]
We propose a novel model, Multi-modal Gaussian Process Prior Variational Autoencoder (MGP-VAE), to impute one or more missing sub-modalities for a patient scan.
MGP-VAE leverages a Gaussian Process (GP) prior on the Variational Autoencoder (VAE) to exploit correlations across subjects/patients and sub-modalities.
We show the applicability of MGP-VAE on brain tumor segmentation, where one, two, or three of the four sub-modalities may be missing.
arXiv Detail & Related papers (2021-07-07T19:06:34Z)
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
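The summary mentions K-nearest neighbor smoothing of a single model's predictions; below is a minimal, generic sketch of that idea, in which a test sample's predicted disease scores are blended with the mean scores of its K most similar reference samples in feature space. The feature source, K, the cosine-similarity metric, and the blending weight are assumptions, not the paper's exact KNNS procedure.

```python
import numpy as np


def knn_smooth_scores(test_feats, test_scores, ref_feats, ref_scores, k=5, alpha=0.5):
    """Blend each test sample's scores with the mean scores of its k nearest references."""
    # L2-normalise so that dot products are cosine similarities.
    t = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    r = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    sims = t @ r.T                                    # (n_test, n_ref)
    nn_idx = np.argsort(-sims, axis=1)[:, :k]         # indices of the k most similar references
    neighbor_mean = ref_scores[nn_idx].mean(axis=1)   # (n_test, n_classes)
    return alpha * test_scores + (1 - alpha) * neighbor_mean


# Toy usage with random stand-ins for CNN features and per-disease sigmoid outputs.
rng = np.random.default_rng(0)
smoothed = knn_smooth_scores(rng.random((4, 64)), rng.random((4, 14)),
                             rng.random((100, 64)), rng.random((100, 14)))
```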
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
- Max-Fusion U-Net for Multi-Modal Pathology Segmentation with Attention and Dynamic Resampling [13.542898009730804]
The performance of relevant algorithms is significantly affected by the proper fusion of the multi-modal information.
We present the Max-Fusion U-Net that achieves improved pathology segmentation performance.
We evaluate our method on the Myocardial Pathology Segmentation (MyoPS) challenge, which provides a multi-sequence CMR dataset.
arXiv Detail & Related papers (2020-09-05T17:24:23Z)