Learning Two-Stream CNN for Multi-Modal Age-related Macular Degeneration Categorization
- URL: http://arxiv.org/abs/2012.01879v1
- Date: Thu, 3 Dec 2020 12:50:36 GMT
- Title: Learning Two-Stream CNN for Multi-Modal Age-related Macular Degeneration Categorization
- Authors: Weisen Wang, Xirong Li, Zhiyan Xu, Weihong Yu, Jianchun Zhao, Dayong Ding, Youxin Chen
- Abstract summary: Age-related Macular Degeneration (AMD) is a common macular disease among people over 50.
Previous research efforts mainly focus on AMD categorization with a single-modal input, be it a color fundus image or an OCT image.
By contrast, we consider AMD categorization given a multi-modal input, a direction that is clinically meaningful yet mostly unexplored.
- Score: 6.023239837661721
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper tackles automated categorization of Age-related Macular
Degeneration (AMD), a common macular disease among people over 50. Previous
research efforts mainly focus on AMD categorization with a single-modal input,
be it a color fundus image or an OCT image. By contrast, we consider AMD
categorization given a multi-modal input, a direction that is clinically
meaningful yet mostly unexplored. Contrary to prior art, which takes a
traditional approach of feature extraction plus classifier training that
cannot be jointly optimized, we opt for end-to-end multi-modal Convolutional Neural
Networks (MM-CNN). Our MM-CNN is instantiated by a two-stream CNN, with
spatially-invariant fusion to combine information from the fundus and OCT
streams. In order to visually interpret the contribution of the individual
modalities to the final prediction, we extend the class activation mapping
(CAM) technique to the multi-modal scenario. For effective training of MM-CNN,
we develop two data augmentation methods. One is GAN-based fundus / OCT image
synthesis, with our novel use of CAMs as conditional input of a high-resolution
image-to-image translation GAN. The other method is Loose Pairing, which pairs
a fundus image and an OCT image on the basis of their classes instead of eye
identities. Experiments on a clinical dataset consisting of 1,099 color fundus
images and 1,290 OCT images acquired from 1,099 distinct eyes verify the
effectiveness of the proposed solution for multi-modal AMD categorization.
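
As a concrete illustration of the two-stream design and the Loose Pairing idea, here is a minimal PyTorch sketch. The ResNet-18 backbones, the concatenation fusion, and the `loose_pairs` helper are assumptions for illustration; the paper's exact backbones and its spatially-invariant fusion operator are not detailed in this summary.

```python
import random

import torch
import torch.nn as nn
from torchvision.models import resnet18


class TwoStreamCNN(nn.Module):
    """Hypothetical two-stream MM-CNN: one CNN per modality, fused features,
    single classifier. Backbone and fusion choices are illustrative."""

    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.fundus_stream = resnet18(weights=None)   # color fundus stream
        self.oct_stream = resnet18(weights=None)      # OCT stream
        feat_dim = self.fundus_stream.fc.in_features  # 512 for ResNet-18
        self.fundus_stream.fc = nn.Identity()  # keep streams as feature extractors
        self.oct_stream.fc = nn.Identity()
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, fundus: torch.Tensor, oct_img: torch.Tensor) -> torch.Tensor:
        f = self.fundus_stream(fundus)    # (B, 512) globally pooled features,
        o = self.oct_stream(oct_img)      # hence spatially invariant
        return self.classifier(torch.cat([f, o], dim=1))


def loose_pairs(fundus_by_class: dict, oct_by_class: dict) -> list:
    """Loose Pairing: match a fundus image with an OCT image of the same
    class, instead of requiring both to come from the same eye."""
    pairs = []
    for label, fundus_imgs in fundus_by_class.items():
        for f in fundus_imgs:
            o = random.choice(oct_by_class[label])  # any same-class OCT image
            pairs.append((f, o, label))
    return pairs
```

In this sketch, a training step would draw (fundus, OCT, label) triplets from `loose_pairs`, which enlarges the effective training set because any same-class fundus/OCT combination, not just images from the same eye, forms a valid pair.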
Related papers
- MM-UNet: A Mixed MLP Architecture for Improved Ophthalmic Image Segmentation [3.2846676620336632]
Ophthalmic image segmentation serves as a critical foundation for ocular disease diagnosis.
Transformer-based models address the limitations of convolutional approaches but introduce substantial computational overhead.
We introduce MM-UNet, an efficient Mixed model tailored for ophthalmic image segmentation.
arXiv Detail & Related papers (2024-08-16T08:34:50Z)
- Generative artificial intelligence in ophthalmology: multimodal retinal images for the diagnosis of Alzheimer's disease with convolutional neural networks [0.0]
This study aims to predict Amyloid Positron Emission Tomography (AmyloidPET) status with multimodal retinal imaging and convolutional neural networks (CNNs).
Denoising Diffusion Probabilistic Models (DDPMs) were trained to generate synthetic images.
Unimodal CNNs were pretrained on synthetic data and finetuned on real data or trained solely on real data.
arXiv Detail & Related papers (2024-06-26T10:49:26Z)
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is to augment the training set with synthetic images generated by text-to-image (T2I) models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
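
The residual-adapter idea above can be sketched as a small bottleneck MLP whose output is added back to the frozen CLIP features; the width, depth, and placement here are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn


class ResidualAdapter(nn.Module):
    """Lightweight bottleneck adapter on top of frozen encoder features.
    Dimensions are assumed for illustration."""

    def __init__(self, dim: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual form: frozen features pass through unchanged while the
        # adapter learns only a small task-specific correction.
        return x + self.up(self.act(self.down(x)))


# One adapter per encoder stage gives the multi-level enhancement described
# above; during training only the adapters receive gradients.
adapters = nn.ModuleList(ResidualAdapter(768) for _ in range(4))
```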
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- Fundus-Enhanced Disease-Aware Distillation Model for Retinal Disease Classification from OCT Images [6.72159216082989]
We propose a fundus-enhanced disease-aware distillation model for retinal disease classification from OCT images.
Our framework enhances the OCT model during training by utilizing unpaired fundus images.
Our proposed approach outperforms single-modal, multi-modal, and state-of-the-art distillation methods for retinal disease classification.
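
Because the fundus images are unpaired with the OCT images, the transfer cannot rely on per-sample matching; one plausible sketch, not necessarily the paper's exact loss, aligns OCT features with class-level fundus prototypes:

```python
import torch
import torch.nn.functional as F


def prototype_distillation_loss(oct_feats: torch.Tensor,
                                labels: torch.Tensor,
                                fundus_prototypes: torch.Tensor) -> torch.Tensor:
    """Assumed distillation scheme: precompute one prototype per class from
    fundus-teacher features, then pull each OCT-student feature toward the
    prototype of its own class (cosine distance)."""
    targets = fundus_prototypes[labels]        # (B, D), prototype per sample
    oct_feats = F.normalize(oct_feats, dim=1)
    targets = F.normalize(targets, dim=1)
    return (1.0 - (oct_feats * targets).sum(dim=1)).mean()
```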
arXiv Detail & Related papers (2023-08-01T05:13:02Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images.
AMIGO uses the cellular graph within the tissue to provide a single representation for a patient.
We show that our model is strongly robust to missing information, achieving the same performance with as little as 20% of the data.
arXiv Detail & Related papers (2023-03-01T23:37:45Z)
- Early Diagnosis of Retinal Blood Vessel Damage via Deep Learning-Powered Collective Intelligence Models [0.3670422696827525]
The power of swarm algorithms is used to search for various combinations of convolutional, pooling, and normalization layers to provide the best model for the task.
The best TDCN model achieves an accuracy of 90.3%, AUC ROC of 0.956, and a Cohen score of 0.967.
arXiv Detail & Related papers (2022-10-17T21:38:38Z)
- Harmonizing Pathological and Normal Pixels for Pseudo-healthy Synthesis [68.5287824124996]
We present a new type of discriminator, the segmentor, to accurately locate the lesions and improve the visual quality of pseudo-healthy images.
We apply the generated images to medical image enhancement and utilize the enhanced results to cope with the low contrast problem.
Comprehensive experiments on the T2 modality of BraTS demonstrate that the proposed method substantially outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2022-03-29T08:41:17Z)
- A Unified Framework for Generalized Low-Shot Medical Image Segmentation with Scarce Data [24.12765716392381]
We propose a unified framework for generalized low-shot (one- and few-shot) medical image segmentation based on distance metric learning (DML).
Via DML, the framework learns a multimodal mixture representation for each category, and performs dense predictions based on cosine distances between the pixels' deep embeddings and the category representations.
In our experiments on brain MRI and abdominal CT datasets, the proposed framework achieves superior performance for low-shot segmentation compared with standard DNN-based (3D U-Net) and classical registration-based (ANTs) methods.
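
The dense cosine-distance prediction step described above can be written compactly; the temperature `scale` is an assumed hyperparameter:

```python
import torch
import torch.nn.functional as F


def dense_cosine_logits(pixel_embeddings: torch.Tensor,
                        category_reps: torch.Tensor,
                        scale: float = 10.0) -> torch.Tensor:
    """pixel_embeddings: (B, D, H, W) per-pixel deep embeddings.
    category_reps: (K, D), one learned representation per category.
    Returns (B, K, H, W) cosine-similarity logits; `scale` is assumed."""
    emb = F.normalize(pixel_embeddings, dim=1)
    reps = F.normalize(category_reps, dim=1)
    # Cosine similarity of every pixel embedding with every category.
    return scale * torch.einsum("bdhw,kd->bkhw", emb, reps)


# A segmentation map follows by taking the best category per pixel:
# pred = dense_cosine_logits(emb, reps).argmax(dim=1)  # (B, H, W)
```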
arXiv Detail & Related papers (2021-10-18T13:01:06Z)
- Hi-Net: Hybrid-fusion Network for Multi-modal MR Image Synthesis [143.55901940771568]
We propose a novel Hybrid-fusion Network (Hi-Net) for multi-modal MR image synthesis.
In our Hi-Net, a modality-specific network is utilized to learn representations for each individual modality.
A multi-modal synthesis network is designed to densely combine the latent representation with hierarchical features from each modality.
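
A minimal sketch of that hybrid-fusion layout, with assumed layer sizes and simple concatenation standing in for Hi-Net's richer hierarchical fusion:

```python
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.ReLU(inplace=True))


class HybridFusionSynthesis(nn.Module):
    """Illustrative hybrid-fusion sketch: one encoder per input modality,
    latents combined to synthesize the missing modality. Layer sizes and
    the concatenation fusion rule are assumptions."""

    def __init__(self, ch: int = 32):
        super().__init__()
        self.enc_a = conv_block(1, ch)   # modality-specific encoder (e.g. T1)
        self.enc_b = conv_block(1, ch)   # modality-specific encoder (e.g. FLAIR)
        self.fuse = conv_block(2 * ch, ch)
        self.head = nn.Conv2d(ch, 1, 1)  # synthesized target modality (e.g. T2)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        za, zb = self.enc_a(a), self.enc_b(b)
        return self.head(self.fuse(torch.cat([za, zb], dim=1)))
```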
arXiv Detail & Related papers (2020-02-11T08:26:42Z)