Cross-Field Transformer for Diabetic Retinopathy Grading on Two-field Fundus Images
- URL: http://arxiv.org/abs/2211.14552v1
- Date: Sat, 26 Nov 2022 12:39:57 GMT
- Title: Cross-Field Transformer for Diabetic Retinopathy Grading on Two-field Fundus Images
- Authors: Junlin Hou, Jilan Xu, Fan Xiao, Rui-Wei Zhao, Yuejie Zhang, Haidong
Zou, Lina Lu, Wenwen Xue, Rui Feng
- Abstract summary: We first construct a new benchmark dataset (DRTiD) for DR grading, consisting of 3,100 two-field fundus images.
Then, we propose a novel DR grading approach, namely Cross-Field Transformer (CrossFiT), to capture the correspondence between the two fields.
- Score: 9.211425049275798
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic diabetic retinopathy (DR) grading based on fundus photography has been widely explored to benefit routine screening and early treatment. Existing research generally focuses on single-field fundus images, which offer a limited field of view for precise eye examinations. In clinical practice, ophthalmologists adopt two-field fundus photography as the dominant tool, where the information from each field (i.e., macula-centric and optic disc-centric) is highly correlated and complementary, supporting comprehensive decisions. However, automatic DR grading based on two-field fundus photography remains a challenging task due to the lack of publicly available datasets and effective fusion strategies. In this work, we first construct a new benchmark dataset (DRTiD) for DR grading, consisting of 3,100 two-field fundus images. To the best of our knowledge, it is the largest public DR dataset with diverse and high-quality two-field images. Then, we propose a novel DR grading approach, namely Cross-Field Transformer (CrossFiT), to capture the correspondence between the two fields as well as the long-range spatial correlations within each field. Considering the inherent two-field geometric constraints, we define aligned position embeddings to preserve the relative positional consistency across the two fundus views. In addition, we perform masked cross-field attention during interaction to filter out noisy relations between the fields. Extensive experiments on our DRTiD dataset and the public DeepDRiD dataset demonstrate the effectiveness of our CrossFiT network. The new dataset and the source code of CrossFiT will be publicly available at https://github.com/FDU-VTS/DRTiD.
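The abstract names two concrete mechanisms: aligned position embeddings shared across the two fields, and masked cross-field attention that suppresses noisy field-to-field relations. Below is a minimal PyTorch sketch of both ideas; the tensor shapes, function names, and the placeholder geometric mask are our assumptions for illustration, not the released CrossFiT implementation.

```python
# Sketch of masked cross-field attention with aligned position embeddings,
# assuming PyTorch. Names and shapes are hypothetical; the abstract confirms
# only the two ideas, not this exact code.
import torch
import torch.nn.functional as F


def aligned_position_embeddings(num_patches: int, dim: int) -> torch.nn.Parameter:
    """One shared learnable table for both fields, so patches occupying the
    same retinal location in the aligned views get the same embedding."""
    return torch.nn.Parameter(torch.randn(num_patches, dim) * 0.02)


def masked_cross_field_attention(q_field: torch.Tensor,
                                 kv_field: torch.Tensor,
                                 mask: torch.Tensor) -> torch.Tensor:
    """Attend from one field's tokens to the other field's tokens, dropping
    token pairs flagged as noisy (mask == 0).

    q_field:  (B, N, D) tokens of the query field
    kv_field: (B, M, D) tokens of the other field
    mask:     (N, M) binary matrix; 1 keeps a token pair, 0 removes it
    """
    d = q_field.size(-1)
    scores = q_field @ kv_field.transpose(-2, -1) / d ** 0.5  # (B, N, M)
    scores = scores.masked_fill(mask == 0, float('-inf'))     # mask noisy pairs
    return F.softmax(scores, dim=-1) @ kv_field               # (B, N, D)


# Toy usage: 196 patch tokens per field, 256-dim features.
B, N, D = 2, 196, 256
pos = aligned_position_embeddings(N, D)
macula = torch.randn(B, N, D) + pos     # macula-centric field tokens
disc = torch.randn(B, N, D) + pos       # optic-disc-centric field tokens
keep = (torch.rand(N, N) > 0.5).long()  # placeholder geometric mask
fused = masked_cross_field_attention(macula, disc, keep)
```

The shared embedding table is what makes the positions "aligned": both fields index the same table, so attention can relate patches by their (approximate) retinal location rather than by raw image coordinates.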
Related papers
- Cross-Fundus Transformer for Multi-modal Diabetic Retinopathy Grading with Cataract [17.77175890577782]
Diabetic retinopathy (DR) is a leading cause of blindness worldwide and a common complication of diabetes.
This study explores a novel multi-modal deep learning framework to fuse the information from color fundus photography (CFP) and infrared fundus photography (IFP) towards more accurate DR grading (a toy fusion sketch follows this entry).
arXiv Detail & Related papers (2024-11-01T16:38:49Z)
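The summary states only that CFP and IFP are fused; a toy two-stream concatenation baseline (our assumption, not the paper's architecture) could look like this:

```python
# Hypothetical two-stream fusion for CFP and IFP inputs. The entry above
# confirms fusion of the two modalities, not this concatenation design.
import torch
import torch.nn as nn


class TwoStreamFusion(nn.Module):
    def __init__(self, num_grades: int = 5, feat_dim: int = 512):
        super().__init__()

        def backbone() -> nn.Sequential:  # stand-in per-modality encoder
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim))

        self.cfp_encoder = backbone()
        self.ifp_encoder = backbone()
        self.head = nn.Linear(2 * feat_dim, num_grades)

    def forward(self, cfp: torch.Tensor, ifp: torch.Tensor) -> torch.Tensor:
        # Encode each modality separately, concatenate, then classify.
        fused = torch.cat([self.cfp_encoder(cfp), self.ifp_encoder(ifp)], dim=1)
        return self.head(fused)  # logits over DR grades


logits = TwoStreamFusion()(torch.randn(1, 3, 224, 224),
                           torch.randn(1, 3, 224, 224))
```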
- Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
We conduct downstream image classification and image-text retrieval tasks on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z)
- SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion Classification Using 3D Multi-Phase Imaging [59.78761085714715]
This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework for liver lesion classification (a toy sketch of the weight-sharing idea follows this entry).
The proposed framework has been validated through comprehensive experiments on two clinical datasets.
To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public.
arXiv Detail & Related papers (2024-02-27T06:32:56Z)
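The summary names the siamese dual-resolution design but not its configuration. A 2D toy sketch of the weight-sharing idea, with our own layer choices (the actual framework operates on 3D multi-phase scans):

```python
# Illustrative siamese dual-resolution pattern: one weight-shared encoder
# processes a full- and a half-resolution view of the same image, and the
# two feature vectors are merged. Layer choices are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SiameseDualResolution(nn.Module):
    def __init__(self, feat_dim: int = 256, num_classes: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(      # shared weights for both branches
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat_dim))
        self.head = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        low = F.interpolate(image, scale_factor=0.5, mode='bilinear')
        feats = torch.cat([self.encoder(image), self.encoder(low)], dim=1)
        return self.head(feats)


out = SiameseDualResolution()(torch.randn(1, 1, 256, 256))
```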
- Leveraging Multimodal Fusion for Enhanced Diagnosis of Multiple Retinal Diseases in Ultra-wide OCTA [4.741967726600469]
We have curated the pioneering M3 OCTA dataset, the first multi-modal, multi-disease UW-OCTA dataset with the widest field of view.
We propose the first cross-modal fusion framework that leverages multi-modal information for diagnosing multiple diseases.
The construction of the M3 OCTA dataset aims to advance research in the ophthalmic image analysis community.
arXiv Detail & Related papers (2023-11-17T05:23:57Z)
- Source-free Active Domain Adaptation for Diabetic Retinopathy Grading Based on Ultra-wide-field Fundus Image [4.679304803181914]
Domain adaptation (DA) has been widely applied in the diabetic retinopathy (DR) grading of unannotated ultra-wide-field (UWF) fundus images.
We propose a novel source-free active domain adaptation (SFADA) approach to tackle the DR grading problem itself.
Our proposed SFADA achieves state-of-the-art DR grading performance, increasing accuracy by 20.9% and quadratic weighted kappa by 18.63% (a reference implementation of this metric follows this entry).
arXiv Detail & Related papers (2023-09-19T13:52:06Z)
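Quadratic weighted kappa, reported above, is the standard agreement metric for ordinal DR grades. A minimal NumPy implementation for reference, assuming integer grades 0..num_grades-1 (variable names are ours):

```python
# Quadratic weighted kappa: 1 - sum(W*O) / sum(W*E), where O is the observed
# confusion matrix, E the chance-agreement matrix (outer product of the
# marginals), and W the quadratic disagreement weights.
import numpy as np


def quadratic_weighted_kappa(y_true: np.ndarray, y_pred: np.ndarray,
                             num_grades: int = 5) -> float:
    observed = np.zeros((num_grades, num_grades))
    for t, p in zip(y_true, y_pred):
        observed[t, p] += 1
    i, j = np.indices((num_grades, num_grades))
    weights = (i - j) ** 2 / (num_grades - 1) ** 2

    # Expected matrix under chance agreement, scaled to the sample count.
    expected = np.outer(observed.sum(1), observed.sum(0)) / observed.sum()
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


qwk = quadratic_weighted_kappa(np.array([0, 1, 2, 3, 4]),
                               np.array([0, 1, 2, 4, 4]))
```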
- Affinity Feature Strengthening for Accurate, Complete and Robust Vessel Segmentation [48.638327652506284]
Vessel segmentation is crucial in many medical image applications, such as detecting coronary stenoses, retinal vessel diseases and brain aneurysms.
We present a novel approach, the affinity feature strengthening network (AFN), which jointly models geometry and refines pixel-wise segmentation features using a contrast-insensitive, multiscale affinity approach (an illustrative affinity computation follows this entry).
arXiv Detail & Related papers (2022-11-12T05:39:17Z)
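As an illustration of the multiscale affinity idea, one common definition labels a pixel pair as "affine" when both pixels share the same mask label at a given offset (scale). The sketch below follows that general definition and is not drawn from the AFN code:

```python
# Illustrative multiscale affinity labels from a segmentation mask, using
# horizontal offsets as the scales. A general sketch, not the paper's code.
import torch


def multiscale_affinity(mask: torch.Tensor, offsets=(1, 2, 4)) -> torch.Tensor:
    """mask: (H, W) integer segmentation labels.
    Returns (len(offsets), H, W) binary maps; map k marks pixels whose
    right-hand neighbour at distance offsets[k] has the same label."""
    maps = []
    for d in offsets:
        same = torch.zeros_like(mask, dtype=torch.float32)
        same[:, :-d] = (mask[:, :-d] == mask[:, d:]).float()
        maps.append(same)
    return torch.stack(maps)


aff = multiscale_affinity(torch.randint(0, 2, (64, 64)))  # (3, 64, 64)
```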
- Multimodal Information Fusion for Glaucoma and DR Classification [1.5616442980374279]
Multimodal information is frequently available in medical tasks. By combining information from multiple sources, clinicians are able to make more accurate judgments.
Our paper investigates three multimodal information fusion strategies based on deep learning to solve retinal analysis tasks.
arXiv Detail & Related papers (2022-09-02T12:19:03Z)
- AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation [50.21065317817769]
We propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules.
Experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets.
arXiv Detail & Related papers (2022-03-18T13:43:53Z)
- DRDrV3: Complete Lesion Detection in Fundus Images Using Mask R-CNN, Transfer Learning, and LSTM [2.9360071145551068]
We propose a new lesion detection architecture, comprising two sub-modules, to detect and localize lesions caused by diabetic retinopathy (DR).
We evaluate our models with two popular criteria, intersection over union (IoU) and mean average precision (mAP); a minimal IoU implementation follows this entry.
We hypothesize that this new solution enables specialists to detect lesions with high confidence and estimate the severity of the damage with high accuracy.
arXiv Detail & Related papers (2021-08-18T11:36:37Z)
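IoU itself is a standard detection metric: the overlap area between a predicted and a ground-truth box divided by the area of their union. A minimal implementation, with the (x1, y1, x2, y2) box format as our assumption:

```python
# Intersection over union for two axis-aligned boxes in (x1, y1, x2, y2) form.
def iou(box_a, box_b) -> float:
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle, clamped to zero when the boxes do not overlap.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ~= 0.143
```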
- FetReg: Placental Vessel Segmentation and Registration in Fetoscopy Challenge Dataset [57.30136148318641]
Fetoscopy laser photocoagulation is a widely used procedure for the treatment of Twin-to-Twin Transfusion Syndrome (TTTS).
The limited fetoscopic field of view may lead to increased procedural time and incomplete ablation, resulting in persistent TTTS.
Computer-assisted intervention may help overcome these challenges by expanding the fetoscopic field of view through video mosaicking and providing better visualization of the vessel network.
We present a large-scale multi-centre dataset for the development of generalized and robust semantic segmentation and video mosaicking algorithms for the fetal environment with a focus on creating drift-free mosaics from long duration fetoscopy videos.
arXiv Detail & Related papers (2021-06-10T17:14:27Z)
- Fader Networks for domain adaptation on fMRI: ABIDE-II study [68.5481471934606]
We use 3D convolutional autoencoders to build a domain-irrelevant latent image representation and demonstrate that this method outperforms existing approaches on ABIDE data.
arXiv Detail & Related papers (2020-10-14T16:50:50Z)