Leveraging Complementary Attention maps in vision transformers for OCT image analysis
- URL: http://arxiv.org/abs/2310.14005v3
- Date: Sat, 31 May 2025 01:46:46 GMT
- Title: Leveraging Complementary Attention maps in vision transformers for OCT image analysis
- Authors: Haz Sameen Shahgir, Tanjeem Azwad Zaman, Khondker Salman Sayeed, Md. Asif Haider, Sheikh Saifur Rahman Jony, M. Sohel Rahman,
- Abstract summary: We outline our new state-of-the-art pipeline for identifying biomarkers from OCT scans.<n>We evaluate different convolution and attention mechanisms for biomarker detection.<n>We ensemble the predictions of both models to achieve first place in the IEEE Video and Image Processing Cup 2023 competition.
- Score: 3.1487473474617125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Optical Coherence Tomography (OCT) scan yields all possible cross-section images of a retina for detecting biomarkers linked to optical defects. Due to the high volume of data generated, an automated and reliable biomarker detection pipeline is necessary as a primary screening stage. We outline our new state-of-the-art pipeline for identifying biomarkers from OCT scans. In collaboration with trained ophthalmologists, we identify local and global structures in biomarkers. Through a comprehensive and systematic review of existing vision architectures, we evaluate different convolution and attention mechanisms for biomarker detection. We find that MaxViT, a hybrid vision transformer combining convolution layers with strided attention, is better suited for local feature detection, while EVA-02, a standard vision transformer leveraging pure attention and large-scale knowledge distillation, excels at capturing global features. We ensemble the predictions of both models to achieve first place in the IEEE Video and Image Processing Cup 2023 competition on OCT biomarker detection, achieving a patient-wise F1 score of 0.8527 in the final phase of the competition, scoring 3.8\% higher than the next best solution. Finally, we used knowledge distillation to train a single MaxViT to outperform our ensemble at a fraction of the computation cost.
Related papers
- Alzheimer's Disease Classification Using Retinal OCT: TransnetOCT and Swin Transformer Models [2.474908349649168]
This work utilizes advanced deep learning techniques to classify retinal OCT images of subjects with Alzheimer's disease (AD) and healthy controls (CO)<n>The best classification architecture is TransNet OCT, which has an average accuracy of 98.18% for input OCT images and 98.91% for segmented OCT images for five-fold cross-validation compared to other models.
arXiv Detail & Related papers (2025-03-14T15:34:37Z) - MaxGlaViT: A novel lightweight vision transformer-based approach for early diagnosis of glaucoma stages from fundus images [0.0]
This study introduces MaxGlaViT, a lightweight model based on the restructured Multi-Axis Vision Transformer (MaxViT) for early glaucoma detection.
The model was evaluated using the HDV1 dataset, containing fundus images of different glaucoma stages.
MaxGlaViT outperforms experimental and state-of-the-art models, achieving 92.03% accuracy, 92.33% precision, 92.03% recall, 92.13% f1-score, and 87.12% Cohen's kappa score.
arXiv Detail & Related papers (2025-02-24T13:48:04Z) - Multi-Class Abnormality Classification Task in Video Capsule Endoscopy [3.656114607436271]
This work addressed the challenge of multiclass anomaly classification in video capsule Endoscopy (VCE)
The purpose is to correctly classify diverse gastrointestinal disorders, which is critical for increasing diagnostic efficiency in clinical settings.
Our team capsule commandos achieved 7th place ranking with a test set[7] performance of Mean AUC: 0.7314 and balanced accuracy: 0.3235.
arXiv Detail & Related papers (2024-10-25T21:22:52Z) - Ophthalmic Biomarker Detection with Parallel Prediction of Transformer and Convolutional Architecture [1.6893691730575022]
This paper presents a novel approach for ophthalmic biomarker detection using an ensemble of Convolutional Neural Network (CNN) and Vision Transformer.
Our method has been implemented on the OLIVES dataset to detect 6 major biomarkers from the OCT images and shows significant improvement of the macro averaged F1 score on the dataset.
arXiv Detail & Related papers (2024-09-26T12:33:34Z) - Classification of Endoscopy and Video Capsule Images using CNN-Transformer Model [1.0994755279455526]
This study proposes a hybrid model that combines the advantages of Transformers and Convolutional Neural Networks (CNNs) to enhance classification performance.
For the GastroVision dataset, our proposed model demonstrates excellent performance with Precision, Recall, F1 score, Accuracy, and Matthews Correlation Coefficient (MCC) of 0.8320, 0.8386, 0.8324, 0.8386, and 0.8191, respectively.
arXiv Detail & Related papers (2024-08-20T11:05:32Z) - Domain-specific augmentations with resolution agnostic self-attention mechanism improves choroid segmentation in optical coherence tomography images [3.8485899972356337]
The choroid is a key vascular layer of the eye, supplying oxygen to the retinal photoreceptors.
Current methods to measure the choroid often require use of multiple, independent semi-automatic and deep learning-based algorithms.
We propose a Robust, Resolution-agnostic and Efficient Attention-based network for CHoroid segmentation (REACH)
arXiv Detail & Related papers (2024-05-23T11:35:23Z) - Analysis of the BraTS 2023 Intracranial Meningioma Segmentation Challenge [44.76736949127792]
We describe the design and results from the BraTS 2023 Intracranial Meningioma Challenge.
The BraTS Meningioma Challenge differed from prior BraTS Glioma challenges in that it focused on meningiomas.
The top ranked team had a lesion-wise median dice similarity coefficient (DSC) of 0.976, 0.976, and 0.964 for enhancing tumor, tumor core, and whole tumor.
arXiv Detail & Related papers (2024-05-16T03:23:57Z) - Deep Learning for Vascular Segmentation and Applications in Phase
Contrast Tomography Imaging [33.23991248643144]
We present a thorough literature review, highlighting the state of machine learning techniques across diverse organs.
Our goal is to provide a foundation on the topic and identify a robust baseline model for application to vascular segmentation in a new imaging modality.
HiP CT enables 3D imaging of complete organs at an unprecedented resolution of ca. 20mm per voxel.
arXiv Detail & Related papers (2023-11-22T11:15:38Z) - A Federated Learning Framework for Stenosis Detection [70.27581181445329]
This study explores the use of Federated Learning (FL) for stenosis detection in coronary angiography images (CA)
Two heterogeneous datasets from two institutions were considered: dataset 1 includes 1219 images from 200 patients, which we acquired at the Ospedale Riuniti of Ancona (Italy)
dataset 2 includes 7492 sequential images from 90 patients from a previous study available in the literature.
arXiv Detail & Related papers (2023-10-30T11:13:40Z) - Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z) - COVID-19 detection using ViT transformer-based approach from Computed
Tomography Images [0.0]
We introduce a novel approach to enhance the accuracy and efficiency of COVID-19 diagnosis using CT images.
We employ the base ViT Transformer configured for 224x224-sized input images, modifying the output to suit the binary classification task.
Our method implements a systematic patient-level prediction strategy, classifying individual CT slices as COVID-19 or non-COVID.
arXiv Detail & Related papers (2023-10-12T09:37:56Z) - Breast Ultrasound Tumor Classification Using a Hybrid Multitask
CNN-Transformer Network [63.845552349914186]
Capturing global contextual information plays a critical role in breast ultrasound (BUS) image classification.
Vision Transformers have an improved capability of capturing global contextual information but may distort the local image patterns due to the tokenization operations.
In this study, we proposed a hybrid multitask deep neural network called Hybrid-MT-ESTAN, designed to perform BUS tumor classification and segmentation.
arXiv Detail & Related papers (2023-08-04T01:19:32Z) - A Novel Vision Transformer with Residual in Self-attention for
Biomedical Image Classification [8.92307560991779]
This article presents the novel framework of multi-head self-attention for vision transformer (ViT)
The proposed method uses the concept of residual connection for accumulating the best attention output in each block of multi-head attention.
The results show the significant improvement over traditional ViT and other convolution based state-of-the-art classification models.
arXiv Detail & Related papers (2023-06-02T15:06:14Z) - nnUNet RASPP for Retinal OCT Fluid Detection, Segmentation and
Generalisation over Variations of Data Sources [25.095695898777656]
We propose two variants of the nnUNet with consistent high performance across images from multiple device vendors.
The algorithm was validated on the MICCAI 2017 RETOUCH challenge dataset.
Experimental results show that our algorithms outperform the current state-of-the-arts algorithms.
arXiv Detail & Related papers (2023-02-25T23:47:23Z) - Reliable Joint Segmentation of Retinal Edema Lesions in OCT Images [55.83984261827332]
In this paper, we propose a novel reliable multi-scale wavelet-enhanced transformer network.
We develop a novel segmentation backbone that integrates a wavelet-enhanced feature extractor network and a multi-scale transformer module.
Our proposed method achieves better segmentation accuracy with a high degree of reliability as compared to other state-of-the-art segmentation approaches.
arXiv Detail & Related papers (2022-12-01T07:32:56Z) - Affinity Feature Strengthening for Accurate, Complete and Robust Vessel
Segmentation [48.638327652506284]
Vessel segmentation is crucial in many medical image applications, such as detecting coronary stenoses, retinal vessel diseases and brain aneurysms.
We present a novel approach, the affinity feature strengthening network (AFN), which jointly models geometry and refines pixel-wise segmentation features using a contrast-insensitive, multiscale affinity approach.
arXiv Detail & Related papers (2022-11-12T05:39:17Z) - WSSS4LUAD: Grand Challenge on Weakly-supervised Tissue Semantic
Segmentation for Lung Adenocarcinoma [51.50991881342181]
This challenge includes 10,091 patch-level annotations and over 130 million labeled pixels.
First place team achieved mIoU of 0.8413 (tumor: 0.8389, stroma: 0.7931, normal: 0.8919)
arXiv Detail & Related papers (2022-04-13T15:27:05Z) - Lymphocyte Classification in Hyperspectral Images of Ovarian Cancer
Tissue Biopsy Samples [94.37521840642141]
We present a machine learning pipeline to segment white blood cell pixels in hyperspectral images of biopsy cores.
These cells are clinically important for diagnosis, but some prior work has struggled to incorporate them due to difficulty obtaining precise pixel labels.
arXiv Detail & Related papers (2022-03-23T00:58:27Z) - Multiple Time Series Fusion Based on LSTM An Application to CAP A Phase
Classification Using EEG [56.155331323304]
Deep learning based electroencephalogram channels' feature level fusion is carried out in this work.
Channel selection, fusion, and classification procedures were optimized by two optimization algorithms.
arXiv Detail & Related papers (2021-12-18T14:17:49Z) - The Report on China-Spain Joint Clinical Testing for Rapid COVID-19 Risk
Screening by Eye-region Manifestations [59.48245489413308]
We developed and tested a COVID-19 rapid prescreening model using the eye-region images captured in China and Spain with cellphone cameras.
The performance was measured using area under receiver-operating-characteristic curve (AUC), sensitivity, specificity, accuracy, and F1.
arXiv Detail & Related papers (2021-09-18T02:28:01Z) - Vision Transformers for femur fracture classification [59.99241204074268]
The Vision Transformer (ViT) was able to correctly predict 83% of the test images.
Good results were obtained in sub-fractures with the largest and richest dataset ever.
arXiv Detail & Related papers (2021-08-07T10:12:42Z) - G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for
Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z) - Lung Nodule Classification Using Biomarkers, Volumetric Radiomics and 3D
CNNs [0.0699049312989311]
We present a hybrid algorithm to estimate lung malignancy that combines imaging biomarkers from Radiologist's annotation with image classification of CT scans.
Our algorithm employs a 3D Convolutional Neural Network (CNN) as well as a Random Forest in order to combine CT imagery with biomarker annotation and radiomic features.
We show that a model using image biomarkers alone is more accurate than one that combines biomarkers with volumetric radiomics, 3D CNNs, and semi-supervised learning.
arXiv Detail & Related papers (2020-10-19T18:57:26Z) - A Global Benchmark of Algorithms for Segmenting Late Gadolinium-Enhanced
Cardiac Magnetic Resonance Imaging [90.29017019187282]
" 2018 Left Atrium Challenge" using 154 3D LGE-MRIs, currently the world's largest cardiac LGE-MRI dataset.
Analyse of the submitted algorithms using technical and biological metrics was performed.
Results show the top method achieved a dice score of 93.2% and a mean surface to a surface distance of 0.7 mm.
arXiv Detail & Related papers (2020-04-26T08:49:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.