nnMobileNet: Rethinking CNN for Retinopathy Research
- URL: http://arxiv.org/abs/2306.01289v4
- Date: Mon, 15 Apr 2024 20:35:03 GMT
- Title: nnMobileNet: Rethinking CNN for Retinopathy Research
- Authors: Wenhui Zhu, Peijie Qiu, Xiwen Chen, Xin Li, Natasha Lepore, Oana M. Dumitrascu, Yalin Wang
- Abstract summary: Convolutional neural networks (CNNs) have been at the forefront of the detection and tracking of various retinal diseases (RD).
The emergence of vision transformers (ViT) in the 2020s has shifted the trajectory of RD model development.
We revisited and updated the architecture of a CNN model, specifically MobileNet, to enhance its utility in RD diagnostics.
- Score: 4.882524311496886
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Over the past few decades, convolutional neural networks (CNNs) have been at the forefront of the detection and tracking of various retinal diseases (RD). Despite their success, the emergence of vision transformers (ViT) in the 2020s has shifted the trajectory of RD model development. The leading-edge performance of ViT-based models in RD can be largely credited to their scalability: their ability to improve as more parameters are added. As a result, ViT-based models tend to outshine traditional CNNs in RD applications, albeit at the cost of increased data and computational demands. ViTs also differ from CNNs in their approach to processing images, working with patches rather than local regions, which can complicate the precise localization of small, variably presented lesions in RD. In our study, we revisited and updated the architecture of a CNN model, specifically MobileNet, to enhance its utility in RD diagnostics. We found that an optimized MobileNet, through selective modifications, can surpass ViT-based models in various RD benchmarks, including diabetic retinopathy grading, detection of multiple fundus diseases, and classification of diabetic macular edema. The code is available at https://github.com/Retinal-Research/NN-MOBILENET
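As a concrete reference for the kind of building block being revisited, the following is a minimal PyTorch sketch of a depthwise-separable convolution, the core MobileNet primitive. It is an illustration under the standard MobileNet formulation, not the authors' specific nnMobileNet modifications; the class name and the example input size are assumptions for demonstration only.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise-separable convolution, the core MobileNet primitive (illustrative sketch).

    A depthwise 3x3 convolution filters each channel independently, then a
    pointwise 1x1 convolution mixes channels. This keeps the receptive field
    local (in contrast to ViT patch-based processing) at a fraction of the
    parameters of a full 3x3 convolution.
    """

    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        # Depthwise stage: one 3x3 filter per input channel (groups=in_channels).
        self.depthwise = nn.Conv2d(
            in_channels, in_channels, kernel_size=3,
            stride=stride, padding=1, groups=in_channels, bias=False,
        )
        self.bn1 = nn.BatchNorm2d(in_channels)
        # Pointwise stage: 1x1 convolution to combine channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.bn1(self.depthwise(x)))
        x = self.act(self.bn2(self.pointwise(x)))
        return x


if __name__ == "__main__":
    # Hypothetical usage: a 512x512 RGB fundus image passed through one block.
    block = DepthwiseSeparableConv(in_channels=3, out_channels=32, stride=2)
    fundus = torch.randn(1, 3, 512, 512)
    print(block(fundus).shape)  # torch.Size([1, 32, 256, 256])
```

The locality of the depthwise stage is the property the abstract contrasts with ViT patch attention when discussing the localization of small lesions.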
Related papers
- Synthetic Trajectory Generation Through Convolutional Neural Networks [6.717469146587211]
We introduce a Reversible Trajectory-to-CNN Transformation (RTCT).
RTCT adapts trajectories into a format suitable for CNN-based models.
We evaluate its performance against an RNN-based trajectory GAN.
arXiv Detail & Related papers (2024-07-24T02:16:52Z)
- A Comparative Study of CNN, ResNet, and Vision Transformers for Multi-Classification of Chest Diseases [0.0]
Vision Transformers (ViT) are powerful tools due to their scalability and ability to process large amounts of data.
We fine-tuned two variants of ViT models, one pre-trained on ImageNet and another trained from scratch, using the NIH Chest X-ray dataset.
Our study evaluates the performance of these models in the multi-label classification of 14 distinct diseases.
arXiv Detail & Related papers (2024-05-31T23:56:42Z)
- Unveiling the Unseen: Identifiable Clusters in Trained Depthwise Convolutional Kernels [56.69755544814834]
Recent advances in depthwise-separable convolutional neural networks (DS-CNNs) have led to novel architectures.
This paper reveals another striking property of DS-CNN architectures: discernible and explainable patterns emerge in their trained depthwise convolutional kernels in all layers.
arXiv Detail & Related papers (2024-01-25T19:05:53Z)
- InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions [95.94629864981091]
This work presents a new large-scale CNN-based foundation model, termed InternImage, which can obtain the gain from increasing parameters and training data like ViTs.
The proposed InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns with large-scale parameters from massive data like ViTs.
arXiv Detail & Related papers (2022-11-10T18:59:04Z)
- Data-Efficient Vision Transformers for Multi-Label Disease Classification on Chest Radiographs [55.78588835407174]
Vision Transformers (ViTs) have not been applied to this task despite their high classification performance on generic images.
ViTs do not rely on convolutions but on patch-based self-attention and in contrast to CNNs, no prior knowledge of local connectivity is present.
Our results show that while ViTs and CNNs perform on par, with a small benefit for ViTs, DeiTs outperform the former if a reasonably large data set is available for training.
arXiv Detail & Related papers (2022-08-17T09:07:45Z)
- Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
It is a challenge to efficiently train SNNs due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which could achieve high performance.
arXiv Detail & Related papers (2022-05-01T12:44:49Z)
- ROOD-MRI: Benchmarking the robustness of deep learning segmentation models to out-of-distribution and corrupted data in MRI [0.4839993770067469]
ROOD-MRI is a platform for benchmarking the robustness of deep artificial neural networks to MRI data, corruptions, and artifacts.
We apply our methodology to hippocampus, ventricle, and white matter hyperintensity segmentation in several large studies.
We show that while data augmentation strategies can substantially improve robustness to OOD data for anatomical segmentation tasks, modern DNNs using augmentation still lack robustness in more challenging lesion-based segmentation tasks.
arXiv Detail & Related papers (2022-03-11T16:34:15Z)
- Supervised Training of Siamese Spiking Neural Networks with Earth Mover's Distance [4.047840018793636]
This study adapts the highly versatile Siamese neural network model to the event data domain.
We introduce a supervised training framework for optimizing the Earth Mover's Distance between spike trains with spiking neural networks (SNN).
arXiv Detail & Related papers (2022-02-20T00:27:57Z)
- Scopeformer: n-CNN-ViT Hybrid Model for Intracranial Hemorrhage Classification [0.0]
We propose a feature generator composed of an ensemble of convolutional neural networks (CNNs) to improve Vision Transformer (ViT) models.
We show that by gradually stacking several feature maps extracted using multiple Xception CNNs, we can develop a feature-rich input for the ViT model.
arXiv Detail & Related papers (2021-07-07T20:20:24Z)
- RetiNerveNet: Using Recursive Deep Learning to Estimate Pointwise 24-2 Visual Field Data based on Retinal Structure [109.33721060718392]
Glaucoma is the leading cause of irreversible blindness in the world, affecting over 70 million people.
Due to the Standard Automated Perimetry (SAP) test's innate difficulty and its high test-retest variability, we propose RetiNerveNet.
arXiv Detail & Related papers (2020-10-15T03:09:08Z)
- Classification of COVID-19 in CT Scans using Multi-Source Transfer Learning [91.3755431537592]
We propose the use of Multi-Source Transfer Learning to improve upon traditional Transfer Learning for the classification of COVID-19 from CT scans.
With our multi-source fine-tuning approach, our models outperformed baseline models fine-tuned with ImageNet.
Our best performing model was able to achieve an accuracy of 0.893 and a Recall score of 0.897, outperforming its baseline Recall score by 9.3%.
arXiv Detail & Related papers (2020-09-22T11:53:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.