An Enhancement of CNN Algorithm for Rice Leaf Disease Image Classification in Mobile Applications
- URL: http://arxiv.org/abs/2412.07182v2
- Date: Sun, 16 Feb 2025 12:20:40 GMT
- Title: An Enhancement of CNN Algorithm for Rice Leaf Disease Image Classification in Mobile Applications
- Authors: Kayne Uriel K. Rodrigo, Jerriane Hillary Heart S. Marcial, Samuel C. Brillo, Khatalyn E. Mata, Jonathan C. Morano
- Abstract summary: This study focuses on enhancing rice leaf disease image classification algorithms, which have traditionally relied on Convolutional Neural Network (CNN) models. We employed transfer learning with MobileViTV2_050 using ImageNet-1k weights, a lightweight model that integrates CNN's local feature extraction with Vision Transformers' global context learning. Our approach resulted in a significant 15.66% improvement in classification accuracy for MobileViTV2_050-A, our first enhanced model trained on the baseline dataset, achieving 93.14%.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study focuses on enhancing rice leaf disease image classification algorithms, which have traditionally relied on Convolutional Neural Network (CNN) models. We employed transfer learning with MobileViTV2_050 using ImageNet-1k weights, a lightweight model that integrates CNN's local feature extraction with Vision Transformers' global context learning through a separable self-attention mechanism. Our approach resulted in a significant 15.66% improvement in classification accuracy for MobileViTV2_050-A, our first enhanced model trained on the baseline dataset, achieving 93.14%. Furthermore, MobileViTV2_050-B, our second enhanced model trained on a broader rice leaf dataset, demonstrated a 22.12% improvement, reaching 99.6% test accuracy. Additionally, MobileViTV2-A attained an F1-score of 93% across four rice labels and a Receiver Operating Characteristic (ROC) curve ranging from 87% to 97%. In terms of resource consumption, our enhanced models reduced the total parameters of the baseline CNN model by up to 92.50%, from 14 million to 1.1 million. These results indicate that MobileViTV2_050 not only improves computational efficiency through its separable self-attention mechanism but also enhances global context learning. Consequently, it offers a lightweight and robust solution suitable for mobile deployment, advancing the interpretability and practicality of models in precision agriculture.
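The separable self-attention mechanism the abstract credits for MobileViTV2's efficiency can be illustrated with a minimal NumPy sketch. This is an approximation in the spirit of the MobileViTV2 design, not the authors' implementation: the weight matrices here are random placeholders, and layer normalization, multi-branch structure, and convolutional stages are omitted.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def separable_self_attention(x, W_I, W_K, W_V, W_O):
    """Linear-complexity attention sketch (MobileViTV2-style).

    x: (n, d) token matrix. Instead of an n x n pairwise attention
    map, a single learned projection W_I scores each token; the
    softmax of those scores pools the keys into one d-dimensional
    context vector, which then gates the values -- O(n*d) overall
    rather than O(n^2 * d).
    """
    scores = softmax(x @ W_I, axis=0)       # (n, 1) per-token context scores
    keys = x @ W_K                          # (n, d)
    context = (scores * keys).sum(axis=0)   # (d,) global context vector
    values = np.maximum(x @ W_V, 0.0)       # (n, d) with ReLU
    return (values * context) @ W_O         # (n, d) broadcast gating

rng = np.random.default_rng(0)
n, d = 16, 8                                # 16 tokens, 8-dim embeddings
x = rng.standard_normal((n, d))
out = separable_self_attention(
    x,
    rng.standard_normal((d, 1)),
    rng.standard_normal((d, d)),
    rng.standard_normal((d, d)),
    rng.standard_normal((d, d)),
)
print(out.shape)  # (16, 8): same shape as the input token matrix
```

Because the context is a single pooled vector rather than a full attention matrix, cost grows linearly with token count, which is what makes the backbone attractive for the mobile deployment scenario the paper targets.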
Related papers
- Enhanced Leukemic Cell Classification Using Attention-Based CNN and Data Augmentation [0.5371337604556311]
Acute lymphoblastic leukemia (ALL) is the most common childhood cancer, requiring expert microscopic diagnosis. The proposed system integrates an attention-based convolutional neural network combining EfficientNetV2-B3 with Squeeze-and-Excitation mechanisms for automated ALL cell classification. Our approach employs comprehensive data augmentation, focal loss for class imbalance, and patient-wise data splitting to ensure robust and reproducible evaluation.
arXiv Detail & Related papers (2026-01-03T01:24:11Z) - A Domain-Adapted Lightweight Ensemble for Resource-Efficient Few-Shot Plant Disease Classification [0.0]
We present a few-shot learning approach that combines domain-adapted MobileNetV2 and MobileNetV3 models as feature extractors. For the classification task, the fused features are passed through a Bi-LSTM classifier enhanced with attention mechanisms. It consistently improved performance across 1- to 15-shot scenarios, reaching 98.23±0.33% at 15-shot. Notably, it also outperformed the previous SOTA accuracy of 96.4% on six diseases from PlantVillage, achieving 99.72% with only 15-shot learning.
arXiv Detail & Related papers (2025-12-15T15:17:29Z) - VLA Models Are More Generalizable Than You Think: Revisiting Physical and Spatial Modeling [60.341503853471494]
We show that vision-language-action models degrade sharply under novel camera viewpoints and visual perturbations. We propose a one-shot adaptation framework that recalibrates visual representations through lightweight, learnable updates.
arXiv Detail & Related papers (2025-12-02T16:16:13Z) - Transfer Learning-Based CNN Models for Plant Species Identification Using Leaf Venation Patterns [0.0]
This study evaluates the efficacy of three deep learning architectures: ResNet50, MobileNetV2, and EfficientNetB0 for automated plant species classification based on leaf venation patterns.
arXiv Detail & Related papers (2025-09-03T21:23:09Z) - Skin Cancer Classification: Hybrid CNN-Transformer Models with KAN-Based Fusion [0.0]
We explore Sequential and Parallel Hybrid CNN-Transformer models with Convolutional Kolmogorov-Arnold Network (CKAN). Our approach integrates transfer learning and extensive data augmentation, where CNNs extract local spatial features, Transformers model global dependencies, and CKAN facilitates nonlinear feature fusion for improved representation learning. Our proposed approach achieves competitive performance in skin cancer classification, demonstrating 92.81% accuracy and 92.47% F1-score on the HAM10000 dataset, 97.83% accuracy and 97.83% F1-score on the PAD-UFES dataset, and 91.17% accuracy with 91.79% F1-score on
arXiv Detail & Related papers (2025-08-17T19:57:34Z) - Remote Sensing Image Classification Using Convolutional Neural Network (CNN) and Transfer Learning Techniques [1.024113475677323]
This study investigates the classification of aerial images depicting transmission towers, forests, farmland, and mountains.
To perform the classification, features are extracted from the input images using a Convolutional Neural Network (CNN) architecture.
Our study shows that transfer learning models, and MobileNetV2 in particular, work well for landscape categorization.
arXiv Detail & Related papers (2025-03-04T11:19:18Z) - Malware Classification from Memory Dumps Using Machine Learning, Transformers, and Large Language Models [1.038088229789127]
This study investigates the performance of various classification models for a malware classification task using different feature sets and data configurations.
XGB achieved the highest accuracy of 87.42% using the Top 45 Features, outperforming all other models.
Deep learning models underperformed, with RNN achieving 66.71% accuracy and Transformers reaching 71.59%.
arXiv Detail & Related papers (2025-03-04T00:24:21Z) - Enhancing Grammatical Error Detection using BERT with Cleaned Lang-8 Dataset [0.0]
This paper presents an improved LLM-based model for Grammatical Error Detection (GED).
The traditional approach to GED relied on hand-designed features, but more recently, neural networks (NNs) have automated the discovery of these features.
BERT-base-uncased model gave an impressive performance with an F1 score of 0.91 and accuracy of 98.49% on training data.
arXiv Detail & Related papers (2024-11-23T10:57:41Z) - Analysis of Convolutional Neural Network-based Image Classifications: A Multi-Featured Application for Rice Leaf Disease Prediction and Recommendations for Farmers [0.0]
This study presents a novel method for improving rice disease classification using 8 different convolutional neural network (CNN) algorithms.
With the help of this cutting-edge application, farmers will be able to make timely and well-informed decisions.
Remarkable outcomes include 75% accuracy for ResNet-50, 90% accuracy for DenseNet121, 84% accuracy for VGG16, 95.83% accuracy for MobileNetV2, 91.61% accuracy for DenseNet169, and 86% accuracy for InceptionV3.
arXiv Detail & Related papers (2024-09-17T05:32:01Z) - An Augmentation-based Model Re-adaptation Framework for Robust Image Segmentation [0.799543372823325]
We propose an Augmentation-based Model Re-adaptation Framework (AMRF) to enhance the generalisation of segmentation models.
By observing segmentation masks from conventional models (FCN and U-Net) and a pre-trained SAM model, we determine a minimal augmentation set that optimally balances training efficiency and model performance.
Our results demonstrate that the fine-tuned FCN surpasses its baseline by 3.29% and 3.02% in cropping accuracy, and 5.27% and 4.04% in classification accuracy on two temporally continuous datasets.
arXiv Detail & Related papers (2024-09-14T21:01:49Z) - Onboard Satellite Image Classification for Earth Observation: A Comparative Study of ViT Models [28.69148416385582]
This study focuses on identifying the most effective pre-trained model for land use classification in onboard satellite processing.
We compare the performance of traditional CNN-based, ResNet-based, and various pre-trained vision Transformer models.
Our findings demonstrate that pre-trained Vision Transformer models, particularly MobileViTV2 and EfficientViT-M2, outperform models trained from scratch in terms of accuracy and efficiency.
arXiv Detail & Related papers (2024-09-05T20:21:49Z) - Building Math Agents with Multi-Turn Iterative Preference Learning [56.71330214021884]
This paper studies the complementary direct preference learning approach to further improve model performance.
Existing direct preference learning algorithms are originally designed for the single-turn chat task.
We introduce a multi-turn direct preference learning framework, tailored for this context.
arXiv Detail & Related papers (2024-09-04T02:41:04Z) - Enhanced Infield Agriculture with Interpretable Machine Learning Approaches for Crop Classification [0.49110747024865004]
This research evaluates four different approaches for crop classification, namely traditional ML with handcrafted feature extraction methods like SIFT, ORB, and Color Histogram; Custom Designed CNN and established DL architecture like AlexNet; transfer learning on five models pre-trained using ImageNet.
Xception outperformed all of them in terms of generalization, achieving 98% accuracy on the test data, with a model size of 80.03 MB and a prediction time of 0.0633 seconds.
arXiv Detail & Related papers (2024-08-22T14:20:34Z) - Systematic Architectural Design of Scale Transformed Attention Condenser DNNs via Multi-Scale Class Representational Response Similarity Analysis [93.0013343535411]
We propose a novel type of analysis called Multi-Scale Class Representational Response Similarity Analysis (ClassRepSim)
We show that adding STAC modules to ResNet style architectures can result in up to a 1.6% increase in top-1 accuracy.
Results from ClassRepSim analysis can be used to select an effective parameterization of the STAC module resulting in competitive performance.
arXiv Detail & Related papers (2023-06-16T18:29:26Z) - Lightweight Vision Transformer with Bidirectional Interaction [59.39874544410419]
We propose a Fully Adaptive Self-Attention (FASA) mechanism for vision transformer to model the local and global information.
Based on FASA, we develop a family of lightweight vision backbones, Fully Adaptive Transformer (FAT) family.
arXiv Detail & Related papers (2023-06-01T06:56:41Z) - Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize the well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z) - EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications [68.35683849098105]
We introduce split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups.
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z) - VOLO: Vision Outlooker for Visual Recognition [148.12522298731807]
Vision transformers (ViTs) have shown great potential of self-attention based models in ImageNet classification.
We introduce a novel outlook attention and present a simple and general architecture, termed Vision Outlooker (VOLO)
Unlike self-attention that focuses on global dependency modeling at a coarse level, the outlook attention efficiently encodes finer-level features and contexts into tokens.
Experiments show that our VOLO achieves 87.1% top-1 accuracy on ImageNet-1K classification, which is the first model exceeding 87% accuracy on this competitive benchmark.
arXiv Detail & Related papers (2021-06-24T15:46:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.