MobilePlantViT: A Mobile-friendly Hybrid ViT for Generalized Plant Disease Image Classification
- URL: http://arxiv.org/abs/2503.16628v1
- Date: Thu, 20 Mar 2025 18:34:02 GMT
- Title: MobilePlantViT: A Mobile-friendly Hybrid ViT for Generalized Plant Disease Image Classification
- Authors: Moshiur Rahman Tonmoy, Md. Mithun Hossain, Nilanjan Dey, M. F. Mridha
- Abstract summary: Plant diseases significantly threaten global food security. Deep learning models have demonstrated impressive performance in plant disease identification, but deploying these models on mobile and edge devices remains challenging due to high computational demands and resource constraints. We propose MobilePlantViT, a novel hybrid Vision Transformer (ViT) architecture designed for generalized plant disease classification.
- Score: 2.0681376988193843
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Plant diseases significantly threaten global food security by reducing crop yields and undermining agricultural sustainability. AI-driven automated classification has emerged as a promising solution, with deep learning models demonstrating impressive performance in plant disease identification. However, deploying these models on mobile and edge devices remains challenging due to high computational demands and resource constraints, highlighting the need for lightweight, accurate solutions for accessible smart agriculture systems. To address this, we propose MobilePlantViT, a novel hybrid Vision Transformer (ViT) architecture designed for generalized plant disease classification, which optimizes resource efficiency while maintaining high performance. Extensive experiments across diverse plant disease datasets of varying scales show our model's effectiveness and strong generalizability, achieving test accuracies ranging from 80% to over 99%. Notably, with only 0.69 million parameters, our architecture outperforms the smallest versions of MobileViTv1 and MobileViTv2, despite their higher parameter counts. These results underscore the potential of our approach for real-world, AI-powered automated plant disease classification in sustainable and resource-efficient smart agriculture systems. All codes will be available in the GitHub repository: https://github.com/moshiurtonmoy/MobilePlantViT
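The abstract does not describe the architecture in detail, but the general recipe behind such mobile-friendly hybrid ViTs (a convolutional stem for local feature extraction followed by a small Transformer encoder for global context and a linear classification head) can be sketched as below. This is a minimal, hypothetical illustration: the class name TinyHybridViT, the layer sizes, and the 38-class setting are assumptions chosen only to keep the parameter count small, not the authors' released design; the actual implementation is in the linked GitHub repository.

```python
# Minimal sketch of a mobile-friendly hybrid CNN + ViT classifier (illustrative,
# not the MobilePlantViT release): a convolutional stem extracts local features,
# a lightweight Transformer encoder models global context over the resulting
# tokens, and a linear head predicts the disease class. Sizes are guesses that
# keep the model well under 1M parameters.
import torch
import torch.nn as nn


class TinyHybridViT(nn.Module):
    def __init__(self, num_classes: int = 38, embed_dim: int = 96,
                 depth: int = 2, num_heads: int = 4):
        super().__init__()
        # Convolutional stem: 3 x 224 x 224 -> embed_dim x 14 x 14 feature map
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),   # 112 x 112
            nn.BatchNorm2d(32), nn.SiLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # 56 x 56
            nn.BatchNorm2d(64), nn.SiLU(),
            nn.Conv2d(64, embed_dim, kernel_size=3, stride=4, padding=1),  # 14 x 14
            nn.BatchNorm2d(embed_dim), nn.SiLU(),
        )
        # Lightweight Transformer encoder over the 14 * 14 = 196 tokens
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=2 * embed_dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.pos_embed = nn.Parameter(torch.zeros(1, 14 * 14, embed_dim))
        self.head = nn.Sequential(nn.LayerNorm(embed_dim),
                                  nn.Linear(embed_dim, num_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Expects 224 x 224 inputs so the token count matches pos_embed.
        feats = self.stem(x)                       # (B, C, 14, 14)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, 196, C)
        tokens = self.encoder(tokens + self.pos_embed)
        return self.head(tokens.mean(dim=1))       # global average pooling


if __name__ == "__main__":
    # 38 classes as in the common PlantVillage split (an assumed example).
    model = TinyHybridViT(num_classes=38)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"parameters: {n_params / 1e6:.2f}M")
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 38])
```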
Related papers
- DS_FusionNet: Dynamic Dual-Stream Fusion with Bidirectional Knowledge Distillation for Plant Disease Recognition [5.665116885785105]
This study proposes a novel Dynamic Dual-Stream Fusion Network (DS_FusionNet).
The network integrates a dual-backbone architecture, deformable dynamic fusion modules, and a bidirectional knowledge distillation strategy.
Experimental results demonstrate that DS_FusionNet achieves classification accuracies exceeding 90% using only 10% of the PlantDisease and CIFAR-10 datasets.
arXiv Detail & Related papers (2025-04-29T17:15:02Z)
- Hybrid Knowledge Transfer through Attention and Logit Distillation for On-Device Vision Systems in Agricultural IoT [0.0]
This work advances real-time, energy-efficient crop monitoring in precision agriculture.
It demonstrates how we can attain ViT-level diagnostic precision on edge devices.
arXiv Detail & Related papers (2025-04-21T06:56:41Z)
- Smooth Handovers via Smoothed Online Learning [48.953313950521746]
We first analyze an extensive dataset from a commercial mobile network operator (MNO) in Europe with more than 40M users to understand and reveal important features and performance impacts on handovers (HOs). Our findings highlight a correlation between HO failures/delays and the characteristics of radio cells and end-user devices. We propose a realistic system model for smooth and accurate HOs that extends existing approaches by incorporating device and cell features into HO optimization.
arXiv Detail & Related papers (2025-01-14T13:16:33Z)
- Automatic Fused Multimodal Deep Learning for Plant Identification [1.2289361708127877]
We introduce a pioneering multimodal DL-based approach for plant classification with automatic modality fusion. Our method achieves 82.61% accuracy on 979 classes of Multimodal-PlantCLEF, surpassing state-of-the-art methods and outperforming late fusion by 10.33%.
arXiv Detail & Related papers (2024-06-03T15:43:29Z)
- Generating Diverse Agricultural Data for Vision-Based Farming Applications [74.79409721178489]
This model is capable of simulating distinct growth stages of plants, diverse soil conditions, and randomized field arrangements under varying lighting conditions.
Our dataset includes 12,000 images with semantic labels, offering a comprehensive resource for computer vision tasks in precision agriculture.
arXiv Detail & Related papers (2024-03-27T08:42:47Z)
- Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities [59.02391344178202]
Vision foundation models (VFMs) serve as potent building blocks for a wide range of AI applications.
The scarcity of comprehensive training data, the need for multi-sensor integration, and the diverse task-specific architectures pose significant obstacles to the development of VFMs.
This paper delves into the critical challenge of forging VFMs tailored specifically for autonomous driving, while also outlining future directions.
arXiv Detail & Related papers (2024-01-16T01:57:24Z)
- SugarViT -- Multi-objective Regression of UAV Images with Vision Transformers and Deep Label Distribution Learning Demonstrated on Disease Severity Prediction in Sugar Beet [3.2925222641796554]
This work introduces a machine learning framework for automated, large-scale, plant-specific trait annotation.
We develop an efficient Vision Transformer based model for disease severity scoring called SugarViT.
Although the model is evaluated on this specific use case, it is kept as generic as possible so that it can also be applied to various image-based classification and regression tasks.
arXiv Detail & Related papers (2023-11-06T13:01:17Z)
- Filling the Missing: Exploring Generative AI for Enhanced Federated Learning over Heterogeneous Mobile Edge Devices [72.61177465035031]
We propose a generative AI-empowered federated learning framework to address these challenges by leveraging the idea of FIlling the MIssing (FIMI) portion of local data.
Experiment results demonstrate that FIMI can save up to 50% of the device-side energy to achieve the target global test accuracy.
arXiv Detail & Related papers (2023-10-21T12:07:04Z)
- AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images.
AMIGO uses the cellular graph within the tissue to provide a single representation for a patient.
We show that our model is strongly robust to missing information, to the extent that it can achieve the same performance with as little as 20% of the data.
arXiv Detail & Related papers (2023-03-01T23:37:45Z)
- Explainable vision transformer enabled convolutional neural network for plant disease identification: PlantXViT [11.623005206620498]
Plant diseases are the primary cause of crop losses globally, with an impact on the world economy.
In this study, a Vision Transformer enabled Convolutional Neural Network model called "PlantXViT" is proposed for plant disease identification.
The proposed model has a lightweight structure with only 0.8 million trainable parameters, which makes it suitable for IoT-based smart agriculture services.
arXiv Detail & Related papers (2022-07-16T12:05:06Z)
- Vision Transformers For Weeds and Crops Classification Of High Resolution UAV Images [3.1083892213758104]
Vision Transformer (ViT) models can achieve competitive or better results without applying any convolution operations.
Our experiments show that with a small set of labelled training data, ViT models perform better than state-of-the-art CNN-based models.
arXiv Detail & Related papers (2021-09-06T19:58:54Z)
- Two-View Fine-grained Classification of Plant Species [66.75915278733197]
We propose a novel method based on a two-view leaf image representation and a hierarchical classification strategy for fine-grained recognition of plant species.
A deep metric based on Siamese convolutional neural networks is used to reduce the dependence on a large number of training samples and make the method scalable to new plant species.
arXiv Detail & Related papers (2020-05-18T21:57:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.