Vision Transformers For Weeds and Crops Classification Of High
Resolution UAV Images
- URL: http://arxiv.org/abs/2109.02716v1
- Date: Mon, 6 Sep 2021 19:58:54 GMT
- Authors: Reenul Reedha, Eric Dericquebourg, Raphael Canals, Adel Hafiane
- Abstract summary: Vision Transformer (ViT) models can achieve competitive or better results without applying any convolution operations.
Our experiments show that with a small set of labelled training data, ViT models perform better than state-of-the-art CNN-based models.
- Score: 3.1083892213758104
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Crop and weed monitoring is an important challenge for agriculture and food
production today. Thanks to recent advances in data acquisition and
computation technologies, agriculture is evolving toward smart and precision
farming to meet the demand for high-yield, high-quality crop production.
Classification and recognition in Unmanned Aerial Vehicle (UAV) images are
important phases of crop monitoring. Deep learning models relying on
Convolutional Neural Networks (CNNs) have achieved high performance in image
classification in the agricultural domain. Despite the success of this
architecture, CNNs still face many challenges, such as high computation cost
and the need for large labelled datasets, among others. The transformer
architecture from natural language processing can be an alternative approach
for dealing with CNNs' limitations. Making use of the self-attention paradigm,
Vision Transformer (ViT) models can achieve competitive or better results
without applying any convolution operations. In this paper, we adopt the
self-attention mechanism via ViT models for plant classification of weeds and
crops: red beet, off-type beet (green leaves), parsley and spinach. Our
experiments show that with a small set of labelled training data, ViT models
perform better than the state-of-the-art CNN-based models EfficientNet and
ResNet, with a top accuracy of 99.8% achieved by a ViT model.
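To make the transfer-learning setup concrete, here is a minimal fine-tuning sketch in PyTorch. It is an illustration, not the authors' pipeline: the dataset path, hyperparameters, and one-epoch loop are hypothetical, and it uses torchvision's ImageNet-pretrained ViT-B/16 with its classification head replaced for the four classes named in the abstract.

```python
# Minimal ViT fine-tuning sketch for 4-class crop/weed classification.
# Assumptions: a hypothetical folder-per-class dataset under data/uav_crops/train
# ("red_beet", "off_type_beet", "parsley", "spinach"); not the paper's exact setup.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

NUM_CLASSES = 4  # red beet, off-type beet, parsley, spinach

# Standard ImageNet preprocessing expected by the pretrained ViT weights.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

train_set = datasets.ImageFolder("data/uav_crops/train", transform=preprocess)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

# Load pretrained ViT-B/16 and replace the classification head.
model = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
model.heads.head = nn.Linear(model.heads.head.in_features, NUM_CLASSES)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one pass shown; train for more epochs in practice
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```

Replacing only the final head while fine-tuning the backbone at a low learning rate is a common way to exploit pretrained self-attention features, which matches the small-labelled-data regime the abstract highlights.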
Related papers
- GenFormer -- Generated Images are All You Need to Improve Robustness of Transformers on Small Datasets [11.343905946690352]
We propose GenFormer, a data augmentation strategy utilizing generated images to improve transformer accuracy and robustness on small-scale image classification tasks.
In our comprehensive evaluation we propose Tiny ImageNetV2, -R, and -A as new test set variants of Tiny ImageNet.
We prove the effectiveness of our approach under challenging conditions with limited training data, demonstrating significant improvements in both accuracy and robustness.
arXiv Detail & Related papers (2024-08-26T09:26:08Z)
- Enhanced Infield Agriculture with Interpretable Machine Learning Approaches for Crop Classification [0.49110747024865004]
This research evaluates four approaches to crop classification: traditional ML with handcrafted feature extraction methods such as SIFT, ORB, and colour histograms; a custom-designed CNN; established DL architectures such as AlexNet; and transfer learning on five models pre-trained on ImageNet.
Xception outperformed all of them in terms of generalization, achieving 98% accuracy on the test data, with a model size of 80.03 MB and a prediction time of 0.0633 seconds.
arXiv Detail & Related papers (2024-08-22T14:20:34Z)
- Optimizing Vision Transformers with Data-Free Knowledge Transfer [8.323741354066474]
Vision transformers (ViTs) have excelled in various computer vision tasks due to their superior ability to capture long-distance dependencies.
We propose compressing large ViT models using Knowledge Distillation (KD), implemented in a data-free manner to circumvent limitations related to data availability; a minimal sketch of the underlying distillation loss appears after this list.
arXiv Detail & Related papers (2024-08-12T07:03:35Z)
- Combined CNN and ViT features off-the-shelf: Another astounding baseline for recognition [49.14350399025926]
We apply pre-trained architectures, originally developed for the ImageNet Large Scale Visual Recognition Challenge, for periocular recognition.
Middle-layer features from CNNs and ViTs are a suitable way to recognize individuals based on periocular images.
arXiv Detail & Related papers (2024-07-28T11:52:36Z)
- Generating Diverse Agricultural Data for Vision-Based Farming Applications [74.79409721178489]
This model is capable of simulating distinct growth stages of plants, diverse soil conditions, and randomized field arrangements under varying lighting conditions.
Our dataset includes 12,000 images with semantic labels, offering a comprehensive resource for computer vision tasks in precision agriculture.
arXiv Detail & Related papers (2024-03-27T08:42:47Z)
- SugarViT -- Multi-objective Regression of UAV Images with Vision Transformers and Deep Label Distribution Learning Demonstrated on Disease Severity Prediction in Sugar Beet [3.2925222641796554]
This work introduces a machine learning framework for automated, large-scale, plant-specific trait annotation.
We develop an efficient Vision Transformer based model for disease severity scoring called SugarViT.
Although the model is evaluated on this specific use case, it is kept as generic as possible so that it is also applicable to various image-based classification and regression tasks.
arXiv Detail & Related papers (2023-11-06T13:01:17Z)
- Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model [97.9548609175831]
We resort to plain vision transformers with about 100 million parameters and make the first attempt to propose large vision models customized for remote sensing tasks.
Specifically, to handle the large image size and objects of various orientations in RS images, we propose a new rotated varied-size window attention.
Experiments on detection tasks demonstrate the superiority of our model over all state-of-the-art models, achieving 81.16% mAP on the DOTA-V1.0 dataset.
arXiv Detail & Related papers (2022-08-08T09:08:40Z)
- Generative Adversarial Networks for Image Augmentation in Agriculture: A Systematic Review [5.639656362091594]
The generative adversarial network (GAN), invented in 2014 in the computer vision community, provides a suite of novel approaches that can learn good data representations.
This paper presents an overview of the evolution of GAN architectures followed by a systematic review of their application to agriculture.
arXiv Detail & Related papers (2022-04-10T15:33:05Z)
- Improving Vision Transformers by Revisiting High-frequency Components [106.7140968644414]
We show that Vision Transformer (ViT) models are less effective in capturing the high-frequency components of images than CNN models.
To compensate, we propose HAT, which directly augments high-frequency components of images via adversarial training.
We show that HAT can consistently boost the performance of various ViT models.
arXiv Detail & Related papers (2022-04-03T05:16:51Z)
- ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond [76.35955924137986]
We propose a Vision Transformer Advanced by Exploring the intrinsic inductive bias (IB) of convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
We obtain state-of-the-art classification performance, i.e., 88.5% Top-1 accuracy on the ImageNet validation set and the best 91.2% Top-1 accuracy on the ImageNet ReaL validation set.
arXiv Detail & Related papers (2022-02-21T10:40:05Z)
- Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis [110.30849704592592]
We present Agriculture-Vision: a large-scale aerial farmland image dataset for semantic segmentation of agricultural patterns.
Each image consists of RGB and Near-infrared (NIR) channels with resolution as high as 10 cm per pixel.
We annotate nine types of field anomaly patterns that are most important to farmers.
arXiv Detail & Related papers (2020-01-05T20:19:33Z)
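As promised for the data-free knowledge-transfer entry above, the sketch below shows the standard temperature-scaled distillation loss (after Hinton et al.) that teacher-student compression builds on. This is a generic illustration, not that paper's method: the data-free image-synthesis step is omitted, and the logits here are random placeholders.

```python
# Minimal sketch of the standard knowledge-distillation loss used in
# teacher-student compression. Placeholder logits stand in for outputs of a
# large frozen teacher ViT and a small student; not the paper's full pipeline.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 4.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student outputs."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Usage with hypothetical logits: batch of 8 samples, 4 classes.
student_logits = torch.randn(8, 4)
teacher_logits = torch.randn(8, 4)
loss = distillation_loss(student_logits, teacher_logits)
```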