Comparison between transformers and convolutional models for
fine-grained classification of insects
- URL: http://arxiv.org/abs/2307.11112v1
- Date: Thu, 20 Jul 2023 10:00:04 GMT
- Title: Comparison between transformers and convolutional models for
fine-grained classification of insects
- Authors: Rita Pucci, Vincent J. Kalkman, Dan Stowell
- Abstract summary: We consider the taxonomical class of Insecta.
The identification of insects is essential in biodiversity monitoring, as they are among the organisms at the base of many ecosystems.
Billions of images need to be classified automatically, and deep neural networks are one of the main techniques explored for fine-grained tasks.
- Score: 7.107353918348911
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-grained classification is challenging because discriminative
features are hard to find. The problem is exacerbated when identifying species
within the same taxonomical class, since species often share morphological
characteristics that make them difficult to differentiate. We consider the
taxonomical class of Insecta. The identification of insects is essential in
biodiversity monitoring, as they are among the organisms at the base of many
ecosystems. Citizen science is doing brilliant work collecting images of
insects in the wild, giving experts the possibility to create improved
distribution maps in all countries. Billions of images need to be classified
automatically, and deep neural networks are one of the main techniques
explored for fine-grained tasks. The state of the art in deep learning is
extremely fruitful, so how do we identify which algorithm to use? We focus on
the orders Odonata and Coleoptera and propose an initial comparative study of
the two best-known layer structures for computer vision: transformer and
convolutional layers. We compare the performance of T2TViT, a fully
transformer-based model, EfficientNet, a fully convolutional model, and ViTAE,
a hybrid. We analyse the three models under identical conditions, evaluating
performance per species, per morph together with sex, inference time, and
overall performance on unbalanced datasets of smartphone images. Although all
three families of models perform well, our analysis shows that the hybrid
model outperforms the fully convolutional and fully transformer-based models
on accuracy, while the fully transformer-based model outperforms the others on
inference speed; this indicates that transformers are robust to a shortage of
samples and faster at inference time.
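
To make the comparison protocol concrete, the following is a minimal sketch of an evaluation harness measuring the two quantities the study reports: per-species accuracy and inference time. It is not the authors' code; the dummy data, the class count, and the use of torchvision's EfficientNet-B0 as a stand-in for the three compared architectures (T2TViT, EfficientNet, ViTAE) are all assumptions for illustration.

```python
# Minimal sketch of a comparison harness: per-class (per-species) accuracy and
# mean inference latency for any PyTorch classifier. Model choice and the dummy
# data below are placeholders, not the paper's actual setup.
import time
from collections import defaultdict

import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import efficientnet_b0

def evaluate(model, loader, device="cpu"):
    model.eval().to(device)
    correct, total = defaultdict(int), defaultdict(int)
    latencies = []
    with torch.no_grad():
        for images, labels in loader:
            images = images.to(device)
            start = time.perf_counter()
            preds = model(images).argmax(dim=1).cpu()
            latencies.append((time.perf_counter() - start) / len(images))
            for p, y in zip(preds.tolist(), labels.tolist()):
                total[y] += 1
                correct[y] += int(p == y)
    per_class = {c: correct[c] / total[c] for c in total}
    return per_class, sum(latencies) / len(latencies)

if __name__ == "__main__":
    # Dummy stand-in data: 32 fake "smartphone" images, 10 pretend species.
    images = torch.randn(32, 3, 224, 224)
    labels = torch.randint(0, 10, (32,))
    loader = DataLoader(TensorDataset(images, labels), batch_size=8)

    # EfficientNet-B0 stands in for the convolutional baseline; the same
    # harness would be reused unchanged for a T2T-ViT or ViTAE checkpoint.
    model = efficientnet_b0(num_classes=10)
    per_species_acc, mean_latency = evaluate(model, loader)
    print(per_species_acc, f"{mean_latency * 1000:.1f} ms/image")
```

Because accuracy is kept per class, the effect of the unbalanced smartphone datasets mentioned in the abstract is visible species by species rather than being averaged away in a single overall score.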
Related papers
- Forgery-aware Adaptive Transformer for Generalizable Synthetic Image
Detection [106.39544368711427]
We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods.
We present a novel forgery-aware adaptive transformer approach, namely FatFormer.
Our approach, tuned on 4-class ProGAN data, attains an average accuracy of 98% on unseen GANs and, surprisingly, generalizes to unseen diffusion models with 95% accuracy.
arXiv Detail & Related papers (2023-12-27T17:36:32Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- Stacking Ensemble Learning in Deep Domain Adaptation for Ophthalmic Image Classification [61.656149405657246]
Domain adaptation is effective in image classification tasks where obtaining sufficient labelled data is challenging.
We propose a novel method, named SELDA, for stacking ensemble learning via extending three domain adaptation methods.
The experimental results using Age-Related Eye Disease Study (AREDS) benchmark ophthalmic dataset demonstrate the effectiveness of the proposed model.
arXiv Detail & Related papers (2022-09-27T14:19:00Z)
- Towards Fine-grained Image Classification with Generative Adversarial Networks and Facial Landmark Detection [0.0]
We use GAN-based data augmentation to generate extra dataset instances.
We validated our work by evaluating the accuracy of fine-grained image classification on the recent Vision Transformer (ViT) model.
arXiv Detail & Related papers (2021-08-28T06:32:42Z)
- Exploring Vision Transformers for Fine-grained Classification [0.0]
We propose a multi-stage ViT framework for fine-grained image classification tasks, which localizes the informative image regions without requiring architectural changes.
We demonstrate the value of our approach by experimenting with four popular fine-grained benchmarks: CUB-200-2011, Stanford Cars, Stanford Dogs, and FGVC7 Plant Pathology.
arXiv Detail & Related papers (2021-06-19T23:57:31Z)
- Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z)
- Two-View Fine-grained Classification of Plant Species [66.75915278733197]
We propose a novel method based on a two-view leaf image representation and a hierarchical classification strategy for fine-grained recognition of plant species.
A deep metric based on Siamese convolutional neural networks is used to reduce the dependence on a large number of training samples and make the method scalable to new plant species.
arXiv Detail & Related papers (2020-05-18T21:57:47Z)
- Automatic image-based identification and biomass estimation of invertebrates [70.08255822611812]
Time-consuming sorting and identification of taxa pose strong limitations on how many insect samples can be processed.
We propose to replace the standard manual approach of human expert-based sorting and identification with an automatic image-based technology.
We use state-of-the-art ResNet-50 and InceptionV3 CNNs for the classification task.
arXiv Detail & Related papers (2020-02-05T21:38:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
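
As a companion to the last related paper, which classifies invertebrate images with off-the-shelf CNNs, here is a hedged transfer-learning sketch: it replaces the ImageNet head of a torchvision ResNet-50 with a new species head and trains only that head. The species count, optimiser, and learning rate are illustrative assumptions, not details taken from that paper.

```python
# Hedged sketch: fine-tuning an ImageNet-pretrained ResNet-50 on insect images,
# in the spirit of the invertebrate-identification paper above. Class count and
# hyperparameters are illustrative assumptions.
import torch
from torch import nn, optim
from torchvision import models

NUM_SPECIES = 50  # assumed number of taxa; not from the paper

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_SPECIES)  # new classification head

# Freeze the pretrained backbone and train only the new head.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc")

optimizer = optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimisation step on a batch of preprocessed images."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```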