Related papers: Evaluating Deep Learning Models for African Wildlife Image Classification: From DenseNet to Vision Transformers

Evaluating Deep Learning Models for African Wildlife Image Classification: From DenseNet to Vision Transformers

URL: http://arxiv.org/abs/2507.21364v1
Date: Mon, 28 Jul 2025 22:18:13 GMT
Title: Evaluating Deep Learning Models for African Wildlife Image Classification: From DenseNet to Vision Transformers
Authors: Lukman Jibril Aliyu, Umar Sani Muhammad, Bilqisu Ismail, Nasiru Muhammad, Almustapha A Wakili, Seid Muhie Yimam, Shamsuddeen Hassan Muhammad, Mustapha Abdullahi,
Abstract summary: Wildlife populations in Africa face severe threats, with vertebrate numbers declining by over 65% in the past five decades.<n>In response, image classification using deep learning has emerged as a promising tool for biodiversity monitoring and conservation.<n>This paper presents a comparative study of deep learning models for automatically classifying African wildlife images.
Score: 3.4801331938495705
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Wildlife populations in Africa face severe threats, with vertebrate numbers declining by over 65% in the past five decades. In response, image classification using deep learning has emerged as a promising tool for biodiversity monitoring and conservation. This paper presents a comparative study of deep learning models for automatically classifying African wildlife images, focusing on transfer learning with frozen feature extractors. Using a public dataset of four species: buffalo, elephant, rhinoceros, and zebra; we evaluate the performance of DenseNet-201, ResNet-152, EfficientNet-B4, and Vision Transformer ViT-H/14. DenseNet-201 achieved the best performance among convolutional networks (67% accuracy), while ViT-H/14 achieved the highest overall accuracy (99%), but with significantly higher computational cost, raising deployment concerns. Our experiments highlight the trade-offs between accuracy, resource requirements, and deployability. The best-performing CNN (DenseNet-201) was integrated into a Hugging Face Gradio Space for real-time field use, demonstrating the feasibility of deploying lightweight models in conservation settings. This work contributes to African-grounded AI research by offering practical insights into model selection, dataset preparation, and responsible deployment of deep learning tools for wildlife conservation.

Related papers

Enhanced Infield Agriculture with Interpretable Machine Learning Approaches for Crop Classification [0.49110747024865004]
This research evaluates four different approaches for crop classification, namely traditional ML with handcrafted feature extraction methods like SIFT, ORB, and Color Histogram; Custom Designed CNN and established DL architecture like AlexNet; transfer learning on five models pre-trained using ImageNet. Xception outperformed all of them in terms of generalization, achieving 98% accuracy on the test data, with a model size of 80.03 MB and a prediction time of 0.0633 seconds.
arXiv Detail & Related papers (2024-08-22T14:20:34Z)
Transfer Learning for Wildlife Classification: Evaluating YOLOv8 against DenseNet, ResNet, and VGGNet on a Custom Dataset [0.0]
The study utilizes transfer learning to fine-tune pre-trained models on the dataset. YOLOv8 outperforms other models, achieving a training accuracy of 97.39% and a validation F1-score of 96.50%.
arXiv Detail & Related papers (2024-07-10T15:03:00Z)
Comparing Male Nyala and Male Kudu Classification using Transfer Learning with ResNet-50 and VGG-16 [0.0]
This paper investigates the efficiency of pre-trained models, specifically the VGG-16 and ResNet-50 model, in identifying a male Kudu and a male Nyala in their natural habitats. The experimental results achieved an accuracy of 93.2% and 97.7% for the VGG-16 and ResNet-50 models, respectively, before fine-tuning and 97.7% for both models after fine-tuning.
arXiv Detail & Related papers (2023-11-10T10:43:46Z)
Learning from History: Task-agnostic Model Contrastive Learning for Image Restoration [79.04007257606862]
This paper introduces an innovative method termed 'learning from history', which dynamically generates negative samples from the target model itself. Our approach, named Model Contrastive Learning for Image Restoration (MCLIR), rejuvenates latency models as negative models, making it compatible with diverse image restoration tasks.
arXiv Detail & Related papers (2023-09-12T07:50:54Z)
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research. In this study, we focus on transferring knowledge for video classification tasks. We utilize the well-pretrained language model to generate good semantic target for efficient transferring learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
Core Risk Minimization using Salient ImageNet [53.616101711801484]
We introduce the Salient Imagenet dataset with more than 1 million soft masks localizing core and spurious features for all 1000 Imagenet classes. Using this dataset, we first evaluate the reliance of several Imagenet pretrained models (42 total) on spurious features. Next, we introduce a new learning paradigm called Core Risk Minimization (CoRM) whose objective ensures that the model predicts a class using its core features.
arXiv Detail & Related papers (2022-03-28T01:53:34Z)
Zoo-Tuning: Adaptive Transfer from a Zoo of Models [82.9120546160422]
Zoo-Tuning learns to adaptively transfer the parameters of pretrained models to the target task. We evaluate our approach on a variety of tasks, including reinforcement learning, image classification, and facial landmark detection.
arXiv Detail & Related papers (2021-06-29T14:09:45Z)
Efficient Self-supervised Vision Transformers for Representation Learning [86.57557009109411]
We show that multi-stage architectures with sparse self-attentions can significantly reduce modeling complexity. We propose a new pre-training task of region matching which allows the model to capture fine-grained region dependencies. Our results show that combining the two techniques, EsViT achieves 81.3% top-1 on the ImageNet linear probe evaluation.
arXiv Detail & Related papers (2021-06-17T19:57:33Z)
Deep learning with self-supervision and uncertainty regularization to count fish in underwater images [28.261323753321328]
Effective conservation actions require effective population monitoring. Monitoring populations through image sampling has made data collection cheaper, wide-reaching and less intrusive. Counting animals from such data is challenging, particularly when densely packed in noisy images. Deep learning is the state-of-the-art method for many computer vision tasks, but it has yet to be properly explored to count animals.
arXiv Detail & Related papers (2021-04-30T13:02:19Z)
How many images do I need? Understanding how sample size per class affects deep learning model performance metrics for balanced designs in autonomous wildlife monitoring [0.0]
We explore in depth the issues of deep learning model performance for progressively increasing per class (species) sample sizes. We provide ecologists with an approximation formula to estimate how many images per animal species they need for certain accuracy level a priori.
arXiv Detail & Related papers (2020-10-16T06:28:35Z)
Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction. We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data. Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.