Related papers: Evaluation of deep learning architectures for wildlife object detection: A comparative study of ResNet and Inception

URL: http://arxiv.org/abs/2512.15480v1
Date: Wed, 17 Dec 2025 14:30:47 GMT
Title: Evaluation of deep learning architectures for wildlife object detection: A comparative study of ResNet and Inception
Authors: Malach Obisa Amonga, Benard Osero, Edna Too
Abstract summary: This study investigates the effectiveness of two individual deep learning architectures, ResNet-101 and Inception v3, for wildlife object detection. The models were trained and evaluated on a wildlife image dataset using a standardized preprocessing approach. The ResNet-101 model achieved a classification accuracy of 94% and a mean Average Precision (mAP) of 0.91, showing strong performance in extracting deep hierarchical features.
Score: 0.0
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Wildlife object detection plays a vital role in biodiversity conservation, ecological monitoring, and habitat protection. However, this task is often challenged by environmental variability, visual similarities among species, and intra-class diversity. This study investigates the effectiveness of two individual deep learning architectures, ResNet-101 and Inception v3, for wildlife object detection under such complex conditions. The models were trained and evaluated on a wildlife image dataset using a standardized preprocessing approach, which included resizing images to a maximum dimension of 800 pixels, converting them to RGB format, and transforming them into PyTorch tensors. A 70:30 training/validation split was used for model development. The ResNet-101 model achieved a classification accuracy of 94% and a mean Average Precision (mAP) of 0.91, showing strong performance in extracting deep hierarchical features. The Inception v3 model performed slightly better, attaining a classification accuracy of 95% and an mAP of 0.92, which the authors attribute to its efficient multi-scale feature extraction through parallel convolutions. Despite the strong results, both models struggled to detect species with similar visual characteristics and animals captured under poor lighting or occlusion. Nonetheless, the findings confirm that both ResNet-101 and Inception v3 are effective models for wildlife object detection tasks and provide a reliable foundation for conservation-focused computer vision applications.
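Illustrative sketch (not from the paper): the preprocessing geometry and the 70:30 split described in the abstract could look roughly like the Python below. The abstract does not say whether the aspect ratio is preserved, whether smaller images are upscaled, or whether the split is randomized, so all three are assumptions here, and the names resized_shape and split_70_30 are hypothetical. The RGB conversion and PyTorch tensor steps are omitted to keep the sketch free of third-party dependencies.

```python
import random

MAX_DIM = 800  # maximum side length stated in the abstract


def resized_shape(width, height, max_dim=MAX_DIM):
    """Return (width, height) scaled so the longer side is at most max_dim.

    Assumptions: aspect ratio is preserved and images that already fit
    are left unchanged (the abstract does not specify either point).
    """
    longest = max(width, height)
    if longest <= max_dim:
        return width, height
    scale = max_dim / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def split_70_30(items, train_fraction=0.7, seed=0):
    """Shuffle items reproducibly and split into (train, validation) lists.

    Assumption: a seeded random split; the paper may have split differently.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    print(resized_shape(1600, 1200))       # longer side reduced to 800
    train, val = split_70_30(range(10))
    print(len(train), len(val))            # 7 training items, 3 validation items
```

In a real pipeline the same steps would more likely be expressed with an image library and a tensor conversion; the sketch only makes the stated numbers (800 pixels, 70:30) concrete.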

Related papers

Pose Matters: Evaluating Vision Transformers and CNNs for Human Action Recognition on Small COCO Subsets [0.0]
This study explores human recognition using a three-class subset of the COCO image corpus. The binary Vision Transformer (ViT) achieved 90% mean test accuracy.
arXiv Detail & Related papers (2025-06-13T11:16:50Z)
Multimodal Feature-Driven Deep Learning for the Prediction of Duck Body Dimensions and Weight [12.125067563652257]
This study introduces an innovative deep learning-based model leveraging multimodal data: 2D RGB images from different views, depth images, and 3D point clouds. A dataset of 1,023 Linwu ducks, comprising over 5,000 samples with diverse postures and conditions, was collected to support model training. The model achieved a mean absolute percentage error (MAPE) of 6.33% and an R² of 0.953 across eight morphometric parameters, demonstrating strong predictive capability.
arXiv Detail & Related papers (2025-03-18T08:09:19Z)
Fruit Fly Classification (Diptera: Tephritidae) in Images, Applying Transfer Learning [8.700842317740943]
This study develops a transfer learning model for the automated classification of two species of fruit flies. Inception-v3 is an effective and replicable approach for classifying Anastrepha fraterculus and Ceratitis capitata.
arXiv Detail & Related papers (2025-02-02T22:16:04Z)
Deep Learning for Leopard Individual Identification: An Adaptive Angular Margin Approach [0.0]
This paper introduces a deep learning framework to distinguish between individual leopards based on their unique spot patterns. I propose a preprocessing pipeline that combines RGB channels with an edge detection channel to underscore the critical features learned by the model.
arXiv Detail & Related papers (2024-11-04T10:38:33Z)
Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture [81.93945602120453]
We introduce an approach that is both general and parameter-efficient for face forgery detection. We design a forgery-style mixture formulation that augments the diversity of forgery source domains. We show that the designed model achieves state-of-the-art generalizability with significantly reduced trainable parameters.
arXiv Detail & Related papers (2024-08-23T01:53:36Z)
PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions [57.871692507044344]
Pose estimation aims to accurately identify anatomical keypoints in humans and animals using monocular images. Current models are typically trained and tested on clean data, potentially overlooking the corruption during real-world deployment. We introduce PoseBench, a benchmark designed to evaluate the robustness of pose estimation models against real-world corruption.
arXiv Detail & Related papers (2024-06-20T14:40:17Z)
OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images [59.51657161097337]
OOD-CV-v2 is a benchmark dataset that includes out-of-distribution examples of 10 object categories in terms of pose, shape, texture, context and the weather conditions. In addition to this novel dataset, we contribute extensive experiments using popular baseline methods.
arXiv Detail & Related papers (2023-04-17T20:39:25Z)
Stacking Ensemble Learning in Deep Domain Adaptation for Ophthalmic Image Classification [61.656149405657246]
Domain adaptation is effective in image classification tasks where obtaining sufficient label data is challenging. We propose a novel method, named SELDA, for stacking ensemble learning via extending three domain adaptation methods. The experimental results using Age-Related Eye Disease Study (AREDS) benchmark ophthalmic dataset demonstrate the effectiveness of the proposed model.
arXiv Detail & Related papers (2022-09-27T14:19:00Z)
Modeling Object Dissimilarity for Deep Saliency Prediction [86.14710352178967]
We introduce a detection-guided saliency prediction network that explicitly models the differences between multiple objects. Our approach is general, allowing us to fuse our object dissimilarities with features extracted by any deep saliency prediction network.
arXiv Detail & Related papers (2021-04-08T16:10:37Z)
How many images do I need? Understanding how sample size per class affects deep learning model performance metrics for balanced designs in autonomous wildlife monitoring [0.0]
We explore in depth the issues of deep learning model performance for progressively increasing per class (species) sample sizes. We provide ecologists with an approximation formula to estimate how many images per animal species they need for certain accuracy level a priori.
arXiv Detail & Related papers (2020-10-16T06:28:35Z)
From Sound Representation to Model Robustness [82.21746840893658]
We investigate the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network. Averaged over various experiments on three environmental sound datasets, we found the ResNet-18 model outperforms other deep learning architectures.
arXiv Detail & Related papers (2020-07-27T17:30:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.