Related papers: Direct Video-Based Spatiotemporal Deep Learning for Cattle Lameness Detection

URL: http://arxiv.org/abs/2504.16404v4
Date: Thu, 18 Sep 2025 03:50:59 GMT
Title: Direct Video-Based Spatiotemporal Deep Learning for Cattle Lameness Detection
Authors: Md Fahimuzzman Sohan, Raid Alzubi, Hadeel Alzoubi, Eid Albalawi, A. H. Abdul Hafez,
Abstract summary: This study proposes a framework for automated cattle lameness detection using publicly available video data. Two deep learning architectures were trained and evaluated. The 3D CNN achieved a video-level classification accuracy of 90%, with precision, recall, and F1 score of 90.9% each, outperforming the ConvLSTM2D model, which achieved 85% accuracy.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Cattle lameness is a prevalent health problem in livestock farming, often resulting from hoof injuries or infections, and severely impacts animal welfare and productivity. Early and accurate detection is critical for minimizing economic losses and ensuring proper treatment. This study proposes a spatiotemporal deep learning framework for automated cattle lameness detection using publicly available video data. We curate and publicly release a balanced set of 50 online video clips featuring 42 individual cattle, recorded from multiple viewpoints in both indoor and outdoor environments. The videos were categorized into lame and non-lame classes based on visual gait characteristics and metadata descriptions. After applying data augmentation techniques to enhance generalization, two deep learning architectures were trained and evaluated: 3D Convolutional Neural Networks (3D CNN) and Convolutional Long Short-Term Memory (ConvLSTM2D). The 3D CNN achieved a video-level classification accuracy of 90%, with precision, recall, and F1 score of 90.9% each, outperforming the ConvLSTM2D model, which achieved 85% accuracy. Unlike conventional approaches that rely on multistage pipelines involving object detection and pose estimation, this study demonstrates the effectiveness of a direct end-to-end video classification approach. Compared with the best end-to-end prior method (C3D-ConvLSTM, 90.3%), our model achieves comparable accuracy while eliminating pose estimation pre-processing. The results indicate that deep learning models can successfully extract and learn spatiotemporal features from various video sources, enabling scalable and efficient cattle lameness detection in real-world farm settings.
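The metrics quoted in the abstract (accuracy, precision, recall, F1) all derive from one binary confusion matrix. The sketch below shows the arithmetic with "lame" as the positive class. The counts are hypothetical and are not taken from the paper, which does not state its test-split size here; they were chosen only because they are arithmetically consistent with the reported 90% accuracy and 90.9% precision, recall, and F1. The function name `binary_metrics` is likewise an illustrative assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) for a binary classifier.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives for the positive class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical counts (NOT from the paper): 20 test videos, "lame" = positive.
acc, prec, rec, f1 = binary_metrics(tp=10, fp=1, fn=1, tn=8)
print(f"accuracy={acc:.1%} precision={prec:.1%} recall={rec:.1%} f1={f1:.1%}")
# prints: accuracy=90.0% precision=90.9% recall=90.9% f1=90.9%
```

When precision and recall are equal, F1 equals both, which is why the three figures can coincide at 90.9% while accuracy differs.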

Related papers

Real-Time Human Detection for Aerial Captured Video Sequences via Deep Models [1.4656201740804355]
We propose automatic feature learning methods, which combine optical flow and three different deep models. The models are trained and tested on the publicly available and highly challenging UCF-ARG aerial dataset. Experimental results demonstrated that the proposed methods are successful for the human detection task.
arXiv Detail & Related papers (2026-01-01T17:00:04Z)
Cattle-CLIP: A Multimodal Framework for Cattle Behaviour Recognition [5.45546363077543]
Cattle-CLIP is a multimodal deep learning framework for cattle behaviour recognition. It is adapted from the large-scale image-language model CLIP by adding a temporal integration module. Experiments show that Cattle-CLIP achieves 96.1% overall accuracy across six behaviours in a supervised setting.
arXiv Detail & Related papers (2025-10-10T09:43:12Z)
An empirical study for the early detection of Mpox from skin lesion images using pretrained CNN models leveraging XAI technique [0.471858286267785]
Mpox is a zoonotic disease caused by the Mpox virus, which shares similarities with other skin conditions. This study aims to evaluate the effectiveness of pre-trained CNN models for the early detection of monkeypox. It also seeks to enhance model interpretability using Grad-CAM, an XAI technique.
arXiv Detail & Related papers (2025-07-21T17:30:08Z)
Leveraging Pre-Trained Visual Models for AI-Generated Video Detection [54.88903878778194]
The field of video generation has advanced beyond DeepFakes, creating an urgent need for methods capable of detecting AI-generated videos with generic content. We propose a novel approach that leverages pre-trained visual models to distinguish between real and generated videos. Our method achieves high detection accuracy, above 90% on average, underscoring its effectiveness.
arXiv Detail & Related papers (2025-07-17T15:36:39Z)
Explainable AI-Driven Detection of Human Monkeypox Using Deep Learning and Vision Transformers: A Comprehensive Analysis [0.20482269513546453]
Mpox is a zoonotic viral illness that poses a significant public health concern. It is difficult to make an early clinical diagnosis because of how closely its symptoms match those of measles and chickenpox. Medical imaging combined with deep learning (DL) techniques has shown promise in improving disease detection by analyzing affected skin areas. Our study explores the feasibility of training deep learning and vision transformer-based models from scratch with a publicly available skin lesion image dataset.
arXiv Detail & Related papers (2025-04-03T19:45:22Z)
Excretion Detection in Pigsties Using Convolutional and Transformerbased Deep Neural Networks [0.0]
Animal excretions in the form of urine puddles and feces are a significant source of emissions in livestock farming. Previous research approaches to determine the puddle area require manual detection of the puddle in the barn. This work is the first to investigate the suitability of different deep learning models for the detection of excretions in pigsties.
arXiv Detail & Related papers (2024-11-29T21:00:08Z)
Brain Tumor Classification on MRI in Light of Molecular Markers [61.77272414423481]
Co-deletion of the 1p/19q gene is associated with clinical outcomes in low-grade gliomas. This study aims to utilize a specially MRI-based convolutional neural network for brain cancer detection.
arXiv Detail & Related papers (2024-09-29T07:04:26Z)
Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets. We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z)
Learning Human Action Recognition Representations Without Real Humans [66.61527869763819]
We present a benchmark that leverages real-world videos with humans removed and synthetic data containing virtual humans to pre-train a model. We then evaluate the transferability of the representation learned on this data to a diverse set of downstream action recognition benchmarks. Our approach outperforms previous baselines by up to 5%.
arXiv Detail & Related papers (2023-11-10T18:38:14Z)
The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation. We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare. Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z)
CVB: A Video Dataset of Cattle Visual Behaviors [13.233877352490923]
Existing datasets for cattle behavior recognition are mostly small, lack well-defined labels, or are collected in unrealistic controlled environments. We introduce a new dataset, called Cattle Visual Behaviors (CVB), that consists of 502 video clips, each fifteen seconds long, captured in natural lighting conditions, and annotated with eleven visually perceptible behaviors of grazing cattle.
arXiv Detail & Related papers (2023-05-26T00:44:11Z)
Human activity recognition using deep learning approaches and single frame cnn and convolutional lstm [0.0]
We explore two deep learning-based approaches, namely single frame Convolutional Neural Networks (CNNs) and convolutional Long Short-Term Memory to recognise human actions from videos. The two models were trained and evaluated on a benchmark action recognition dataset, UCF50, and another dataset that was created for the experimentation. Though both models exhibit good accuracies, the single frame CNN model outperforms the Convolutional LSTM model by having an accuracy of 99.8% with the UCF50 dataset.
arXiv Detail & Related papers (2023-04-18T01:33:29Z)
TempNet: Temporal Attention Towards the Detection of Animal Behaviour in Videos [63.85815474157357]
We propose an efficient computer vision- and deep learning-based method for the detection of biological behaviours in videos. TempNet uses an encoder bridge and residual blocks to maintain model performance with a two-staged, spatial, then temporal, encoder. We demonstrate its application to the detection of sablefish (Anoplopoma fimbria) startle events.
arXiv Detail & Related papers (2022-11-17T23:55:12Z)
Intelligent 3D Network Protocol for Multimedia Data Classification using Deep Learning [0.0]
We implement a Hybrid Deep Learning Architecture that combines STIP and 3D CNN features to enhance the performance of 3D videos effectively. The results are compared with state-of-the-art frameworks from the literature for action recognition on UCF101, with an accuracy of 95%.
arXiv Detail & Related papers (2022-07-23T12:24:52Z)
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research. In this study, we focus on transferring knowledge for video classification tasks. We utilize a well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
CONVIQT: Contrastive Video Quality Estimator [63.749184706461826]
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms. Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner. Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
arXiv Detail & Related papers (2022-06-29T15:22:01Z)
T-LEAP: occlusion-robust pose estimation of walking cows using temporal information [0.0]
Lameness, a prevalent health disorder in dairy cows, is commonly detected by analyzing the gait of cows. A cow's gait can be tracked in videos using pose estimation models because models learn to automatically localize anatomical landmarks in images and videos. Most animal pose estimation models are static, that is, videos are processed frame by frame and do not use any temporal information.
arXiv Detail & Related papers (2021-04-16T10:50:56Z)
A Deep Learning Study on Osteosarcoma Detection from Histological Images [6.341765152919201]
The most common type of primary malignant bone tumor is osteosarcoma. CNNs can significantly decrease surgeons' workload and make a better prognosis of patient conditions. CNNs need to be trained on a large amount of data in order to achieve a more trustworthy performance.
arXiv Detail & Related papers (2020-11-02T18:16:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.