Cattle-CLIP: A Multimodal Framework for Cattle Behaviour Recognition
- URL: http://arxiv.org/abs/2510.09203v1
- Date: Fri, 10 Oct 2025 09:43:12 GMT
- Title: Cattle-CLIP: A Multimodal Framework for Cattle Behaviour Recognition
- Authors: Huimin Liu, Jing Gao, Daria Baran, Axel X Montout, Neill W Campbell, Andrew W Dowsey
- Abstract summary: Cattle-CLIP is a multimodal deep learning framework for cattle behaviour recognition. It is adapted from the large-scale image-language model CLIP by adding a temporal integration module. Experiments show that Cattle-CLIP achieves 96.1% overall accuracy across six behaviours in a supervised setting.
- Score: 5.45546363077543
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cattle behaviour is a crucial indicator of an individual animal's health, productivity and overall well-being. Video-based monitoring, combined with deep learning techniques, has become a mainstream approach in animal biometrics, and it can offer high accuracy in some behaviour recognition tasks. We present Cattle-CLIP, a multimodal deep learning framework for cattle behaviour recognition, using semantic cues to improve the performance of video-based visual feature recognition. It is adapted from the large-scale image-language model CLIP by adding a temporal integration module. To address the domain gap between the web data used for the pre-trained model and real-world cattle surveillance footage, we introduce tailored data augmentation strategies and specialised text prompts. Cattle-CLIP is evaluated under both fully-supervised and few-shot learning scenarios, with a particular focus on data-scarce behaviour recognition - an important yet under-explored goal in livestock monitoring. To evaluate the proposed method, we release the CattleBehaviours6 dataset, which comprises six types of indoor behaviours: feeding, drinking, standing-self-grooming, standing-ruminating, lying-self-grooming and lying-ruminating. The dataset consists of 1905 clips collected from our John Oldacre Centre dairy-farm research platform housing 200 Holstein-Friesian cows. Experiments show that Cattle-CLIP achieves 96.1% overall accuracy across the six behaviours in a supervised setting, with nearly 100% recall for the feeding, drinking and standing-ruminating behaviours, and demonstrates robust generalisation with limited data in few-shot scenarios, highlighting the potential of multimodal learning in agricultural and animal behaviour analysis.
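To make the CLIP-adaptation recipe concrete, the following is a minimal sketch of zero-shot behaviour scoring on a video clip. It is not the authors' implementation: plain mean pooling stands in for the paper's temporal integration module, the prompt template is illustrative rather than the paper's specialised prompts, and OpenAI's reference `clip` package is assumed.

```python
# Minimal sketch of CLIP-style zero-shot cattle behaviour scoring.
# Assumes OpenAI's reference package: pip install git+https://github.com/openai/CLIP.git
# Mean pooling over frames stands in for the paper's temporal integration module.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Illustrative prompts; the paper's specialised prompts are not reproduced here.
behaviours = ["feeding", "drinking", "standing and self-grooming",
              "standing and ruminating", "lying and self-grooming",
              "lying and ruminating"]
prompts = clip.tokenize([f"a video of a cow {b}" for b in behaviours]).to(device)

@torch.no_grad()
def classify_clip(frames):
    """frames: list of PIL.Image sampled from one clip; returns the top behaviour."""
    images = torch.stack([preprocess(f) for f in frames]).to(device)
    frame_feats = model.encode_image(images)            # (T, D) per-frame features
    video_feat = frame_feats.mean(dim=0, keepdim=True)  # temporal pooling stand-in
    video_feat = video_feat / video_feat.norm(dim=-1, keepdim=True)
    text_feats = model.encode_text(prompts)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
    logits = 100.0 * video_feat @ text_feats.T          # scaled cosine similarities
    return behaviours[logits.argmax(dim=-1).item()]
```

In the supervised and few-shot settings described above, such a backbone would additionally be fine-tuned on labelled clips; the zero-shot path shown here is only the starting point the text prompts enable.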
Related papers
- Automated Re-Identification of Holstein-Friesian Cattle in Dense Crowds [2.3843187053931456]
We propose a new detect-segment-identify pipeline that leverages the Open-Vocabulary Weight-free Localisation and Segment Anything models. Our methodology overcomes detection breakdown in dense animal groupings, resulting in 98.93% accuracy. We show that unsupervised contrastive learning can build on this to yield 94.82% Re-ID accuracy on our test data.
arXiv Detail & Related papers (2026-02-17T19:25:50Z)
- PriVi: Towards A General-Purpose Video Model For Primate Behavior In The Wild [50.656578456979496]
We introduce PriVi, a large-scale primate-centric video pretraining dataset. We pretrain V-JEPA, a large-scale video model, on PriVi to learn primate-specific representations. Results demonstrate that primate-centric pretraining substantially improves data efficiency and generalization.
arXiv Detail & Related papers (2025-11-12T19:27:40Z)
- A Computer Vision Pipeline for Individual-Level Behavior Analysis: Benchmarking on the Edinburgh Pig Dataset [0.46297934208241753]
Animal behavior analysis plays a crucial role in understanding animal welfare, health status, and productivity in agricultural settings. We present a modular pipeline that leverages open-source, state-of-the-art computer vision techniques to automate animal behavior analysis in a group housing environment. Our approach combines state-of-the-art models for zero-shot object detection, motion-aware tracking and segmentation, and advanced feature extraction using vision transformers for robust behavior recognition.
arXiv Detail & Related papers (2025-09-15T15:31:12Z)
- Consistent multi-animal pose estimation in cattle using dynamic Kalman filter based tracking [0.0]
KeySORT is an adaptive Kalman filter that constructs tracklets in a bounding-box-free manner, significantly improving the temporal consistency of detected keypoints. Our test results indicate the algorithm is able to detect up to 80% of the ground-truth keypoints with high accuracy.
arXiv Detail & Related papers (2025-03-13T15:15:54Z)
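As background for the Kalman-filter-based tracking summarised in this entry, here is a generic constant-velocity Kalman filter for smoothing a single 2-D keypoint track. It is an illustrative sketch only, not the paper's KeySORT (whose adaptive noise handling and tracklet construction are omitted); the noise parameters are assumptions.

```python
# Generic constant-velocity Kalman filter for one 2-D keypoint track.
# Illustrates the filtering idea behind Kalman-based keypoint tracking;
# not the paper's KeySORT implementation.
import numpy as np

class KeypointKalman:
    def __init__(self, x0, y0, q=1e-2, r=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])   # state: position and velocity
        self.P = np.eye(4)                       # state covariance
        self.F = np.eye(4)                       # constant-velocity transition (dt = 1)
        self.F[0, 2] = self.F[1, 3] = 1.0
        self.H = np.eye(2, 4)                    # we observe position only
        self.Q = q * np.eye(4)                   # process noise (assumed value)
        self.R = r * np.eye(2)                   # measurement noise (assumed value)

    def step(self, z):
        # Predict the next state from the motion model.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Update with the detected keypoint position z = (x, y).
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)         # Kalman gain
        self.x = self.x + K @ (np.asarray(z) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]                                # smoothed position
```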
- Holstein-Friesian Re-Identification using Multiple Cameras and Self-Supervision on a Working Farm [2.9391768712283772]
We present MultiCamCows2024, a farm-scale image dataset filmed across multiple cameras for the biometric identification of individual Holstein-Friesian cattle. The dataset comprises 101,329 images of 90 cows, plus the underlying original CCTV footage. We report above 96% single-image identification accuracy on the dataset and demonstrate that combining data from multiple cameras during learning enhances self-supervised identification.
arXiv Detail & Related papers (2024-10-16T15:58:47Z)
- AnimalFormer: Multimodal Vision Framework for Behavior-based Precision Livestock Farming [0.0]
We introduce a multimodal vision framework for precision livestock farming.
We harness the power of GroundingDINO, HQSAM, and ViTPose models.
This suite enables comprehensive behavioral analytics from video data without invasive animal tagging.
arXiv Detail & Related papers (2024-06-14T04:42:44Z)
- Distillation-guided Representation Learning for Unconstrained Gait Recognition [50.0533243584942]
We propose a framework, termed GAit DEtection and Recognition (GADER), for human authentication in challenging outdoor scenarios.
GADER builds discriminative features through a novel gait recognition method, where only frames containing gait information are used.
We evaluate our method against multiple state-of-the-art (SoTA) gait baselines and demonstrate consistent improvements on indoor and outdoor datasets.
arXiv Detail & Related papers (2023-07-27T01:53:57Z)
- Occlusion-Resistant Instance Segmentation of Piglets in Farrowing Pens Using Center Clustering Network [48.42863035798351]
We propose a novel Center Clustering Network for instance segmentation, dubbed CClusnet-Inseg.
CClusnet-Inseg uses each pixel to predict object centers and trace these centers to form masks based on clustering results.
In all, 4,600 images were extracted from six videos collected from six farrowing pens to train and validate our method.
arXiv Detail & Related papers (2022-06-04T08:43:30Z)
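The center-clustering idea in this entry lends itself to a short sketch: each foreground pixel predicts an offset to its object centre, and clustering the voted centres groups pixels into instance masks. This illustrates the general technique, not the paper's CClusnet-Inseg; the use of DBSCAN and all parameter values are assumptions.

```python
# Sketch of centre-voting instance segmentation: foreground pixels vote for
# object centres, and clustering the votes yields per-instance pixel groups.
# Illustrative only; DBSCAN stands in for the paper's clustering step.
import numpy as np
from sklearn.cluster import DBSCAN

def masks_from_center_votes(offsets, fg_mask, eps=5.0, min_samples=20):
    """offsets: (2, H, W) predicted (dy, dx) from each pixel to its object centre;
    fg_mask: (H, W) boolean foreground map. Returns an (H, W) instance-label map."""
    ys, xs = np.nonzero(fg_mask)
    votes = np.stack([ys + offsets[0, ys, xs],
                      xs + offsets[1, ys, xs]], axis=1)  # each pixel's centre vote
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(votes)
    instance_map = np.full(fg_mask.shape, -1, dtype=int)
    instance_map[ys, xs] = labels                        # -1 marks noise / unassigned
    return instance_map
```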
- UniVIP: A Unified Framework for Self-Supervised Visual Pre-training [50.87603616476038]
We propose a novel self-supervised framework to learn versatile visual representations on either single-centric-object or non-iconic datasets.
Extensive experiments show that UniVIP pre-trained on non-iconic COCO achieves state-of-the-art transfer performance. Our method can also exploit single-centric-object datasets such as ImageNet, and outperforms BYOL by 2.5% with the same pre-training epochs in linear probing.
arXiv Detail & Related papers (2022-03-14T10:04:04Z)
- Persistent Animal Identification Leveraging Non-Visual Markers [71.14999745312626]
We aim to locate and provide a unique identifier for each mouse in a cluttered home-cage environment through time.
This is a very challenging problem due to (i) the lack of distinguishing visual features for each mouse, and (ii) the close confines of the scene with constant occlusion.
Our approach achieves 77% accuracy on this animal identification problem, and is able to reject spurious detections when the animals are hidden.
arXiv Detail & Related papers (2021-12-13T17:11:32Z)
- TraND: Transferable Neighborhood Discovery for Unsupervised Cross-domain Gait Recognition [77.77786072373942]
This paper proposes a Transferable Neighborhood Discovery (TraND) framework to bridge the domain gap for unsupervised cross-domain gait recognition.
We design an end-to-end trainable approach to automatically discover the confident neighborhoods of unlabeled samples in the latent space.
Our method achieves state-of-the-art results on two public datasets, i.e., CASIA-B and OU-LP.
arXiv Detail & Related papers (2021-02-09T03:07:07Z)
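The TraND entry above describes discovering confident neighbourhoods of unlabeled samples in a latent space. A minimal sketch of that idea, under assumptions (cosine similarity, a fixed confidence threshold), pairs each embedding with its nearest neighbour and keeps only high-similarity pairs as pseudo-positives; it is not the paper's end-to-end trainable method.

```python
# Sketch of confident-neighbourhood discovery in a latent space: keep only
# nearest-neighbour pairs whose cosine similarity exceeds a threshold.
# Illustrative; the thresholding scheme is an assumption, not TraND itself.
import torch

def confident_neighbours(emb, threshold=0.9):
    """emb: (N, D) unlabeled embeddings. Returns (anchor, neighbour) index pairs."""
    z = torch.nn.functional.normalize(emb, dim=1)
    sim = z @ z.T                          # cosine similarity matrix
    sim.fill_diagonal_(-1.0)               # exclude trivial self-matches
    best_sim, best_idx = sim.max(dim=1)    # nearest neighbour per sample
    keep = best_sim > threshold            # retain confident pairs only
    anchors = torch.nonzero(keep, as_tuple=True)[0]
    return torch.stack([anchors, best_idx[keep]], dim=1)
```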
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.