Benchmarking pig detection and tracking under diverse and challenging conditions
- URL: http://arxiv.org/abs/2507.16639v1
- Date: Tue, 22 Jul 2025 14:36:51 GMT
- Title: Benchmarking pig detection and tracking under diverse and challenging conditions
- Authors: Jonathan Henrich, Christian Post, Maximilian Zilke, Parth Shiroya, Emma Chanut, Amir Mollazadeh Yamchi, Ramin Yahyapour, Thomas Kneib, Imke Traulsen,
- Abstract summary: We curated two datasets: PigDetect for object detection and PigTrack for multi-object tracking.<n>For object detection, we show that challenging training images improve detection beyond what is achievable with randomly sampled images alone.<n>For multi-object tracking, we observed that SORT-based methods achieve superior detection performance compared to end-to-end trainable models.
- Score: 1.865175170209582
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: To ensure animal welfare and effective management in pig farming, monitoring individual behavior is a crucial prerequisite. While monitoring tasks have traditionally been carried out manually, advances in machine learning have made it possible to collect individualized information in an increasingly automated way. Central to these methods is the localization of animals across space (object detection) and time (multi-object tracking). Despite extensive research of these two tasks in pig farming, a systematic benchmarking study has not yet been conducted. In this work, we address this gap by curating two datasets: PigDetect for object detection and PigTrack for multi-object tracking. The datasets are based on diverse image and video material from realistic barn conditions, and include challenging scenarios such as occlusions or bad visibility. For object detection, we show that challenging training images improve detection performance beyond what is achievable with randomly sampled images alone. Comparing different approaches, we found that state-of-the-art models offer substantial improvements in detection quality over real-time alternatives. For multi-object tracking, we observed that SORT-based methods achieve superior detection performance compared to end-to-end trainable models. However, end-to-end models show better association performance, suggesting they could become strong alternatives in the future. We also investigate characteristic failure cases of end-to-end models, providing guidance for future improvements. The detection and tracking models trained on our datasets perform well in unseen pens, suggesting good generalization capabilities. This highlights the importance of high-quality training data. The datasets and research code are made publicly available to facilitate reproducibility, re-use and further development.
Related papers
- Learning Using Privileged Information for Litter Detection [0.6390468088226494]
This study presents a novel approach that combines privileged information with deep learning object detection.<n>We evaluate our method across five widely used object detection models.<n>Our results suggest that this methodology offers a practical solution for litter detection.
arXiv Detail & Related papers (2025-08-06T06:46:14Z) - A model-agnostic active learning approach for animal detection from camera traps [6.521571185874872]
We propose a model-agnostic active learning approach for detection of animals captured by camera traps.<n>Our approach integrates uncertainty and diversity quantities of samples at both the object-based and image-based levels into the active learning sample selection process.
arXiv Detail & Related papers (2025-07-09T04:36:59Z) - Consistent multi-animal pose estimation in cattle using dynamic Kalman filter based tracking [0.0]
KeySORT is an adaptive Kalman filter to construct tracklets in a bounding-box free manner, significantly improving the temporal consistency of detected keypoints.<n>Our test results indicate our algorithm is able to detect up to 80% of the ground truth keypoints with high accuracy.
arXiv Detail & Related papers (2025-03-13T15:15:54Z) - Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing pose serious risks for generative models.<n>In this paper, we investigate how detection performance varies across model backbones, types, and datasets.<n>We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z) - A Model Generalization Study in Localizing Indoor Cows with COw LOcalization (COLO) dataset [0.0]
This study investigates the generalization capabilities of YOLOv8 and YOLOv9 models for cow detection in indoor free-stall barn settings.
We explore three key hypotheses: (1) Model generalization is equally influenced by changes in lighting conditions and camera angles; (2) Higher model complexity guarantees better generalization performance; (3) Fine-tuning with custom initial weights trained on relevant tasks always brings advantages to detection tasks.
arXiv Detail & Related papers (2024-07-29T18:49:58Z) - Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets.
We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z) - TAB: Text-Align Anomaly Backbone Model for Industrial Inspection Tasks [12.660226544498023]
We propose a novel framework to adeptly train a backbone model tailored to the manufacturing domain.
Our approach concurrently considers visual and text-aligned embedding spaces for normal and abnormal conditions.
The resulting pre-trained backbone markedly enhances performance in industrial downstream tasks.
arXiv Detail & Related papers (2023-12-15T01:37:29Z) - ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP)
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z) - Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z) - SuperAnimal pretrained pose estimation models for behavioral analysis [42.206265576708255]
Quantification of behavior is critical in applications ranging from neuroscience, veterinary medicine and animal conservation efforts.
We present a series of technical innovations that enable a new method, collectively called SuperAnimal, to develop unified foundation models.
arXiv Detail & Related papers (2022-03-14T18:46:57Z) - Crop-Transform-Paste: Self-Supervised Learning for Visual Tracking [137.26381337333552]
In this work, we develop the Crop-Transform-Paste operation, which is able to synthesize sufficient training data.
Since the object state is known in all synthesized data, existing deep trackers can be trained in routine ways without human annotation.
arXiv Detail & Related papers (2021-06-21T07:40:34Z) - Unsupervised Domain Adaption of Object Detectors: A Survey [87.08473838767235]
Recent advances in deep learning have led to the development of accurate and efficient models for various computer vision applications.
Learning highly accurate models relies on the availability of datasets with a large number of annotated images.
Due to this, model performance drops drastically when evaluated on label-scarce datasets having visually distinct images.
arXiv Detail & Related papers (2021-05-27T23:34:06Z) - Pretrained equivariant features improve unsupervised landmark discovery [69.02115180674885]
We formulate a two-step unsupervised approach that overcomes this challenge by first learning powerful pixel-based features.
Our method produces state-of-the-art results in several challenging landmark detection datasets.
arXiv Detail & Related papers (2021-04-07T05:42:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.