Related papers: Self-Supervised Animal Identification for Long Videos

Self-Supervised Animal Identification for Long Videos

URL: http://arxiv.org/abs/2601.09663v1
Date: Wed, 14 Jan 2026 17:53:59 GMT
Title: Self-Supervised Animal Identification for Long Videos
Authors: Xuyang Fang, Sion Hannuna, Edwin Simpson, Neill Campbell,
Abstract summary: We introduce a highly efficient, self-supervised method that reframes animal identification as a global clustering task.<n>Our framework matches or surpasses supervised baselines trained on over 1,000 labeled frames.<n>This work enables practical, high-accuracy animal identification on consumer-grade hardware.
Score: 0.8233028449337972
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Identifying individual animals in long-duration videos is essential for behavioral ecology, wildlife monitoring, and livestock management. Traditional methods require extensive manual annotation, while existing self-supervised approaches are computationally demanding and ill-suited for long sequences due to memory constraints and temporal error propagation. We introduce a highly efficient, self-supervised method that reframes animal identification as a global clustering task rather than a sequential tracking problem. Our approach assumes a known, fixed number of individuals within a single video -- a common scenario in practice -- and requires only bounding box detections and the total count. By sampling pairs of frames, using a frozen pre-trained backbone, and employing a self-bootstrapping mechanism with the Hungarian algorithm for in-batch pseudo-label assignment, our method learns discriminative features without identity labels. We adapt a Binary Cross Entropy loss from vision-language models, enabling state-of-the-art accuracy ($>$97\%) while consuming less than 1 GB of GPU memory per batch -- an order of magnitude less than standard contrastive methods. Evaluated on challenging real-world datasets (3D-POP pigeons and 8-calves feeding videos), our framework matches or surpasses supervised baselines trained on over 1,000 labeled frames, effectively removing the manual annotation bottleneck. This work enables practical, high-accuracy animal identification on consumer-grade hardware, with broad applicability in resource-constrained research settings. All code written for this paper are \href{https://huggingface.co/datasets/tonyFang04/8-calves}{here}.

Related papers

Cattle-CLIP: A Multimodal Framework for Cattle Behaviour Recognition [5.45546363077543]
Cattle-CLIP is a multimodal deep learning framework for cattle behaviour recognition.<n>It is adapted from the large-scale image-language model CLIP by adding a temporal integration module.<n>Experiments show that Cattle-CLIP achieves 96.1% overall accuracy across six behaviours in a supervised setting.
arXiv Detail & Related papers (2025-10-10T09:43:12Z)
From Forest to Zoo: Great Ape Behavior Recognition with ChimpBehave [0.0]
We introduce ChimpBehave, a novel dataset featuring over 2 hours of video (approximately 193,000 video frames) of zoo-housed chimpanzees. ChimpBehave meticulously annotated with bounding boxes and behavior labels for action recognition. We benchmark our dataset using a state-of-the-art CNN-based action recognition model.
arXiv Detail & Related papers (2024-05-30T13:11:08Z)
MBW: Multi-view Bootstrapping in the Wild [30.038254895713276]
Multi-camera systems that train fine-grained detectors have shown promise in detecting such errors. The approach is based on calibrated cameras and rigid geometry, making it expensive, difficult to manage, and impractical in real-world scenarios. In this paper, we address these bottlenecks by combining a non-rigid 3D neural prior with deep flow to obtain high-fidelity landmark estimates. We are able to produce 2D results comparable to state-of-the-art fully supervised methods, along with 3D reconstructions that are impossible with other existing approaches.
arXiv Detail & Related papers (2022-10-04T16:27:54Z)
SuperAnimal pretrained pose estimation models for behavioral analysis [42.206265576708255]
Quantification of behavior is critical in applications ranging from neuroscience, veterinary medicine and animal conservation efforts. We present a series of technical innovations that enable a new method, collectively called SuperAnimal, to develop unified foundation models.
arXiv Detail & Related papers (2022-03-14T18:46:57Z)
Persistent Animal Identification Leveraging Non-Visual Markers [71.14999745312626]
We aim to locate and provide a unique identifier for each mouse in a cluttered home-cage environment through time. This is a very challenging problem due to (i) the lack of distinguishing visual features for each mouse, and (ii) the close confines of the scene with constant occlusion. Our approach achieves 77% accuracy on this animal identification problem, and is able to reject spurious detections when the animals are hidden.
arXiv Detail & Related papers (2021-12-13T17:11:32Z)
Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets [90.61266099147053]
We investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images. We propose modifications and best practices aimed at minimizing human labeling effort. Simulated experiments on a 125k image subset of the ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average.
arXiv Detail & Related papers (2021-04-26T16:29:32Z)
MIST: Multiple Instance Self-Training Framework for Video Anomaly Detection [76.80153360498797]
We develop a multiple instance self-training framework (MIST) to efficiently refine task-specific discriminative representations. MIST is composed of 1) a multiple instance pseudo label generator, which adapts a sparse continuous sampling strategy to produce more reliable clip-level pseudo labels, and 2) a self-guided attention boosted feature encoder. Our method performs comparably to or even better than existing supervised and weakly supervised methods, specifically obtaining a frame-level AUC 94.83% on ShanghaiTech.
arXiv Detail & Related papers (2021-04-04T15:47:14Z)
Face Forensics in the Wild [121.23154918448618]
We construct a novel large-scale dataset, called FFIW-10K, which comprises 10,000 high-quality forgery videos. The manipulation procedure is fully automatic, controlled by a domain-adversarial quality assessment network. In addition, we propose a novel algorithm to tackle the task of multi-person face forgery detection.
arXiv Detail & Related papers (2021-03-30T05:06:19Z)
Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos [82.02074241700728]
In this paper, we present a prohibitive-level action recognition model that is trained with only video-frame labels. Our method per person detectors have been trained on large image datasets within Multiple Instance Learning framework. We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid.
arXiv Detail & Related papers (2020-07-21T10:45:05Z)
Visual Identification of Individual Holstein-Friesian Cattle via Deep Metric Learning [8.784100314325395]
Holstein-Friesian cattle exhibit individually-characteristic black and white coat patterns visually akin to those arising from Turing's reaction-diffusion systems. This work takes advantage of these natural markings in order to automate visual detection and biometric identification of individual Holstein-Friesians via convolutional neural networks and deep metric learning techniques.
arXiv Detail & Related papers (2020-06-16T14:41:55Z)
Evolving Losses for Unsupervised Video Representation Learning [91.2683362199263]
We present a new method to learn video representations from large-scale unlabeled video data. The proposed unsupervised representation learning results in a single RGB network and outperforms previous methods.
arXiv Detail & Related papers (2020-02-26T16:56:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.