Domain-Adaptive Pretraining Improves Primate Behavior Recognition
- URL: http://arxiv.org/abs/2509.12193v1
- Date: Mon, 15 Sep 2025 17:54:20 GMT
- Title: Domain-Adaptive Pretraining Improves Primate Behavior Recognition
- Authors: Felix B. Mueller, Timo Lueddecke, Richard Vogg, Alexander S. Ecker,
- Abstract summary: We show that we can utilize self-supervised learning to considerably improve action recognition on primate behavior.<n>On two datasets of great ape behavior (PanAf and ChimpACT), we outperform published state-of-the-art action recognition models by 6.1 %pt. accuracy and 6.3 %pt. mAP, respectively.
- Score: 43.65707056647872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computer vision for animal behavior offers promising tools to aid research in ecology, cognition, and to support conservation efforts. Video camera traps allow for large-scale data collection, but high labeling costs remain a bottleneck to creating large-scale datasets. We thus need data-efficient learning approaches. In this work, we show that we can utilize self-supervised learning to considerably improve action recognition on primate behavior. On two datasets of great ape behavior (PanAf and ChimpACT), we outperform published state-of-the-art action recognition models by 6.1 %pt. accuracy and 6.3 %pt. mAP, respectively. We achieve this by utilizing a pretrained V-JEPA model and applying domain-adaptive pretraining (DAP), i.e. continuing the pretraining with in-domain data. We show that most of the performance gain stems from the DAP. Our method promises great potential for improving the recognition of animal behavior, as DAP does not require labeled samples. Code is available at https://github.com/ecker-lab/dap-behavior
Related papers
- PriVi: Towards A General-Purpose Video Model For Primate Behavior In The Wild [50.656578456979496]
We introduce PriVi, a large-scale primate-centric video pretraining dataset.<n>We pretrain V-JEPA, a large-scale video model, on PriVi to learn primate-specific representations.<n>Results demonstrate that primate-centric pretraining substantially improves data efficiency and generalization.
arXiv Detail & Related papers (2025-11-12T19:27:40Z) - Cattle-CLIP: A Multimodal Framework for Cattle Behaviour Recognition [5.45546363077543]
Cattle-CLIP is a multimodal deep learning framework for cattle behaviour recognition.<n>It is adapted from the large-scale image-language model CLIP by adding a temporal integration module.<n>Experiments show that Cattle-CLIP achieves 96.1% overall accuracy across six behaviours in a supervised setting.
arXiv Detail & Related papers (2025-10-10T09:43:12Z) - Is Large-Scale Pretraining the Secret to Good Domain Generalization? [69.80606575323691]
Multi-Source Domain Generalization (DG) is the task of training on multiple source domains and achieving high classification performance on unseen target domains.<n>Recent methods combine robust features from web-scale pretrained backbones with new features learned from source data, and this has dramatically improved benchmark results.<n>We show that all evaluated DG methods struggle on DomainBed-OOP, while recent methods excel on DomainBed-IP.
arXiv Detail & Related papers (2024-12-03T21:43:11Z) - From Forest to Zoo: Great Ape Behavior Recognition with ChimpBehave [0.0]
We introduce ChimpBehave, a novel dataset featuring over 2 hours of video (approximately 193,000 video frames) of zoo-housed chimpanzees.
ChimpBehave meticulously annotated with bounding boxes and behavior labels for action recognition.
We benchmark our dataset using a state-of-the-art CNN-based action recognition model.
arXiv Detail & Related papers (2024-05-30T13:11:08Z) - CVB: A Video Dataset of Cattle Visual Behaviors [13.233877352490923]
Existing datasets for cattle behavior recognition are mostly small, lack well-defined labels, or are collected in unrealistic controlled environments.
We introduce a new dataset, called Cattle Visual Behaviors (CVB), that consists of 502 video clips, each fifteen seconds long, captured in natural lighting conditions, and annotated with eleven visually perceptible behaviors of grazing cattle.
arXiv Detail & Related papers (2023-05-26T00:44:11Z) - Explored An Effective Methodology for Fine-Grained Snake Recognition [8.908667065576632]
We design a strong multimodal backbone to utilize various meta-information to assist in fine-grained identification.
In order to take full advantage of unlabeled datasets, we use self-supervised learning and supervised learning joint training.
Our method can achieve a macro f1 score 92.7% and 89.4% on private and public dataset, respectively, which is the 1st place among the participators on private leaderboard.
arXiv Detail & Related papers (2022-07-24T02:19:15Z) - SuperAnimal pretrained pose estimation models for behavioral analysis [42.206265576708255]
Quantification of behavior is critical in applications ranging from neuroscience, veterinary medicine and animal conservation efforts.
We present a series of technical innovations that enable a new method, collectively called SuperAnimal, to develop unified foundation models.
arXiv Detail & Related papers (2022-03-14T18:46:57Z) - Neural Semi-supervised Learning for Text Classification Under
Large-Scale Pretraining [51.19885385587916]
We conduct studies on semi-supervised learning in the task of text classification under the context of large-scale LM pretraining.
Our work marks an initial step in understanding the behavior of semi-supervised learning models under the context of large-scale pretraining.
arXiv Detail & Related papers (2020-11-17T13:39:05Z) - Rethinking Pre-training and Self-training [105.27954735761678]
We investigate self-training as another method to utilize additional data on the same setup and contrast it against ImageNet pre-training.
Our study reveals the generality and flexibility of self-training with three additional insights.
For example, on the COCO object detection dataset, pre-training benefits when we use one fifth of the labeled data, and hurts accuracy when we use all labeled data.
arXiv Detail & Related papers (2020-06-11T23:59:16Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z) - Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [81.99843216550306]
We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks.
A second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains.
Adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining.
arXiv Detail & Related papers (2020-04-23T04:21:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.