Animal Pose Labeling Using General-Purpose Point Trackers
- URL: http://arxiv.org/abs/2506.03868v1
- Date: Wed, 04 Jun 2025 11:59:22 GMT
- Title: Animal Pose Labeling Using General-Purpose Point Trackers
- Authors: Zhuoyang Pan, Boxiao Pan, Guandao Yang, Adam W. Harley, Leonidas Guibas
- Abstract summary: We propose an animal pose labeling pipeline that follows a different strategy, i.e., test-time optimization. We fine-tune a lightweight appearance embedding inside a pre-trained general-purpose point tracker on a sparse set of annotated frames. Our method achieves state-of-the-art performance at a reasonable cost.
- Score: 11.014266034079352
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatically estimating animal poses from videos is important for studying animal behaviors. Existing methods do not perform reliably since they are trained on datasets that are not comprehensive enough to capture all necessary animal behaviors. However, collecting such datasets is very challenging due to the large variations in animal morphology. In this paper, we propose an animal pose labeling pipeline that follows a different strategy, i.e., test-time optimization. Given a video, we fine-tune a lightweight appearance embedding inside a pre-trained general-purpose point tracker on a sparse set of annotated frames. These annotations can be obtained from human labelers or off-the-shelf pose detectors. The fine-tuned model is then applied to the rest of the frames for automatic labeling. Our method achieves state-of-the-art performance at a reasonable annotation cost. We believe our pipeline offers a valuable tool for the automatic quantification of animal behavior. Visit our project webpage at https://zhuoyang-pan.github.io/animal-labeling.
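The pipeline lends itself to a short sketch. Below is a minimal, hypothetical PyTorch rendition of the test-time optimization loop: it assumes a pre-trained point tracker that exposes a small trainable `appearance_embedding` submodule and a `tracker(video, queries, query_frame)` call returning per-frame keypoint tracks. These interface names, the Huber loss, and the optimizer settings are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def label_video(tracker, video, annotated, steps=200, lr=1e-4):
    """Test-time optimization sketch; all interfaces are assumptions.

    tracker:   pre-trained general-purpose point tracker with a small
               trainable `appearance_embedding` submodule (hypothetical).
    video:     (T, C, H, W) float tensor of frames.
    annotated: dict mapping frame index -> (K, 2) keypoint tensor, from
               human labelers or an off-the-shelf pose detector.
    """
    # Freeze the tracker backbone; only the lightweight appearance
    # embedding receives gradients at test time.
    for p in tracker.parameters():
        p.requires_grad_(False)
    for p in tracker.appearance_embedding.parameters():
        p.requires_grad_(True)
    opt = torch.optim.Adam(tracker.appearance_embedding.parameters(), lr=lr)

    first = min(annotated)      # earliest annotated frame
    queries = annotated[first]  # (K, 2) initial query points

    for _ in range(steps):
        tracks = tracker(video, queries, query_frame=first)  # (T, K, 2)
        # Supervise predictions only on the sparse annotated frames.
        loss = torch.stack(
            [F.huber_loss(tracks[t], kps) for t, kps in annotated.items()]
        ).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Label the remaining frames with the fine-tuned tracker.
    with torch.no_grad():
        return tracker(video, queries, query_frame=first)  # (T, K, 2)
```

Because only the embedding's few parameters are updated, the per-video optimization stays cheap relative to training or fine-tuning a full pose model, which is what keeps the annotation cost reasonable.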
Related papers
- OpenAnimalTracks: A Dataset for Animal Track Recognition [2.3020018305241337]
We introduce the OpenAnimalTracks dataset, the first publicly available labeled dataset designed to facilitate the automated classification and detection of animal footprints.
We show the potential of automated footprint identification with representative classifiers and detection models.
We hope our dataset paves the way for automated animal tracking techniques, enhancing our ability to protect and manage biodiversity.
arXiv Detail & Related papers (2024-06-14T00:37:17Z)
- AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment [64.02822911038848]
We present AnimateZoo, a zero-shot diffusion-based video generator to produce animal animations.
Key technique used in our AnimateZoo is subject alignment, which includes two steps.
Our model is capable of generating videos characterized by accurate movements, consistent appearance, and high-fidelity frames.
arXiv Detail & Related papers (2024-04-07T12:57:41Z)
- APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond [27.50166679588048]
APTv2 is the pioneering large-scale benchmark for animal pose estimation and tracking.
It comprises 2,749 video clips collected and filtered from 30 distinct animal species.
We provide high-quality keypoint and tracking annotations for a total of 84,611 animal instances.
arXiv Detail & Related papers (2023-12-25T04:49:49Z)
- Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images [57.96659470133514]
Motion-activated camera traps constitute an efficient tool for tracking and monitoring wildlife populations across the globe.
Supervised learning techniques have been successfully deployed to analyze such imagery; however, training them requires annotations from experts.
Reducing the reliance on costly labelled data has immense potential in developing large-scale wildlife tracking solutions with markedly less human labor.
arXiv Detail & Related papers (2023-11-02T08:32:00Z)
- MammalNet: A Large-scale Video Benchmark for Mammal Recognition and Behavior Understanding [38.3767550066302]
MammalNet is a large-scale animal behavior dataset with taxonomy-guided annotations of mammals and their common behaviors.
It contains over 18K videos totaling 539 hours, which is 10 times larger than the largest existing animal behavior dataset.
We establish three benchmarks on MammalNet: standard animal and behavior recognition, compositional low-shot animal and behavior recognition, and behavior detection.
arXiv Detail & Related papers (2023-06-01T11:45:33Z)
- ScarceNet: Animal Pose Estimation with Scarce Annotations [74.48263583706712]
ScarceNet is a pseudo-label-based approach that generates artificial labels for unlabeled images.
We evaluate it on the challenging AP-10K dataset, where it outperforms existing semi-supervised approaches by a large margin.
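As a rough, generic illustration of the pseudo-labeling idea (the summary above does not describe ScarceNet's actual selection criteria, so this is not its exact algorithm), a confidence-thresholded harvesting pass over unlabeled images might look like the following; the heatmap-based model interface and the threshold value are assumptions.

```python
import torch

def harvest_pseudo_labels(model, unlabeled_loader, conf_thresh=0.9):
    """Keep only high-confidence keypoint predictions as artificial labels."""
    model.eval()
    pseudo = []
    with torch.no_grad():
        for images in unlabeled_loader:
            heatmaps = model(images)                     # (B, K, H, W)
            conf, flat = heatmaps.flatten(2).max(dim=2)  # peak per keypoint
            w = heatmaps.shape[-1]
            ys = torch.div(flat, w, rounding_mode="floor")
            xs = flat % w
            keypoints = torch.stack([xs, ys], dim=-1).float()  # (B, K, 2)
            mask = conf > conf_thresh  # trust only confident predictions
            pseudo.append((images, keypoints, mask))
    return pseudo  # later mixed with the scarce labeled set for retraining
```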
arXiv Detail & Related papers (2023-03-27T09:15:53Z)
- APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking [77.87449881852062]
APT-36K is the first large-scale benchmark for animal pose estimation and tracking.
It consists of 2,400 video clips collected and filtered from 30 animal species with 15 frames for each video, resulting in 36,000 frames in total.
We benchmark several representative models on the following three tracks: (1) supervised animal pose estimation on a single frame under intra- and inter-domain transfer learning settings, (2) inter-species domain generalization test for unseen animals, and (3) animal pose estimation with animal tracking.
arXiv Detail & Related papers (2022-06-12T07:18:36Z)
- SuperAnimal pretrained pose estimation models for behavioral analysis [42.206265576708255]
Quantification of behavior is critical in applications ranging from neuroscience to veterinary medicine and animal conservation.
We present a series of technical innovations that enable a new method, collectively called SuperAnimal, for developing unified foundation models.
arXiv Detail & Related papers (2022-03-14T18:46:57Z)
- AcinoSet: A 3D Pose Estimation Dataset and Baseline Models for Cheetahs in the Wild [51.35013619649463]
We present an extensive dataset of free-running cheetahs in the wild, called AcinoSet.
The dataset contains 119,490 frames of multi-view synchronized high-speed video footage, camera calibration files and 7,588 human-annotated frames.
The resulting 3D trajectories, human-checked 3D ground truth, and an interactive tool to inspect the data are also provided.
arXiv Detail & Related papers (2021-03-24T15:54:11Z)
- Improving and Simplifying Pattern Exploiting Training [81.77863825517511]
Pattern Exploiting Training (PET) is a recent approach that leverages patterns for few-shot learning.
In this paper, we focus on few-shot learning without any unlabeled data and introduce ADAPET.
ADAPET outperforms PET on SuperGLUE without any task-specific unlabeled data.
arXiv Detail & Related papers (2021-03-22T15:52:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.