APTv2: Benchmarking Animal Pose Estimation and Tracking with a
Large-scale Dataset and Beyond
- URL: http://arxiv.org/abs/2312.15612v1
- Date: Mon, 25 Dec 2023 04:49:49 GMT
- Title: APTv2: Benchmarking Animal Pose Estimation and Tracking with a
Large-scale Dataset and Beyond
- Authors: Yuxiang Yang, Yingqi Deng, Yufei Xu, Jing Zhang
- Abstract summary: APTv2 is the pioneering large-scale benchmark for animal pose estimation and tracking.
It comprises 2,749 video clips filtered and collected from 30 distinct animal species.
We provide high-quality keypoint and tracking annotations for a total of 84,611 animal instances.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Animal Pose Estimation and Tracking (APT) is a critical task in detecting and
monitoring the keypoints of animals across a series of video frames, which is
essential for understanding animal behavior. Past works relating to animals
have primarily focused on either animal tracking or single-frame animal pose
estimation only, neglecting the integration of both aspects. The absence of
comprehensive APT datasets inhibits the progression and evaluation of animal
pose estimation and tracking methods based on videos, thereby constraining
their real-world applications. To fill this gap, we introduce APTv2, the
pioneering large-scale benchmark for animal pose estimation and tracking. APTv2
comprises 2,749 video clips filtered and collected from 30 distinct animal
species. Each video clip includes 15 frames, culminating in a total of 41,235
frames. Following meticulous manual annotation and stringent verification, we
provide high-quality keypoint and tracking annotations for a total of 84,611
animal instances, split into easy and hard subsets based on the number of
instances that exist in each frame. With APTv2 as the foundation, we establish
a simple baseline method and provide benchmarks for
representative models across three tracks: (1) single-frame animal pose
estimation track to evaluate both intra- and inter-domain transfer learning
performance, (2) low-data transfer and generalization track to evaluate the
inter-species domain generalization performance, and (3) animal pose tracking
track. Our experimental results deliver key empirical insights, demonstrating
that APTv2 serves as a valuable benchmark for animal pose estimation and
tracking. It also presents new challenges and opportunities for future
research. The code and dataset are released at
https://github.com/ViTAE-Transformer/APTv2.
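
As a concreteness check on the statistics above (2,749 clips x 15 frames = 41,235 frames) and the easy/hard split, here is a minimal Python sketch of how such annotations could be loaded and partitioned by per-frame instance count. It assumes a COCO-style JSON schema; the field names ("annotations", "image_id") and the instance-count threshold are illustrative assumptions, not the confirmed APTv2 format.

import json
from collections import defaultdict

CLIPS, FRAMES_PER_CLIP = 2749, 15
assert CLIPS * FRAMES_PER_CLIP == 41235  # frame count quoted in the abstract

def split_easy_hard(annotation_file, threshold=3):
    """Group instances by frame, then label each frame 'easy' or 'hard' by
    how many animal instances it contains. The threshold is illustrative;
    the abstract does not state the exact cut-off."""
    with open(annotation_file) as f:
        data = json.load(f)

    instances_per_frame = defaultdict(int)
    for ann in data["annotations"]:  # assumed: one entry per animal instance
        instances_per_frame[ann["image_id"]] += 1

    easy = [fid for fid, n in instances_per_frame.items() if n < threshold]
    hard = [fid for fid, n in instances_per_frame.items() if n >= threshold]
    return easy, hard
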
Related papers
- AnimalFormer: Multimodal Vision Framework for Behavior-based Precision Livestock Farming
We introduce a multimodal vision framework for precision livestock farming.
We harness the power of GroundingDINO, HQSAM, and ViTPose models.
This suite enables comprehensive behavioral analytics from video data without invasive animal tagging.
arXiv Detail & Related papers (2024-06-14T04:42:44Z)
- 3D-MuPPET: 3D Multi-Pigeon Pose Estimation and Tracking
We present 3D-MuPPET, a framework to estimate and track 3D poses of up to 10 pigeons at interactive speed using multiple camera views.
For identity matching, we first dynamically match 2D detections to global identities in the first frame, then use a 2D tracker to maintain IDs across views in subsequent frames.
We show that 3D-MuPPET also works outdoors without requiring additional annotations from natural environments (a toy sketch of the first-frame identity matching appears after this list).
arXiv Detail & Related papers (2023-08-29T14:02:27Z)
- PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers
We propose a new end-to-end multi-person 3D pose and shape estimation framework with a progressive video Transformer.
In PSVT, a spatio-temporal encoder captures the global feature dependencies among spatial objects.
To handle the variances of objects as time proceeds, a novel scheme of progressive decoding is used.
arXiv Detail & Related papers (2023-03-16T09:55:43Z)
- TAP-Vid: A Benchmark for Tracking Any Point in a Video
We formalize the problem of tracking arbitrary physical points on surfaces over longer video clips, naming it tracking any point (TAP).
We introduce a companion benchmark, TAP-Vid, which is composed of both real-world videos with accurate human annotations of point tracks, and synthetic videos with perfect ground-truth point tracks.
We propose a simple end-to-end point tracking model TAP-Net, showing that it outperforms all prior methods on our benchmark when trained on synthetic data.
arXiv Detail & Related papers (2022-11-07T17:57:02Z)
- APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking
APT-36K is the first large-scale benchmark for animal pose estimation and tracking.
It consists of 2,400 video clips collected and filtered from 30 animal species with 15 frames for each video, resulting in 36,000 frames in total.
We benchmark several representative models on the following three tracks: (1) supervised animal pose estimation on a single frame under intra- and inter-domain transfer learning settings, (2) inter-species domain generalization test for unseen animals, and (3) animal pose estimation with animal tracking.
arXiv Detail & Related papers (2022-06-12T07:18:36Z)
- AnimalTrack: A Large-scale Benchmark for Multi-Animal Tracking in the Wild
We introduce AnimalTrack, a large-scale benchmark for multi-animal tracking in the wild.
AnimalTrack consists of 58 sequences from a diverse selection of 10 common animal categories.
We extensively evaluate 14 state-of-the-art representative trackers.
arXiv Detail & Related papers (2022-04-30T04:23:59Z)
- Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding
We create a large and diverse dataset, Animal Kingdom, that provides multiple annotated tasks.
Our dataset contains 50 hours of annotated videos to localize relevant animal behavior segments.
We propose a Collaborative Action Recognition (CARe) model that learns general and specific features for action recognition with unseen new animals.
arXiv Detail & Related papers (2022-04-18T02:05:15Z)
- AP-10K: A Benchmark for Animal Pose Estimation in the Wild
We propose AP-10K, the first large-scale benchmark for general animal pose estimation.
AP-10K consists of 10,015 images collected and filtered from 23 animal families and 60 species.
Results provide sound empirical evidence on the superiority of learning from diverse animal species in terms of both accuracy and generalization ability.
arXiv Detail & Related papers (2021-08-28T10:23:34Z)
- AcinoSet: A 3D Pose Estimation Dataset and Baseline Models for Cheetahs in the Wild
We present an extensive dataset of free-running cheetahs in the wild, called AcinoSet.
The dataset contains 119,490 frames of multi-view synchronized high-speed video footage, camera calibration files and 7,588 human-annotated frames.
The resulting 3D trajectories, human-checked 3D ground truth, and an interactive tool to inspect the data are also provided.
arXiv Detail & Related papers (2021-03-24T15:54:11Z)
- TAO: A Large-Scale Benchmark for Tracking Any Object
The Tracking Any Object (TAO) dataset consists of 2,907 high-resolution videos, captured in diverse environments, which are half a minute long on average.
We ask annotators to label objects that move at any point in the video, and give names to them post factum.
Our vocabulary is both significantly larger and qualitatively different from existing tracking datasets.
arXiv Detail & Related papers (2020-05-20T21:07:28Z)
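
The 3D-MuPPET entry above describes matching 2D detections to global identities in the first frame, after which a 2D tracker propagates the IDs. As a toy illustration of that one-off assignment step (not the authors' implementation), the following Python sketch pairs detection centers with hypothetical per-identity reference positions via the Hungarian algorithm, using a plain Euclidean-distance cost:

import numpy as np
from scipy.optimize import linear_sum_assignment

def match_detections_to_identities(detections, identity_refs):
    """detections: (N, 2) detection centers in one view's first frame.
    identity_refs: (M, 2) reference positions, one per global identity
    (hypothetical; stands in for whatever cue defines each identity).
    Returns {detection_index: identity_index} for the optimal assignment."""
    # Pairwise Euclidean distances form the (N, M) cost matrix.
    cost = np.linalg.norm(detections[:, None, :] - identity_refs[None, :, :], axis=-1)
    det_idx, id_idx = linear_sum_assignment(cost)  # Hungarian algorithm
    return dict(zip(det_idx.tolist(), id_idx.tolist()))

# Example: three pigeons detected, three known identities.
dets = np.array([[10.0, 20.0], [200.0, 40.0], [90.0, 150.0]])
refs = np.array([[95.0, 148.0], [12.0, 18.0], [198.0, 44.0]])
print(match_detections_to_identities(dets, refs))  # {0: 1, 1: 2, 2: 0}

In practice the cost would come from appearance or keypoint similarity rather than raw positions; the point here is only that identities are assigned once and then maintained by a cheaper 2D tracker.
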