The SA-FARI Dataset: Segment Anything in Footage of Animals for Recognition and Identification
- URL: http://arxiv.org/abs/2511.15622v2
- Date: Mon, 24 Nov 2025 17:02:04 GMT
- Title: The SA-FARI Dataset: Segment Anything in Footage of Animals for Recognition and Identification
- Authors: Dante Francisco Wasmuht, Otto Brookes, Maximillian Schall, Pablo Palencia, Chris Beirne, Tilo Burghardt, Majid Mirmehdi, Hjalmar Kühl, Mimi Arandjelovic, Sam Pottie, Peter Bermant, Brandon Asheim, Yi Jin Toh, Adam Elzinga, Jason Holmberg, Andrew Whitworth, Eleanor Flatt, Laura Gustafson, Chaitanya Ryali, Yuan-Ting Hu, Baishan Guo, Andrew Westbury, Kate Saenko, Didac Suris,
- Abstract summary: We introduce SA-FARI, the largest open-source MAT dataset for wild animals.<n>It comprises 11,609 camera trap videos collected over approximately 10 years (2014-2024) from 741 locations across 4 continents, spanning 99 species categories.<n>Each video is exhaustively annotated culminating in 46 hours of densely annotated footage containing 16,224 masklet identities and 942,702 individual bounding boxes, segmentation masks, and species labels.<n>We present comprehensive benchmarks on SA-FARI using state-of-the-art vision-language models for detection and tracking, including SAM 3, evaluated with both species-specific and generic animal
- Score: 30.709102465224746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated video analysis is critical for wildlife conservation. A foundational task in this domain is multi-animal tracking (MAT), which underpins applications such as individual re-identification and behavior recognition. However, existing datasets are limited in scale, constrained to a few species, or lack sufficient temporal and geographical diversity - leaving no suitable benchmark for training general-purpose MAT models applicable across wild animal populations. To address this, we introduce SA-FARI, the largest open-source MAT dataset for wild animals. It comprises 11,609 camera trap videos collected over approximately 10 years (2014-2024) from 741 locations across 4 continents, spanning 99 species categories. Each video is exhaustively annotated culminating in ~46 hours of densely annotated footage containing 16,224 masklet identities and 942,702 individual bounding boxes, segmentation masks, and species labels. Alongside the task-specific annotations, we publish anonymized camera trap locations for each video. Finally, we present comprehensive benchmarks on SA-FARI using state-of-the-art vision-language models for detection and tracking, including SAM 3, evaluated with both species-specific and generic animal prompts. We also compare against vision-only methods developed specifically for wildlife analysis. SA-FARI is the first large-scale dataset to combine high species diversity, multi-region coverage, and high-quality spatio-temporal annotations, offering a new foundation for advancing generalizable multianimal tracking in the wild. The dataset is available at https://www.conservationxlabs.com/sa-fari.
Related papers
- SpatialVID: A Large-Scale Video Dataset with Spatial Annotations [58.01259302233675]
SpatialVID is a dataset of in-the-wild videos with diverse scenes, camera movements and dense 3D annotations such as per-frame camera poses, depth, and motion instructions.<n>We collect more than 21,000 hours of raw video, and process them into 2.7 million clips through a hierarchical filtering pipeline.<n>A subsequent annotation pipeline enriches these clips with detailed spatial and semantic information, including camera poses, depth maps, dynamic masks, structured captions, and serialized motion instructions.
arXiv Detail & Related papers (2025-09-11T17:59:31Z) - The iNaturalist Sounds Dataset [60.157076990024606]
iNatSounds is a collection of 230,000 audio files capturing sounds from over 5,500 species, contributed by more than 27,000 recordists worldwide.<n>The dataset encompasses sounds from birds, mammals, insects, reptiles, and amphibians, with audio and species labels derived from observations submitted to iNaturalist.<n>We envision models trained on this data powering next-generation public engagement applications, and assisting biologists, ecologists, and land use managers in processing large audio collections.
arXiv Detail & Related papers (2025-05-31T02:07:37Z) - MammAlps: A multi-view video behavior monitoring dataset of wild mammals in the Swiss Alps [41.58000025132071]
MammAlps is a dataset of wildlife behavior monitoring from 9 camera-traps in the Swiss National Park.<n>Based on 6135 single animal clips, we propose the first hierarchical and multimodal animal behavior recognition benchmark.<n>We also propose a second ecology-oriented benchmark aiming at identifying activities, species, number of individuals and meteorological conditions.
arXiv Detail & Related papers (2025-03-23T21:51:58Z) - The PanAf-FGBG Dataset: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition [9.865022241248116]
We present the PanAf-FGBG dataset, featuring 20 hours of wild chimpanzee behaviours, recorded at over 350 individual camera locations.<n>Uniquely, it pairs every video with a chimpanzee (referred to as a foreground video) with a corresponding background video (with no chimpanzee) from the same camera location.<n>This setup enables, for the first time, direct evaluation of in-distribution and out-of-distribution conditions, and for the impact of backgrounds on behaviour recognition models to be quantified.
arXiv Detail & Related papers (2025-02-28T16:18:57Z) - OpenAnimalTracks: A Dataset for Animal Track Recognition [2.3020018305241337]
We introduce OpenAnimalTracks dataset, the first publicly available labeled dataset designed to facilitate the automated classification and detection of animal footprints.
We show the potential of automated footprint identification with representative classifiers and detection models.
We hope our dataset paves the way for automated animal tracking techniques, enhancing our ability to protect and manage biodiversity.
arXiv Detail & Related papers (2024-06-14T00:37:17Z) - Multimodal Foundation Models for Zero-shot Animal Species Recognition in
Camera Trap Images [57.96659470133514]
Motion-activated camera traps constitute an efficient tool for tracking and monitoring wildlife populations across the globe.
Supervised learning techniques have been successfully deployed to analyze such imagery, however training such techniques requires annotations from experts.
Reducing the reliance on costly labelled data has immense potential in developing large-scale wildlife tracking solutions with markedly less human labor.
arXiv Detail & Related papers (2023-11-02T08:32:00Z) - Meerkat Behaviour Recognition Dataset [3.53348643468069]
We introduce a large meerkat behaviour recognition video dataset with diverse annotated behaviours.
This dataset includes videos from two positions within the meerkat enclosure at the Wellington Zoo (Wellington, New Zealand)
arXiv Detail & Related papers (2023-06-20T06:50:50Z) - APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking [77.87449881852062]
APT-36K is the first large-scale benchmark for animal pose estimation and tracking.
It consists of 2,400 video clips collected and filtered from 30 animal species with 15 frames for each video, resulting in 36,000 frames in total.
We benchmark several representative models on the following three tracks: (1) supervised animal pose estimation on a single frame under intra- and inter-domain transfer learning settings, (2) inter-species domain generalization test for unseen animals, and (3) animal pose estimation with animal tracking.
arXiv Detail & Related papers (2022-06-12T07:18:36Z) - Animal Kingdom: A Large and Diverse Dataset for Animal Behavior
Understanding [4.606145900630665]
We create a large and diverse dataset, Animal Kingdom, that provides multiple annotated tasks.
Our dataset contains 50 hours of annotated videos to localize relevant animal behavior segments.
We propose a Collaborative Action Recognition (CARe) model that learns general and specific features for action recognition with unseen new animals.
arXiv Detail & Related papers (2022-04-18T02:05:15Z) - TAO: A Large-Scale Benchmark for Tracking Any Object [95.87310116010185]
Tracking Any Object dataset consists of 2,907 high resolution videos, captured in diverse environments, which are half a minute long on average.
We ask annotators to label objects that move at any point in the video, and give names to them post factum.
Our vocabulary is both significantly larger and qualitatively different from existing tracking datasets.
arXiv Detail & Related papers (2020-05-20T21:07:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.