Related papers: MammAlps: A multi-view video behavior monitoring dataset of wild mammals in the Swiss Alps

URL: http://arxiv.org/abs/2503.18223v1
Date: Sun, 23 Mar 2025 21:51:58 GMT
Title: MammAlps: A multi-view video behavior monitoring dataset of wild mammals in the Swiss Alps
Authors: Valentin Gabeff, Haozhe Qi, Brendan Flaherty, Gencer Sumbül, Alexander Mathis, Devis Tuia
Abstract summary: MammAlps is a dataset of wildlife behavior monitoring from 9 camera-traps in the Swiss National Park. Based on 6135 single animal clips, we propose the first hierarchical and multimodal animal behavior recognition benchmark. We also propose a second ecology-oriented benchmark aiming at identifying activities, species, number of individuals and meteorological conditions.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Monitoring wildlife is essential for ecology and ethology, especially in light of the increasing human impact on ecosystems. Camera traps have emerged as habitat-centric sensors enabling the study of wildlife populations at scale with minimal disturbance. However, the lack of annotated video datasets limits the development of powerful video understanding models needed to process the vast amount of fieldwork data collected. To advance research in wild animal behavior monitoring we present MammAlps, a multimodal and multi-view dataset of wildlife behavior monitoring from 9 camera-traps in the Swiss National Park. MammAlps contains over 14 hours of video with audio, 2D segmentation maps and 8.5 hours of individual tracks densely labeled for species and behavior. Based on 6135 single animal clips, we propose the first hierarchical and multimodal animal behavior recognition benchmark using audio, video and reference scene segmentation maps as inputs. Furthermore, we also propose a second ecology-oriented benchmark aiming at identifying activities, species, number of individuals and meteorological conditions from 397 multi-view and long-term ecological events, including false positive triggers. We advocate that both tasks are complementary and contribute to bridging the gap between machine learning and ecology. Code and data are available at: https://github.com/eceo-epfl/MammAlps

Related papers

The SA-FARI Dataset: Segment Anything in Footage of Animals for Recognition and Identification
We introduce SA-FARI, the largest open-source MAT dataset for wild animals. It comprises 11,609 camera trap videos collected over approximately 10 years (2014-2024) from 741 locations across 4 continents, spanning 99 species categories. Each video is exhaustively annotated, culminating in 46 hours of densely annotated footage containing 16,224 masklet identities and 942,702 individual bounding boxes, segmentation masks, and species labels. We present comprehensive benchmarks on SA-FARI using state-of-the-art vision-language models for detection and tracking, including SAM 3, evaluated with both species-specific and generic animal
arXiv (2025-11-19T17:07:08Z)
PriVi: Towards A General-Purpose Video Model For Primate Behavior In The Wild
We introduce PriVi, a large-scale primate-centric video pretraining dataset. We pretrain V-JEPA, a large-scale video model, on PriVi to learn primate-specific representations. Results demonstrate that primate-centric pretraining substantially improves data efficiency and generalization.
arXiv (2025-11-12T19:27:40Z)
kabr-tools: Automated Framework for Multi-Species Behavioral Monitoring
We present kabr-tools, an open-source package for automated multi-species behavioral monitoring. This framework integrates drone-based video with machine learning systems to extract behavioral, social, and spatial metrics from wildlife footage. Compared to ground-based methods, drone-based observations significantly improved behavioral granularity, reducing visibility loss by 15%.
arXiv (2025-10-02T14:03:55Z)
The iNaturalist Sounds Dataset
iNatSounds is a collection of 230,000 audio files capturing sounds from over 5,500 species, contributed by more than 27,000 recordists worldwide. The dataset encompasses sounds from birds, mammals, insects, reptiles, and amphibians, with audio and species labels derived from observations submitted to iNaturalist. We envision models trained on this data powering next-generation public engagement applications, and assisting biologists, ecologists, and land use managers in processing large audio collections.
arXiv (2025-05-31T02:07:37Z)
BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning
We find emergent behaviors in biological vision models via large-scale contrastive vision-language training. We train BioCLIP 2 on TreeOfLife-200M to distinguish different species. We identify emergent properties in the learned embedding space of BioCLIP 2.
arXiv (2025-05-29T17:48:20Z)
BuckTales: A multi-UAV dataset for multi-object tracking and re-identification of wild antelopes
BuckTales is the first large-scale UAV dataset designed to solve multi-object tracking and re-identification problems in wild animals. The MOT dataset includes over 1.2 million annotations covering 680 tracks across 12 high-resolution (5.4K) videos. The Re-ID dataset includes 730 individuals captured with two UAVs simultaneously.
arXiv (2024-11-11T11:55:14Z)
Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images
Motion-activated camera traps constitute an efficient tool for tracking and monitoring wildlife populations across the globe. Supervised learning techniques have been successfully deployed to analyze such imagery; however, training them requires expert annotations. Reducing the reliance on costly labelled data has immense potential in developing large-scale wildlife tracking solutions with markedly less human labor.
arXiv (2023-11-02T08:32:00Z)
SatBird: Bird Species Distribution Modeling with Remote Sensing and Citizen Science Data
We present SatBird, a satellite dataset of locations in the USA with labels derived from presence-absence observation data from the citizen science database eBird. We also provide a dataset in Kenya representing low-data regimes. We benchmark a set of baselines on our dataset, including SOTA models for remote sensing tasks.
arXiv (2023-11-02T02:00:27Z)
Meerkat Behaviour Recognition Dataset
We introduce a large meerkat behaviour recognition video dataset with diverse annotated behaviours. This dataset includes videos from two positions within the meerkat enclosure at the Wellington Zoo (Wellington, New Zealand).
arXiv (2023-06-20T06:50:50Z)
MammalNet: A Large-scale Video Benchmark for Mammal Recognition and Behavior Understanding
MammalNet is a large-scale animal behavior dataset with taxonomy-guided annotations of mammals and their common behaviors. It contains over 18K videos totaling 539 hours, which is 10 times larger than the largest existing animal behavior dataset. We establish three benchmarks on MammalNet: standard animal and behavior recognition, compositional low-shot animal and behavior recognition, and behavior detection.
arXiv (2023-06-01T11:45:33Z)
MABe22: A Multi-Species Multi-Task Benchmark for Learned Representations of Behavior
We introduce MABe22, a benchmark to assess the quality of learned behavior representations. This dataset is collected from a variety of biology experiments. We test self-supervised video and trajectory representation learning methods to demonstrate the use of our benchmark.
arXiv (2022-07-21T15:51:30Z)
Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding
We create a large and diverse dataset, Animal Kingdom, that provides multiple annotated tasks. Our dataset contains 50 hours of annotated videos to localize relevant animal behavior segments. We propose a Collaborative Action Recognition (CARe) model that learns general and specific features for action recognition with unseen new animals.
arXiv (2022-04-18T02:05:15Z)
SuperAnimal pretrained pose estimation models for behavioral analysis
Quantification of behavior is critical in applications ranging from neuroscience to veterinary medicine and animal conservation efforts. We present a series of technical innovations that enable a new method, collectively called SuperAnimal, to develop unified foundation models.
arXiv (2022-03-14T18:46:57Z)
Florida Wildlife Camera Trap Dataset
We introduce a challenging wildlife camera trap classification dataset collected from two different locations in Southwestern Florida. The dataset consists of 104,495 images featuring visually similar species, varying illumination conditions, a skewed class distribution, and samples of endangered species.
arXiv (2021-06-23T18:53:15Z)
AcinoSet: A 3D Pose Estimation Dataset and Baseline Models for Cheetahs in the Wild
We present an extensive dataset of free-running cheetahs in the wild, called AcinoSet. The dataset contains 119,490 frames of multi-view synchronized high-speed video footage, camera calibration files and 7,588 human-annotated frames. The resulting 3D trajectories, human-checked 3D ground truth, and an interactive tool to inspect the data are also provided.
arXiv (2021-03-24T15:54:11Z)
Multi-view Mouse Social Behaviour Recognition with Deep Graphical Model
Social behaviour analysis of mice is an invaluable tool to assess the efficacy of therapies for neurodegenerative diseases. Because multi-view video recordings can create rich descriptions of mouse social behaviors, their use for rodent observation is receiving increasing attention. We propose a novel multiview latent-attention and dynamic discriminative model that jointly learns view-specific and view-shared sub-structures.
arXiv (2020-11-04T18:09:58Z)
