WildlifeDatasets: An open-source toolkit for animal re-identification
- URL: http://arxiv.org/abs/2311.09118v2
- Date: Thu, 14 Dec 2023 08:04:16 GMT
- Title: WildlifeDatasets: An open-source toolkit for animal re-identification
- Authors: Vojtěch Čermák, Lukas Picek, Lukáš Adam, Kostas Papafitsoros
- Abstract summary: WildlifeDatasets is an open-source toolkit for ecologists and computer-vision / machine-learning researchers.
WildlifeDatasets is written in Python and allows straightforward access to publicly available wildlife datasets.
We provide the first-ever foundation model for individual re-identification within a wide range of species - MegaDescriptor.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we present WildlifeDatasets
(https://github.com/WildlifeDatasets/wildlife-datasets) - an open-source
toolkit intended primarily for ecologists and computer-vision /
machine-learning researchers. The toolkit is written in Python, allows
straightforward access to publicly available wildlife datasets, and provides a
wide variety of methods for dataset pre-processing, performance analysis, and
model fine-tuning. We showcase the toolkit in various scenarios and baseline
experiments, including, to the best of our knowledge, the most comprehensive
experimental comparison of datasets and methods for wildlife re-identification,
covering both local descriptors and deep learning approaches. Furthermore, we
provide the first-ever foundation model for individual re-identification within
a wide range of species - MegaDescriptor - that provides state-of-the-art
performance on animal re-identification datasets and outperforms other
pre-trained models such as CLIP and DINOv2 by a significant margin. To make the
model available to the general public and to allow easy integration with any
existing wildlife monitoring applications, we provide multiple MegaDescriptor
flavors (i.e., Small, Medium, and Large) through the HuggingFace hub
(https://huggingface.co/BVRA).
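As a rough illustration of the workflow the abstract describes, the sketch below loads one of the supported datasets with the toolkit and extracts descriptors with a MegaDescriptor checkpoint from the BVRA HuggingFace organisation via timm. Exact class and model names (e.g. MacaqueFaces, 'hf-hub:BVRA/MegaDescriptor-L-384') are assumptions based on the project's public documentation and may differ between versions; this is not code from the paper.

```python
# Minimal sketch (hypothetical names): dataset access with wildlife-datasets
# and feature extraction with a MegaDescriptor flavor from the HF hub.
import timm
import torch
from wildlife_datasets import datasets

# Download one of the publicly available datasets and load it as a pandas
# DataFrame with image paths and individual identities.
datasets.MacaqueFaces.get_data("data/MacaqueFaces")
dataset = datasets.MacaqueFaces("data/MacaqueFaces")
print(dataset.df.head())  # columns typically include 'identity' and 'path'

# Load a MegaDescriptor checkpoint and use it as a frozen feature extractor.
model = timm.create_model(
    "hf-hub:BVRA/MegaDescriptor-L-384", num_classes=0, pretrained=True
)
model.eval()

with torch.no_grad():
    dummy = torch.randn(1, 3, 384, 384)  # one pre-processed query image
    embedding = model(dummy)             # descriptor used for matching
print(embedding.shape)
```

With embeddings in hand, re-identification reduces to nearest-neighbour matching (e.g. cosine similarity) between query and database descriptors, which is consistent with how the abstract positions MegaDescriptor as a general-purpose descriptor for many species.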
Related papers
- Meta-Feature Adapter: Integrating Environmental Metadata for Enhanced Animal Re-identification [7.272706868932979]
We propose a lightweight module designed to integrate environmental metadata into vision-language foundation models, such as CLIP.
Our approach translates environmental metadata into natural language descriptions, encodes them into metadata-aware text embeddings, and incorporates these embeddings into image features through a cross-attention mechanism.
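A minimal, self-contained sketch of the mechanism described above (not the authors' code; all module names, dimensions, and the example metadata string are hypothetical): verbalised metadata is encoded into text embeddings and fused into image features via cross-attention with a residual connection.

```python
# Hypothetical illustration of metadata-aware cross-attention fusion.
import torch
import torch.nn as nn

class MetadataCrossAttention(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, image_tokens: torch.Tensor, meta_tokens: torch.Tensor):
        # image_tokens: (B, N, dim) patch/region features from the image encoder
        # meta_tokens:  (B, M, dim) embeddings of a verbalised description such as
        #               "captured at dawn, 24 °C, near a waterhole"
        fused, _ = self.attn(query=image_tokens, key=meta_tokens, value=meta_tokens)
        return self.norm(image_tokens + fused)  # residual fusion

# Usage with random stand-ins for encoder outputs.
module = MetadataCrossAttention()
img = torch.randn(2, 49, 512)   # e.g. a 7x7 grid of visual tokens
meta = torch.randn(2, 12, 512)  # encoded metadata description
print(module(img, meta).shape)  # torch.Size([2, 49, 512])
```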
arXiv Detail & Related papers (2025-01-23T04:14:59Z)
- Multispecies Animal Re-ID Using a Large Community-Curated Dataset [0.19418036471925312]
We construct a dataset that includes 49 species, 37K individual animals, and 225K images, using this data to train a single embedding network for all species.
Our model consistently outperforms models trained separately on each species, achieving an average gain of 12.5% in top-1 accuracy.
The model is already in production use for 60+ species in a large-scale wildlife monitoring system.
arXiv Detail & Related papers (2024-12-07T09:56:33Z)
- Categorical Keypoint Positional Embedding for Robust Animal Re-Identification [22.979350771097966]
Animal re-identification (ReID) has become an indispensable tool in ecological research.
Unlike human ReID, animal ReID faces significant challenges due to the high variability in animal poses, diverse environmental conditions, and the inability to directly apply pre-trained models to animal data.
This work introduces an innovative keypoint propagation mechanism, which utilizes a single annotated pre-trained diffusion model.
arXiv Detail & Related papers (2024-12-01T14:09:00Z)
- BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping [22.30038765017189]
We propose a metadata-aware self-supervised learning (SSL) framework useful for fine-grained classification and ecological mapping of bird species around the world.
Our framework unifies two SSL strategies: Contrastive Learning (CL) and Masked Image Modeling (MIM), while also enriching the embedding space with metadata available with ground-level imagery of birds.
We demonstrate that our models learn fine-grained and geographically conditioned features of birds, by evaluating on two downstream tasks: fine-grained visual classification (FGVC) and cross-modal retrieval.
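The general recipe described above can be made concrete with a short, hedged sketch (not BirdSAT's code): a contrastive term between paired embeddings (e.g. ground-level image vs. a cross-view or metadata-conditioned embedding) is combined with an MAE-style masked-reconstruction term. All names, weights, and tensor shapes are hypothetical.

```python
# Hypothetical combination of a contrastive (CL) and a masked-image-modeling
# (MIM) objective, illustrating the unified SSL training signal.
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07):
    # a, b: (B, D) paired embeddings of the two views.
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

def combined_ssl_loss(ground_emb, cross_view_emb, reconstructed_patches,
                      target_patches, mask, mim_weight: float = 1.0):
    cl_loss = info_nce(ground_emb, cross_view_emb)
    # Reconstruction error only on masked patches, as in MAE-style MIM.
    mim_loss = ((reconstructed_patches - target_patches) ** 2).mean(dim=-1)
    mim_loss = (mim_loss * mask).sum() / mask.sum().clamp(min=1)
    return cl_loss + mim_weight * mim_loss
```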
arXiv Detail & Related papers (2023-10-29T22:08:00Z)
- SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval [92.27387459751309]
We provide SPRINT, a unified Python toolkit for evaluating neural sparse retrieval.
We establish strong and reproducible zero-shot sparse retrieval baselines on the widely used BEIR benchmark.
We show that SPLADEv2 produces sparse representations with a majority of tokens outside of the original query and document.
arXiv Detail & Related papers (2023-07-19T22:48:02Z)
- infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans [103.92680099373567]
This paper introduces a pipeline to parametrically sample and render multi-task vision datasets from comprehensive 3D scans from the real world.
Changing the sampling parameters allows one to "steer" the generated datasets to emphasize specific information.
Common architectures trained on a generated starter dataset reached state-of-the-art performance on multiple common vision tasks and benchmarks.
arXiv Detail & Related papers (2021-10-11T04:21:46Z)
- Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has attracted increasing attention due to its widespread applications in video surveillance.
Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)