OpenWildlife: Open-Vocabulary Multi-Species Wildlife Detector for Geographically-Diverse Aerial Imagery
- URL: http://arxiv.org/abs/2506.19204v1
- Date: Tue, 24 Jun 2025 00:10:19 GMT
- Title: OpenWildlife: Open-Vocabulary Multi-Species Wildlife Detector for Geographically-Diverse Aerial Imagery
- Authors: Muhammed Patel, Javier Noa Turnes, Jayden Hsiao, Linlin Xu, David Clausi
- Abstract summary: We introduce OpenWildlife (OW), an open-vocabulary wildlife detector designed for multi-species identification in diverse aerial imagery. OW leverages language-aware embeddings and a novel adaptation of the Grounding-DINO framework, enabling it to identify species specified through natural language inputs across both terrestrial and marine environments. OW outperforms most existing methods, achieving up to 0.981 mAP50 with fine-tuning and 0.597 mAP50 on seven datasets featuring novel species.
- Score: 5.612783442210011
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce OpenWildlife (OW), an open-vocabulary wildlife detector designed for multi-species identification in diverse aerial imagery. While existing automated methods perform well in specific settings, they often struggle to generalize across different species and environments due to limited taxonomic coverage and rigid model architectures. In contrast, OW leverages language-aware embeddings and a novel adaptation of the Grounding-DINO framework, enabling it to identify species specified through natural language inputs across both terrestrial and marine environments. Trained on 15 datasets, OW outperforms most existing methods, achieving up to 0.981 mAP50 with fine-tuning and 0.597 mAP50 on seven datasets featuring novel species. Additionally, we introduce an efficient search algorithm that combines k-nearest neighbors and breadth-first search to prioritize areas where social species are likely to be found. This approach captures over 95% of species while exploring only 33% of the available images. To support reproducibility, we publicly release our source code and dataset splits, establishing OW as a flexible, cost-effective solution for global biodiversity assessments.
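The abstract describes the search algorithm in a single sentence only. As a rough illustration of how a k-nearest-neighbors graph and breadth-first search can be combined, and not the authors' implementation, the sketch below assumes each image has a 2-D location and that a detector can be queried as a yes/no callable. The names `knn_graph`, `guided_search`, and `has_animals` are hypothetical.

```python
from collections import deque
import math


def knn_graph(coords, k):
    """Map each image index to the indices of its k nearest images (Euclidean)."""
    graph = {}
    for i, p in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(p, coords[j]),
        )
        graph[i] = others[:k]
    return graph


def guided_search(coords, has_animals, seeds, k=2):
    """Breadth-first search over the kNN graph, expanding only from positive images.

    has_animals: callable(index) -> bool, a stand-in for running a detector.
    Returns the inspected image indices in visit order.
    """
    graph = knn_graph(coords, k)
    visited = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        i = queue.popleft()
        visited.append(i)
        # Assumption: social species cluster, so only neighbors of images
        # that contain animals are worth inspecting next.
        if has_animals(i):
            for j in graph[i]:
                if j not in seen:
                    seen.add(j)
                    queue.append(j)
    return visited


# Toy data (invented for illustration): two clusters of image locations,
# with animals present only in the first cluster.
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
positives = {0, 1, 2}
visited = guided_search(coords, lambda i: i in positives, seeds=[0], k=2)
print(visited)  # → [0, 1, 2]
```

In this toy case the search inspects three of the six images and never visits the empty cluster, which is the kind of saving the abstract reports at a much larger scale.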
Related papers
- CrypticBio: A Large Multimodal Dataset for Visually Confusing Biodiversity [3.73232466691291]
We present CrypticBio, the largest publicly available dataset of visually confusing species. Curated from real-world trends in species misidentification among community annotators of iNaturalist, CrypticBio contains 52K unique cryptic groups spanning 67K species.
arXiv Detail & Related papers (2025-05-16T14:35:56Z)
- VR-RAG: Open-vocabulary Species Recognition with RAG-Assisted Large Multi-Modal Models [33.346206174676794]
We focus on open-vocabulary bird species recognition, where the goal is to classify species based on their descriptions. Traditional benchmarks like CUB-200-2011 have been evaluated in a closed-vocabulary paradigm. We show that the performance of current systems drops by a large margin when they are evaluated under settings closely aligned with the open-vocabulary paradigm.
arXiv Detail & Related papers (2025-05-08T20:33:31Z)
- Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach [69.01456182499486]
BR-Gen is a large-scale dataset of 150,000 locally forged images with diverse scene-aware annotations. NFA-ViT is a Noise-guided Forgery Amplification Vision Transformer that enhances the detection of localized forgeries.
arXiv Detail & Related papers (2025-04-16T09:57:23Z)
- Feedforward Few-shot Species Range Estimation [61.60698161072356]
Knowing where a particular species can or cannot be found on Earth is crucial for ecological research and conservation efforts. Accurate range estimates are only available for a relatively small proportion of all known species. We outline a new approach for few-shot species range estimation to address the challenge of accurately estimating the range of a species from limited data.
arXiv Detail & Related papers (2025-02-20T19:13:29Z)
- TaxaBind: A Unified Embedding Space for Ecological Applications [7.291750095728984]
We present TaxaBind, a unified embedding space for characterizing any species of interest.
TaxaBind is a multimodal embedding space across six modalities: ground-level images of species, geographic location, satellite image, text, audio, and environmental features.
arXiv Detail & Related papers (2024-11-01T15:41:30Z)
- Combining Observational Data and Language for Species Range Estimation [63.65684199946094]
We propose a novel approach combining millions of citizen science species observations with textual descriptions from Wikipedia. Our framework maps locations, species, and text descriptions into a common space, enabling zero-shot range estimation from textual descriptions. Our approach also acts as a strong prior when combined with observational data, resulting in more accurate range estimation with less data.
arXiv Detail & Related papers (2024-10-14T17:22:55Z)
- Adaptive High-Frequency Transformer for Diverse Wildlife Re-Identification [33.0352672906987]
Wildlife ReID involves utilizing visual technology to identify specific individuals of wild animals in different scenarios.
We present a unified, multi-species general framework for wildlife ReID.
arXiv Detail & Related papers (2024-10-09T15:16:30Z)
- An Open-World, Diverse, Cross-Spatial-Temporal Benchmark for Dynamic Wild Person Re-Identification [58.5877965612088]
Person re-identification (ReID) has made great strides thanks to data-driven deep learning techniques.
The existing benchmark datasets lack diversity, and models trained on these data cannot generalize well to dynamic wild scenarios.
We develop a new Open-World, Diverse, Cross-Spatial-Temporal dataset named OWD with several distinct features.
arXiv Detail & Related papers (2024-03-22T11:21:51Z)
- WildlifeDatasets: An open-source toolkit for animal re-identification [0.0]
WildlifeDatasets is an open-source toolkit for ecologists and computer-vision / machine-learning researchers.
WildlifeDatasets is written in Python and allows straightforward access to publicly available wildlife datasets.
We provide the first-ever foundation model for individual re-identification within a wide range of species - MegaDescriptor.
arXiv Detail & Related papers (2023-11-15T17:08:09Z)
- SatBird: Bird Species Distribution Modeling with Remote Sensing and Citizen Science Data [68.2366021016172]
We present SatBird, a satellite dataset of locations in the USA with labels derived from presence-absence observation data from the citizen science database eBird.
We also provide a dataset in Kenya representing low-data regimes.
We benchmark a set of baselines on our dataset, including SOTA models for remote sensing tasks.
arXiv Detail & Related papers (2023-11-02T02:00:27Z)
- Spatial Implicit Neural Representations for Global-Scale Species Mapping [72.92028508757281]
Given a set of locations where a species has been observed, the goal is to build a model to predict whether the species is present or absent at any location.
Traditional methods struggle to take advantage of emerging large-scale crowdsourced datasets.
We use Spatial Implicit Neural Representations (SINRs) to jointly estimate the geographical range of 47k species simultaneously.
arXiv Detail & Related papers (2023-06-05T03:36:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.