Multimodal Foundation Models for Zero-shot Animal Species Recognition in
Camera Trap Images
- URL: http://arxiv.org/abs/2311.01064v1
- Date: Thu, 2 Nov 2023 08:32:00 GMT
- Title: Multimodal Foundation Models for Zero-shot Animal Species Recognition in
Camera Trap Images
- Authors: Zalan Fabian, Zhongqi Miao, Chunyuan Li, Yuanhan Zhang, Ziwei Liu,
Andrés Hernández, Andrés Montes-Rojas, Rafael Escucha, Laura Siabatto,
Andrés Link, Pablo Arbeláez, Rahul Dodhia, Juan Lavista Ferres
- Abstract summary: Motion-activated camera traps constitute an efficient tool for tracking and monitoring wildlife populations across the globe.
Supervised learning techniques have been successfully deployed to analyze such imagery; however, training such techniques requires annotations from experts.
Reducing the reliance on costly labelled data has immense potential in developing large-scale wildlife tracking solutions with markedly less human labor.
- Score: 57.96659470133514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to deteriorating environmental conditions and increasing human activity,
conservation efforts directed towards wildlife are crucial. Motion-activated
camera traps constitute an efficient tool for tracking and monitoring wildlife
populations across the globe. Supervised learning techniques have been
successfully deployed to analyze such imagery; however, training such techniques
requires annotations from experts. Reducing the reliance on costly labelled
data therefore has immense potential in developing large-scale wildlife
tracking solutions with markedly less human labor. In this work we propose
WildMatch, a novel zero-shot species classification framework that leverages
multimodal foundation models. In particular, we instruction tune
vision-language models to generate detailed visual descriptions of camera trap
images using similar terminology to experts. Then, we match the generated
caption to an external knowledge base of descriptions in order to determine the
species in a zero-shot manner. We investigate techniques to build instruction
tuning datasets for detailed animal description generation and propose a novel
knowledge augmentation technique to enhance caption quality. We demonstrate the
performance of WildMatch on a new camera trap dataset collected in the
Magdalena Medio region of Colombia.
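
The caption-to-knowledge-base matching step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how WildMatch scores a caption against the knowledge base, so the bag-of-words cosine similarity used here is a stand-in assumption, not the authors' method. The function names, the two species entries, and the example caption are all hypothetical, and the caption is assumed to have already been produced by a vision-language model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_species(caption, knowledge_base):
    """Return the best-matching species and the per-species scores.

    knowledge_base maps a species name to a textual description.
    No species-labelled training images are involved, which is what
    makes the prediction zero-shot in this toy setting.
    """
    caption_vec = Counter(tokenize(caption))
    scores = {
        species: cosine(caption_vec, Counter(tokenize(description)))
        for species, description in knowledge_base.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical knowledge-base entries, written for illustration only.
knowledge_base = {
    "ocelot": "medium-sized cat with a tawny coat covered in dark "
              "rosettes and stripes, long ringed tail",
    "capybara": "large stocky rodent with short brown fur, blunt snout, "
                "and no visible tail",
}

# Hypothetical caption standing in for the vision-language model output.
caption = "a spotted cat with dark rosettes on a tawny coat and a ringed tail"

best, scores = match_species(caption, knowledge_base)
print(best)
```

A practical system would more likely compare learned text embeddings than raw word counts. The structure stays the same: describe the image in text, score the description against each entry, and return the highest-scoring species.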
Related papers
- MetaCap: Meta-learning Priors from Multi-View Imagery for Sparse-view Human Performance Capture and Rendering [91.76893697171117]
We propose a method for efficient and high-quality geometry recovery and novel view synthesis given very sparse or even a single view of the human.
Our key idea is to meta-learn the radiance field weights solely from potentially sparse multi-view videos.
We collect a new dataset, WildDynaCap, which contains subjects captured in both a dense camera dome and in-the-wild sparse camera rigs.
(2024-03-27T17:59:54Z)
- Learning the 3D Fauna of the Web [70.01196719128912]
We develop 3D-Fauna, an approach that learns a pan-category deformable 3D animal model for more than 100 animal species jointly.
One crucial bottleneck of modeling animals is the limited availability of training data.
We show that prior category-specific attempts fail to generalize to rare species with limited training images.
(2024-01-04T18:32:48Z)
- Reviving the Context: Camera Trap Species Classification as Link Prediction on Multimodal Knowledge Graphs [31.22129440376567]
We exploit the structured context linked to camera trap images to boost out-of-distribution generalization for species classification tasks in camera traps.
A picture of a wild animal could be linked to details about the time and place it was captured, as well as structured biological knowledge about the animal species.
We propose a novel framework that reformulates species classification as link prediction in a multimodal knowledge graph.
(2023-12-31T23:32:03Z)
- Learning Subject-Aware Cropping by Outpainting Professional Photos [69.0772948657867]
We propose a weakly-supervised approach to learn what makes a high-quality subject-aware crop from professional stock images.
Our insight is to combine a library of stock images with a modern, pre-trained text-to-image diffusion model.
We are able to automatically generate a large dataset of cropped-uncropped training pairs to train a cropping model.
(2023-12-19T11:57:54Z)
- Florida Wildlife Camera Trap Dataset [48.99466876948454]
We introduce a challenging wildlife camera trap classification dataset collected from two different locations in Southwestern Florida.
The dataset consists of 104,495 images featuring visually similar species, varying illumination conditions, a skewed class distribution, and samples of endangered species.
(2021-06-23T18:53:15Z)
- Unifying data for fine-grained visual species classification [15.14767769034929]
We present an initial deep convolutional neural network model, trained on 2.9M images across 465 fine-grained species.
The long-term goal is to enable scientists to make conservation recommendations from near real-time analysis of species abundance and population health.
(2020-09-24T01:04:18Z)
- WhoAmI: An Automatic Tool for Visual Recognition of Tiger and Leopard
Individuals in the Wild [3.1708876837195157]
We develop automatic algorithms that detect animals, identify their species, and recognize individual animals for two species.
We demonstrate the effectiveness of our approach on a data set of camera-trap images recorded in the jungles of Southern India.
(2020-06-17T16:17:46Z)
- Automatic Detection and Recognition of Individuals in Patterned Species [4.163860911052052]
We develop a framework for automatic detection and recognition of individuals in different patterned species.
We use the recently proposed Faster-RCNN object detection framework to efficiently detect animals in images.
We evaluate our recognition system on zebra and jaguar images to show generalization to other patterned species.
(2020-05-06T15:29:21Z)
- Deformation-aware Unpaired Image Translation for Pose Estimation on
Laboratory Animals [56.65062746564091]
We aim to capture the pose of neuroscience model organisms, without using any manual supervision, to study how neural circuits orchestrate behaviour.
Our key contribution is the explicit and independent modeling of appearance, shape and poses in an unpaired image translation framework.
We demonstrate improved pose estimation accuracy on Drosophila melanogaster (fruit fly), Caenorhabditis elegans (worm) and Danio rerio (zebrafish).
(2020-01-23T15:34:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.