Zero-shot Shark Tracking and Biometrics from Aerial Imagery
- URL: http://arxiv.org/abs/2501.05717v1
- Date: Fri, 10 Jan 2025 05:29:09 GMT
- Title: Zero-shot Shark Tracking and Biometrics from Aerial Imagery
- Authors: Chinmay K Lalgudi, Mark E Leone, Jaden V Clark, Sergio Madrigal-Mora, Mario Espinoza,
- Abstract summary: Development of machine learning models for analyzing marine animal aerial imagery has followed the classical paradigm of training, testing, and deploying a new model for each dataset. We introduce Frame Level ALIgnment and tRacking (FLAIR), which leverages the video understanding of Segment Anything Model 2 (SAM2) and the vision-language capabilities of Contrastive Language-Image Pre-training (CLIP). With a dataset of 18,000 drone images of Pacific nurse sharks, we trained state-of-the-art object detection models to compare against FLAIR.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent widespread adoption of drones for studying marine animals provides opportunities for deriving biological information from aerial imagery. The large scale of imagery data acquired from drones is well suited for machine learning (ML) analysis. Development of ML models for analyzing marine animal aerial imagery has followed the classical paradigm of training, testing, and deploying a new model for each dataset, requiring significant time, human effort, and ML expertise. We introduce Frame Level ALIgnment and tRacking (FLAIR), which leverages the video understanding of Segment Anything Model 2 (SAM2) and the vision-language capabilities of Contrastive Language-Image Pre-training (CLIP). FLAIR takes a drone video as input and outputs segmentation masks of the species of interest across the video. Notably, FLAIR leverages a zero-shot approach, eliminating the need for labeled data, training a new model, or fine-tuning an existing model to generalize to other species. With a dataset of 18,000 drone images of Pacific nurse sharks, we trained state-of-the-art object detection models to compare against FLAIR. We show that FLAIR massively outperforms these object detectors and performs competitively against two human-in-the-loop methods for prompting SAM2, achieving a Dice score of 0.81. FLAIR readily generalizes to other shark species without additional human effort and can be combined with novel heuristics to automatically extract relevant information including length and tailbeat frequency. FLAIR has significant potential to accelerate aerial imagery analysis workflows, requiring markedly less human effort and expertise than traditional machine learning workflows, while achieving superior accuracy. By reducing the effort required for aerial imagery analysis, FLAIR allows scientists to spend more time interpreting results and deriving insights about marine ecosystems.
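The listing carries no code; the sketch below only illustrates the general shape of such a zero-shot pipeline (segmenter mask proposals filtered against CLIP text prompts, plus the Dice metric and an FFT tailbeat heuristic) and is not the authors' implementation. The CLIP calls follow the openai/CLIP package; `propose_masks` and `crop_around` are hypothetical stand-ins for SAM2's video predictor and a mask-cropping step, and the prompt strings are assumptions.

```python
# Rough sketch of a FLAIR-style zero-shot pipeline: candidate masks from a
# video segmenter are scored against text prompts with CLIP, and biometrics
# are read off the surviving masks. Illustrative only; propose_masks() is a
# hypothetical stand-in for SAM2's video predictor.
import numpy as np
import torch
import clip  # openai/CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def clip_scores(crop_pil, prompts):
    """Cosine similarity between one image crop and each text prompt."""
    image = preprocess(crop_pil).unsqueeze(0).to(device)
    text = clip.tokenize(prompts).to(device)
    with torch.no_grad():
        img = model.encode_image(image)
        txt = model.encode_text(text)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return (img @ txt.T).squeeze(0).cpu().numpy()

def dice(pred, gt):
    """Dice coefficient between two boolean masks (the metric reported above)."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)

def tailbeat_hz(tail_angle, fps):
    """Dominant frequency of a per-frame tail-angle signal (an FFT heuristic)."""
    sig = np.asarray(tail_angle, dtype=float)
    sig -= sig.mean()
    spec = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fps)
    return freqs[1:][np.argmax(spec[1:])]  # skip the DC bin

# Assumed prompts; the positive class should win for a true shark mask.
PROMPTS = ["a shark seen from above in open water", "water with no animal"]
# masks = propose_masks(video)                       # hypothetical SAM2 step
# keep = [m for m in masks
#         if clip_scores(crop_around(m), PROMPTS).argmax() == 0]
```

Body length would follow a similar heuristic, e.g. the major-axis extent of a kept mask scaled by the drone's ground sampling distance.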
Related papers
- Learning to Track Any Points from Human Motion [55.831218129679144]
We propose an automated pipeline to generate pseudo-labeled training data for point tracking. A point tracking model trained on AnthroTAP achieves state-of-the-art performance on the TAP-Vid benchmark.
arXiv Detail & Related papers (2025-07-08T17:59:58Z) - Automated Detection of Salvin's Albatrosses: Improving Deep Learning Tools for Aerial Wildlife Surveys [4.936287307711449]
Unmanned Aerial Vehicles (UAVs) provide a cost-effective means of capturing high-resolution imagery. We assess the performance of a general-purpose avian detection model, BirdDetector, in estimating the breeding population of Salvin's albatross (Thalassarche salvini) on the Bounty Islands, New Zealand.
arXiv Detail & Related papers (2025-05-15T22:42:44Z) - UAVTwin: Neural Digital Twins for UAVs using Gaussian Splatting [57.63613048492219]
We present UAVTwin, a method for creating digital twins from real-world environments and facilitating data augmentation for training downstream models embedded in unmanned aerial vehicles (UAVs).
This is achieved by integrating 3D Gaussian Splatting (3DGS) for reconstructing backgrounds along with controllable synthetic human models that display diverse appearances and actions in multiple poses.
arXiv Detail & Related papers (2025-04-02T22:17:30Z) - From underwater to aerial: a novel multi-scale knowledge distillation approach for coral reef monitoring [1.0644791181419937]
This study presents a novel multi-scale approach to coral reef monitoring, integrating fine-scale underwater imagery with medium-scale aerial imagery.
A transformer-based deep-learning model is trained on underwater images to detect the presence of 31 classes covering various coral morphotypes, associated fauna, and habitats.
The results show that the multi-scale methodology successfully extends fine-scale classification to larger reef areas, achieving a high degree of accuracy in predicting coral morphotypes and associated habitats.
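For context, the distillation idea named in this paper's title can be illustrated by the standard soft-target loss below; this is a generic sketch (Hinton-style logit distillation), not the paper's multi-scale setup, and the temperature value is an arbitrary choice.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target knowledge distillation (Hinton et al., 2015); generic sketch."""
    # Soften both distributions with temperature T, then match them with KL.
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```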
arXiv Detail & Related papers (2025-02-25T06:12:33Z) - An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training [51.622652121580394]
Masked image modeling (MIM) pre-training for large-scale vision transformers (ViTs) has enabled promising downstream performance on top of the learned self-supervised ViT features.
In this paper, we question whether the fine-tuning performance of extremely simple lightweight ViTs can also benefit from this pre-training paradigm.
Our pre-training with distillation on pure lightweight ViTs with vanilla/hierarchical design (5.7M/6.5M parameters) achieves 79.4%/78.9% top-1 accuracy on ImageNet-1K.
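A bare-bones version of the MIM objective discussed here might look as follows; this is a generic pixel-regression sketch (MAE-style), not the paper's distillation setup, and the model's (patches, mask) signature is an assumption.

```python
import torch

def mim_loss(model, images, patch=16, mask_ratio=0.75):
    """Hide random patches, regress their raw pixels, score masked patches only."""
    B, C, H, W = images.shape
    # Cut the image into non-overlapping patches: (B, N, C*patch*patch).
    p = images.unfold(2, patch, patch).unfold(3, patch, patch)
    p = p.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * patch * patch)
    mask = torch.rand(B, p.shape[1], device=images.device) < mask_ratio
    pred = model(p, mask)          # assumed signature: patch tokens + bool mask
    per_patch = ((pred - p) ** 2).mean(dim=-1)
    return per_patch[mask].mean()  # reconstruct only what was hidden
```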
arXiv Detail & Related papers (2024-04-18T14:14:44Z) - Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM [62.85895749882285]
Marine Animal Segmentation (MAS) involves segmenting animals within marine environments.
We propose a novel feature learning framework, named Dual-SAM for high-performance MAS.
Our proposed method achieves state-of-the-art performance on five widely-used MAS datasets.
arXiv Detail & Related papers (2024-04-07T15:34:40Z) - HawkI: Homography & Mutual Information Guidance for 3D-free Single Image to Aerial View [67.8213192993001]
We present HawkI, a method for synthesizing aerial-view images from text and an exemplar image.
HawkI blends the visual features from the input image within a pretrained text-to-2D-image stable diffusion model.
At inference, HawkI employs a unique mutual information guidance formulation to steer the generated image towards faithfully replicating the semantic details of the input image.
arXiv Detail & Related papers (2023-11-27T01:41:25Z) - Learning Human Action Recognition Representations Without Real Humans [66.61527869763819]
We present a benchmark that leverages real-world videos with humans removed and synthetic data containing virtual humans to pre-train a model.
We then evaluate the transferability of the representation learned on this data to a diverse set of downstream action recognition benchmarks.
Our approach outperforms previous baselines by up to 5%.
arXiv Detail & Related papers (2023-11-10T18:38:14Z) - Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images [57.96659470133514]
Motion-activated camera traps constitute an efficient tool for tracking and monitoring wildlife populations across the globe.
Supervised learning techniques have been successfully deployed to analyze such imagery; however, training them requires annotations from experts.
Reducing the reliance on costly labelled data has immense potential in developing large-scale wildlife tracking solutions with markedly less human labor.
arXiv Detail & Related papers (2023-11-02T08:32:00Z) - Whale Detection Enhancement through Synthetic Satellite Images [13.842008598751445]
We show that we can achieve a 15% performance boost on whale detection compared to training on real data alone.
We open source the code of the simulation platform SeaDroneSim2 and the dataset generated through it.
arXiv Detail & Related papers (2023-08-15T13:35:29Z) - Object counting from aerial remote sensing images: application to wildlife and marine mammals [4.812718493682454]
Anthropogenic activities pose threats to wildlife and marine fauna.
This study utilizes deep learning techniques to automate animal counting tasks.
The model accurately locates animals despite complex image background conditions.
arXiv Detail & Related papers (2023-06-17T23:14:53Z) - TempNet: Temporal Attention Towards the Detection of Animal Behaviour in Videos [63.85815474157357]
We propose an efficient computer vision- and deep learning-based method for the detection of biological behaviours in videos.
TempNet uses an encoder bridge and residual blocks to maintain model performance with a two-stage encoder that processes spatial and then temporal information.
We demonstrate its application to the detection of sablefish (Anoplopoma fimbria) startle events.
arXiv Detail & Related papers (2022-11-17T23:55:12Z) - Rare Wildlife Recognition with Self-Supervised Representation Learning [0.0]
We present a methodology to reduce the amount of required training data by resorting to self-supervised pretraining.
We show that a combination of MoCo, CLD, and geometric augmentations outperforms conventional models pretrained on ImageNet by a large margin.
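As an illustration of the geometric augmentations mentioned here, a torchvision two-view transform of the kind used for MoCo-style contrastive pretraining could look like the sketch below; the crop scale, rotation range, and other parameter values are illustrative choices, not the paper's settings.

```python
from torchvision import transforms

# Geometric augmentations suited to aerial imagery, where there is no
# canonical "up" direction; parameter values are illustrative.
geometric_aug = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.5, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(degrees=90),
    transforms.ToTensor(),
])

def two_views(pil_image):
    """Two independently augmented views of one image for contrastive learning."""
    return geometric_aug(pil_image), geometric_aug(pil_image)
```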
arXiv Detail & Related papers (2022-10-29T17:57:38Z) - SuperAnimal pretrained pose estimation models for behavioral analysis [42.206265576708255]
Quantification of behavior is critical in applications ranging from neuroscience to veterinary medicine and animal conservation.
We present a series of technical innovations that enable a new method, collectively called SuperAnimal, to develop unified foundation models.
arXiv Detail & Related papers (2022-03-14T18:46:57Z) - Self-Supervised Pretraining and Controlled Augmentation Improve Rare
Wildlife Recognition in UAV Images [9.220908533011068]
We present a methodology to reduce the amount of required training data by resorting to self-supervised pretraining.
We show that a combination of MoCo, CLD, and geometric augmentations outperforms conventional models pre-trained on ImageNet by a large margin.
arXiv Detail & Related papers (2021-08-17T12:14:28Z) - Zoo-Tuning: Adaptive Transfer from a Zoo of Models [82.9120546160422]
Zoo-Tuning learns to adaptively transfer the parameters of pretrained models to the target task.
We evaluate our approach on a variety of tasks, including reinforcement learning, image classification, and facial landmark detection.
arXiv Detail & Related papers (2021-06-29T14:09:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.