BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species
Classification and Mapping
- URL: http://arxiv.org/abs/2310.19168v1
- Date: Sun, 29 Oct 2023 22:08:00 GMT
- Title: BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species
Classification and Mapping
- Authors: Srikumar Sastry, Subash Khanal, Aayush Dhakal, Di Huang, Nathan Jacobs
- Abstract summary: We propose a metadata-aware self-supervised learning (SSL) framework useful for fine-grained classification and ecological mapping of bird species around the world.
Our framework unifies two SSL strategies: Contrastive Learning (CL) and Masked Image Modeling (MIM), while also enriching the embedding space with metadata available with ground-level imagery of birds.
We demonstrate that our models learn fine-grained and geographically conditioned features of birds, by evaluating on two downstream tasks: fine-grained visual classification (FGVC) and cross-modal retrieval.
- Score: 22.30038765017189
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a metadata-aware self-supervised learning (SSL) framework useful
for fine-grained classification and ecological mapping of bird species around
the world. Our framework unifies two SSL strategies: Contrastive Learning (CL)
and Masked Image Modeling (MIM), while also enriching the embedding space with
metadata available with ground-level imagery of birds. We separately train
uni-modal and cross-modal ViT on a novel cross-view global bird species dataset
containing ground-level imagery, metadata (location, time), and corresponding
satellite imagery. We demonstrate that our models learn fine-grained and
geographically conditioned features of birds, by evaluating on two downstream
tasks: fine-grained visual classification (FGVC) and cross-modal retrieval.
Pre-trained models learned using our framework achieve SotA performance on FGVC
of iNAT-2021 birds and in transfer learning settings for CUB-200-2011 and
NABirds datasets. Moreover, the impressive cross-modal retrieval performance of
our model enables the creation of species distribution maps across any
geographic region. The dataset and source code will be released at
https://github.com/mvrl/BirdSAT.
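The abstract describes unifying contrastive learning with masked image modeling and then using the learned embeddings for cross-modal retrieval. The following is a minimal sketch of how such a combined objective and a retrieval step might look. It is not the authors' implementation: the function names (`info_nce`, `masked_mse`, `combined_loss`, `retrieve`), the temperature of 0.1, and the `mim_weight` parameter are all hypothetical, embeddings are plain Python lists rather than ViT outputs, and metadata conditioning is omitted.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchors, positives, temperature=0.1):
    """Contrastive loss: the i-th anchor should match the i-th positive.

    The temperature value is an assumption, not taken from the paper.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [dot(ai, pj) / temperature for pj in p]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / len(a)


def masked_mse(pred_patches, true_patches, mask):
    """Mean squared reconstruction error over masked patches only."""
    losses = []
    for pred, true, is_masked in zip(pred_patches, true_patches, mask):
        if is_masked:
            err = sum((x - y) ** 2 for x, y in zip(pred, true)) / len(true)
            losses.append(err)
    return sum(losses) / len(losses) if losses else 0.0


def combined_loss(ground_emb, sat_emb, pred_patches, true_patches, mask,
                  mim_weight=1.0):
    """Symmetric cross-view contrastive term plus a weighted MIM term.

    mim_weight is a hypothetical balancing factor.
    """
    cl = 0.5 * (info_nce(ground_emb, sat_emb) + info_nce(sat_emb, ground_emb))
    return cl + mim_weight * masked_mse(pred_patches, true_patches, mask)


def retrieve(query_emb, gallery_embs, k=1):
    """Return indices of the k gallery embeddings most similar to the query."""
    q = l2_normalize(query_emb)
    scores = [dot(q, l2_normalize(g)) for g in gallery_embs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]
```

In this sketch, a ground-level embedding would be passed to `retrieve` with a gallery of satellite embeddings (or the reverse), and ranking locations by similarity is one plausible way to build the kind of species distribution map the abstract mentions.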
Related papers
- Diffusion Models as Data Mining Tools [87.77999285241219]
This paper demonstrates how to use generative models trained for image synthesis as tools for visual data mining.
We show that after finetuning conditional diffusion models to synthesize images from a specific dataset, we can use these models to define a typicality measure.
This measure assesses how typical visual elements are for different data labels, such as geographic location, time stamps, semantic labels, or even the presence of a disease.
arXiv Detail & Related papers (2024-07-20T17:14:31Z)
- OAM-TCD: A globally diverse dataset of high-resolution tree cover maps [8.336960607169175]
We present a novel open-access dataset for individual tree crown delineation (TCD) in high-resolution aerial imagery sourced from OpenAerialMap (OAM).
Our dataset, OAM-TCD, comprises 5072 2048x2048px images at 10 cm/px resolution with associated human-labeled instance masks for over 280k individual and 56k groups of trees.
Using our dataset, we train reference instance and semantic segmentation models that compare favorably to existing state-of-the-art models.
arXiv Detail & Related papers (2024-07-16T14:11:29Z)
- WildlifeDatasets: An open-source toolkit for animal re-identification [0.0]
WildlifeDatasets is an open-source toolkit for ecologists and computer-vision / machine-learning researchers.
WildlifeDatasets is written in Python and allows straightforward access to publicly available wildlife datasets.
We provide the first-ever foundation model for individual re-identification within a wide range of species - MegaDescriptor.
arXiv Detail & Related papers (2023-11-15T17:08:09Z)
- SatBird: Bird Species Distribution Modeling with Remote Sensing and
Citizen Science Data [68.2366021016172]
We present SatBird, a satellite dataset of locations in the USA with labels derived from presence-absence observation data from the citizen science database eBird.
We also provide a dataset in Kenya representing low-data regimes.
We benchmark a set of baselines on our dataset, including SOTA models for remote sensing tasks.
arXiv Detail & Related papers (2023-11-02T02:00:27Z)
- Transfer Learning with Semi-Supervised Dataset Annotation for Birdcall
Classification [0.0]
We present working notes on transfer learning with semi-supervised dataset annotation for the BirdCLEF 2023 competition.
Our approach utilizes existing off-the-shelf models, BirdNET and MixIT, to address representation and labeling challenges in the competition.
arXiv Detail & Related papers (2023-06-29T07:56:27Z)
- Tackling Long-Tailed Category Distribution Under Domain Shifts [50.21255304847395]
Existing approaches cannot handle the scenario where both issues (a long-tailed category distribution and domain shift) exist.
We designed three novel core functional blocks including Distribution Calibrated Classification Loss, Visual-Semantic Mapping and Semantic-Similarity Guided Augmentation.
Two new datasets were proposed for this problem, named AWA2-LTS and ImageNet-LTS.
arXiv Detail & Related papers (2022-07-20T19:07:46Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z)
- Multi-Domain Few-Shot Learning and Dataset for Agricultural Applications [0.0]
We propose a method to learn from a few samples to automatically classify different pests, plants, and their diseases.
We learn a feature extractor to generate embeddings and then update the embeddings using Transformers.
We conduct 42 experiments in total to comprehensively analyze the model and it achieves up to 14% and 24% performance gains on few-shot image classification benchmarks.
arXiv Detail & Related papers (2021-09-21T04:20:18Z)
- Campus3D: A Photogrammetry Point Cloud Benchmark for Hierarchical
Understanding of Outdoor Scene [76.4183572058063]
We present a richly-annotated 3D point cloud dataset for multiple outdoor scene understanding tasks.
The dataset has been point-wisely annotated with both hierarchical and instance-based labels.
We formulate a hierarchical learning problem for 3D point cloud segmentation and propose a measurement evaluating consistency across various hierarchies.
arXiv Detail & Related papers (2020-08-11T19:10:32Z)
- Can Giraffes Become Birds? An Evaluation of Image-to-image Translation
for Data Generation [0.0]
We investigate image-to-image translation using Generative Adversarial Networks (GANs) for generating new data.
An unsupervised cross-domain translator entitled InstaGAN was trained on giraffes and birds, along with their respective masks, to learn translation between both domains.
A dataset of synthetic bird images was generated using translation from originally giraffe images while preserving the original spatial arrangement and background.
arXiv Detail & Related papers (2020-01-10T19:29:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.