Related papers: BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping

URL: http://arxiv.org/abs/2310.19168v1
Date: Sun, 29 Oct 2023 22:08:00 GMT
Title: BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping
Authors: Srikumar Sastry, Subash Khanal, Aayush Dhakal, Di Huang, Nathan Jacobs
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose a metadata-aware self-supervised learning (SSL) framework useful for fine-grained classification and ecological mapping of bird species around the world. Our framework unifies two SSL strategies, Contrastive Learning (CL) and Masked Image Modeling (MIM), while also enriching the embedding space with metadata available with ground-level imagery of birds. We separately train uni-modal and cross-modal ViTs on a novel cross-view global bird species dataset containing ground-level imagery, metadata (location, time), and corresponding satellite imagery. We demonstrate that our models learn fine-grained and geographically conditioned features of birds by evaluating on two downstream tasks: fine-grained visual classification (FGVC) and cross-modal retrieval. Pre-trained models learned using our framework achieve state-of-the-art (SotA) performance on FGVC of iNAT-2021 birds and in transfer learning settings for the CUB-200-2011 and NABirds datasets. Moreover, the impressive cross-modal retrieval performance of our model enables the creation of species distribution maps across any geographic region. The dataset and source code will be released at https://github.com/mvrl/BirdSAT.
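The abstract does not spell out the training objective, so what follows is only a minimal sketch of how a contrastive term and a masked-image-modeling term are commonly combined, not the authors' implementation. It assumes a CLIP-style symmetric InfoNCE loss between paired ground-level and satellite embeddings, an MAE-style mean-squared reconstruction loss computed on masked patches only, a hypothetical weighting factor `lam`, and an assumed default `temperature` of 0.07. The metadata (location, time) conditioning described above is not modeled. All function names are illustrative. A hypothetical `similarity_map` helper shows how retrieval scores over a grid of satellite tiles could be read as a crude distribution map.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to unit length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(logits.shape[0])
    return -log_probs[idx, idx].mean()


def info_nce(ground_emb, sat_emb, temperature=0.07):
    """Symmetric InfoNCE (assumed form): row i of each (N, E) array is a
    positive pair, and every other row in the batch acts as a negative."""
    g = l2_normalize(ground_emb)
    s = l2_normalize(sat_emb)
    logits = g @ s.T / temperature  # (N, N) scaled cosine similarities
    # Average the ground->satellite and satellite->ground directions.
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits.T))


def masked_mse(pred_patches, target_patches, mask):
    """MAE-style reconstruction loss on (N, P, D) patches, averaged over the
    masked patches only (mask is (N, P) with 1 = masked). At least one patch
    must be masked."""
    per_patch = ((pred_patches - target_patches) ** 2).mean(axis=-1)  # (N, P)
    return (per_patch * mask).sum() / mask.sum()


def combined_loss(ground_emb, sat_emb, pred, target, mask, lam=1.0, temperature=0.07):
    """Contrastive term plus a reconstruction term scaled by the hypothetical `lam`."""
    return info_nce(ground_emb, sat_emb, temperature) + lam * masked_mse(pred, target, mask)


def similarity_map(query_emb, tile_embs):
    """Cosine similarity between one query embedding (E,) and an (H, W, E)
    grid of satellite-tile embeddings; returns an (H, W) score map."""
    return l2_normalize(tile_embs) @ l2_normalize(query_emb)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, P, D, E = 4, 16, 48, 32
    ground = rng.normal(size=(N, E))
    sat = ground + 0.01 * rng.normal(size=(N, E))  # nearly aligned toy pairs
    pred = rng.normal(size=(N, P, D))
    target = rng.normal(size=(N, P, D))
    mask = (rng.random((N, P)) < 0.75).astype(float)  # mask roughly 75% of patches
    print("combined loss:", combined_loss(ground, sat, pred, target, mask))
```

Running the file as a script prints the combined loss on random toy tensors, which only checks that the shapes line up. In this sketch the contrastive term pulls each ground-level embedding toward its own satellite embedding relative to the other pairs in the batch, while the reconstruction term penalizes only the patches the encoder did not see.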

Related papers

SASP: Strip-Aware Spatial Perception for Fine-Grained Bird Image Classification
This paper proposes a fine-grained bird image classification framework based on strip-aware spatial perception. The proposed method incorporates two novel modules: extensional perception aggregator (EPA) and channel semantic weaving (CSW). Built upon a ResNet-50 backbone, the model enables jump-wise connection of extended structural features across the spatial domain.
arXiv (2025-05-30T09:10:12Z)
Visual WetlandBirds Dataset: Bird Species Identification and Behavior Recognition in Videos
This study introduces the first fine-grained video dataset specifically designed for bird behavior detection and species classification. The proposed dataset comprises 178 videos recorded in Spanish wetlands, capturing 13 different bird species performing 7 distinct behavior classes.
arXiv (2025-01-15T16:34:20Z)
Diffusion Models as Data Mining Tools
This paper demonstrates how to use generative models trained for image synthesis as tools for visual data mining. We show that after finetuning conditional diffusion models to synthesize images from a specific dataset, we can use these models to define a typicality measure. This measure assesses how typical visual elements are for different data labels, such as geographic location, time stamps, semantic labels, or even the presence of a disease.
arXiv (2024-07-20T17:14:31Z)
OAM-TCD: A globally diverse dataset of high-resolution tree cover maps
We present a novel open-access dataset for individual tree crown delineation (TCD) in high-resolution aerial imagery sourced from OpenAerialMap (OAM). Our dataset, OAM-TCD, comprises 5,072 images of 2048x2048 px at 10 cm/px resolution, with associated human-labeled instance masks for over 280k individual trees and 56k groups of trees. Using our dataset, we train reference instance and semantic segmentation models that compare favorably to existing state-of-the-art models.
arXiv (2024-07-16T14:11:29Z)
WildlifeDatasets: An open-source toolkit for animal re-identification
WildlifeDatasets is an open-source toolkit for ecologists and computer-vision/machine-learning researchers. It is written in Python and allows straightforward access to publicly available wildlife datasets. We provide MegaDescriptor, the first-ever foundation model for individual re-identification across a wide range of species.
arXiv (2023-11-15T17:08:09Z)
SatBird: Bird Species Distribution Modeling with Remote Sensing and Citizen Science Data
We present SatBird, a satellite dataset of locations in the USA with labels derived from presence-absence observation data from the citizen science database eBird. We also provide a dataset in Kenya representing low-data regimes. We benchmark a set of baselines on our dataset, including SOTA models for remote sensing tasks.
arXiv (2023-11-02T02:00:27Z)
Transfer Learning with Semi-Supervised Dataset Annotation for Birdcall Classification
We present working notes on transfer learning with semi-supervised dataset annotation for the BirdCLEF 2023 competition. Our approach utilizes existing off-the-shelf models, BirdNET and MixIT, to address representation and labeling challenges in the competition.
arXiv (2023-06-29T07:56:27Z)
Tackling Long-Tailed Category Distribution Under Domain Shifts
Existing approaches cannot handle the scenario in which a long-tailed category distribution and domain shifts occur together. We designed three novel core functional blocks: Distribution Calibrated Classification Loss, Visual-Semantic Mapping, and Semantic-Similarity Guided Augmentation. Two new datasets were proposed for this problem, named AWA2-LTS and ImageNet-LTS.
arXiv (2022-07-20T19:07:46Z)
Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage. We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets. By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv (2022-07-17T07:05:39Z)
MSeg: A Composite Dataset for Multi-domain Semantic Segmentation
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains. We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images. A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv (2021-12-27T16:16:35Z)
Multi-Domain Few-Shot Learning and Dataset for Agricultural Applications
We propose a method to learn from a few samples to automatically classify different pests, plants, and their diseases. We learn a feature extractor to generate embeddings and then update the embeddings using Transformers. We conduct 42 experiments in total to comprehensively analyze the model and it achieves up to 14% and 24% performance gains on few-shot image classification benchmarks.
arXiv (2021-09-21T04:20:18Z)
Campus3D: A Photogrammetry Point Cloud Benchmark for Hierarchical Understanding of Outdoor Scene
We present a richly-annotated 3D point cloud dataset for multiple outdoor scene understanding tasks. The dataset has been annotated point-wise with both hierarchical and instance-based labels. We formulate a hierarchical learning problem for 3D point cloud segmentation and propose a measurement evaluating consistency across various hierarchies.
arXiv (2020-08-11T19:10:32Z)
Can Giraffes Become Birds? An Evaluation of Image-to-image Translation for Data Generation
We investigate image-to-image translation using Generative Adversarial Networks (GANs) for generating new data. An unsupervised cross-domain translator named InstaGAN was trained on giraffes and birds, along with their respective masks, to learn translation between both domains. A dataset of synthetic bird images was generated by translating original giraffe images while preserving the original spatial arrangement and background.
arXiv (2020-01-10T19:29:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.