Leveraging Habitat Information for Fine-grained Bird Identification
- URL: http://arxiv.org/abs/2312.14999v1
- Date: Fri, 22 Dec 2023 16:23:22 GMT
- Title: Leveraging Habitat Information for Fine-grained Bird Identification
- Authors: Tin Nguyen, Anh Nguyen
- Abstract summary: We are the first to explore integrating habitat information, one of the four major cues for identifying birds by ornithologists, into modern bird classifiers.
We focus on two leading model types: CNNs and ViTs trained on the downstream bird datasets; and original, multi-modal CLIP.
Training CNNs and ViTs with habitat-augmented data results in an improvement of up to +0.83 and +0.23 points on NABirds and CUB-200, respectively.
- Score: 4.392299539811761
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional bird classifiers mostly rely on the visual characteristics of
birds. Some prior works even train classifiers to be invariant to the
background, completely discarding the living environment of birds. Instead, we
are the first to explore integrating habitat information, one of the four major
cues for identifying birds by ornithologists, into modern bird classifiers. We
focus on two leading model types: (1) CNNs and ViTs trained on the downstream
bird datasets; and (2) original, multi-modal CLIP. Training CNNs and ViTs with
habitat-augmented data results in an improvement of up to +0.83 and +0.23
points on NABirds and CUB-200, respectively. Similarly, adding habitat
descriptors to the prompts for CLIP yields a substantial accuracy boost of up
to +0.99 and +1.1 points on NABirds and CUB-200, respectively. We find
consistent accuracy improvement after integrating habitat features into the
image augmentation process and into the textual descriptors of vision-language
CLIP classifiers. Code is available at:
https://anonymous.4open.science/r/reasoning-8B7E/.
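
As a rough illustration of the CLIP side of the abstract, adding habitat descriptors to prompts can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the prompt template, the `build_prompts` helper, and the example species and habitat strings are all hypothetical. In practice the resulting strings would be passed to a CLIP text encoder and compared against the image embedding.

```python
from typing import Dict, List


def build_prompts(class_name: str, habitats: List[str]) -> List[str]:
    """Return the plain class prompt followed by one habitat-augmented prompt per habitat.

    The template "a photo of a <class> in <habitat>" is an assumed format,
    not necessarily the one used in the paper.
    """
    base = f"a photo of a {class_name}"
    return [base] + [f"{base} in {habitat}" for habitat in habitats]


# Hypothetical habitat descriptors, for illustration only.
HABITATS: Dict[str, List[str]] = {
    "Mallard": ["a pond", "a marsh"],
    "Horned Lark": ["an open field"],
}

# One list of candidate text prompts per class.
prompts = {name: build_prompts(name, habs) for name, habs in HABITATS.items()}

for name, texts in prompts.items():
    print(name, texts)
```

Each class ends up with several text prompts; one simple design choice is to average or max-pool the image-text similarity over the prompts of a class before picking the best-scoring class.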
Related papers
- AudioProtoPNet: An interpretable deep learning model for bird sound classification [1.49199020343864]
This study introduces AudioProtoPNet, an adaptation of the Prototypical Part Network (ProtoPNet) for multi-label bird sound classification.
It is an inherently interpretable model that uses a ConvNeXt backbone to extract embeddings.
The model was trained on the BirdSet training dataset, which consists of 9,734 bird species and over 6,800 hours of recordings.
arXiv Detail & Related papers (2024-04-16T09:37:41Z)
- BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping [22.30038765017189]
We propose a metadata-aware self-supervised learning (SSL) framework useful for fine-grained classification and ecological mapping of bird species around the world.
Our framework unifies two SSL strategies: Contrastive Learning (CL) and Masked Image Modeling (MIM), while also enriching the embedding space with metadata available with ground-level imagery of birds.
We demonstrate that our models learn fine-grained and geographically conditioned features of birds, by evaluating on two downstream tasks: fine-grained visual classification (FGVC) and cross-modal retrieval.
arXiv Detail & Related papers (2023-10-29T22:08:00Z)
- Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z)
- Recognition of Unseen Bird Species by Learning from Field Guides [23.137536032163855]
We exploit field guides to learn bird species recognition, in particular zero-shot recognition of unseen species.
We study two approaches: (1) a contrastive encoding of illustrations, which can be fed into standard zero-shot learning schemes; and (2) a novel method that leverages the fact that illustrations are also images.
Our results show that illustrations from field guides, which are readily available for a wide range of species, are indeed a competitive source of side information for zero-shot learning.
arXiv Detail & Related papers (2022-06-03T09:13:46Z)
- Calibrating Class Activation Maps for Long-Tailed Visual Recognition [60.77124328049557]
We present two effective modifications of CNNs to improve network learning from long-tailed distributions.
First, we present a Class Activation Map Calibration (CAMC) module to improve the learning and prediction of network classifiers.
Second, we investigate the use of normalized classifiers for representation learning in long-tailed problems.
arXiv Detail & Related papers (2021-08-29T05:45:03Z)
- No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in a real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
- Your "Flamingo" is My "Bird": Fine-Grained, or Not [60.25769809922673]
We investigate how to tailor for different fine-grained definitions under divergent levels of expertise.
We first conduct a comprehensive human study where we confirm that most participants prefer multi-granularity labels.
We then discover the key intuition that: coarse-level label prediction exacerbates fine-grained feature learning.
arXiv Detail & Related papers (2020-11-18T02:24:54Z)
- ALICE: Active Learning with Contrastive Natural Language Explanations [69.03658685761538]
We propose Active Learning with Contrastive Explanations (ALICE) to improve data efficiency in learning.
ALICE learns to first use active learning to select the most informative pairs of label classes to elicit contrastive natural language explanations.
It then extracts knowledge from these explanations using a semantic parser.
arXiv Detail & Related papers (2020-09-22T01:02:07Z)
- Feathers dataset for Fine-Grained Visual Categorization [0.0]
FeatherV1 is the first publicly available bird's plumage dataset for machine learning.
It can raise interest in a new task in the fine-grained visual recognition domain.
arXiv Detail & Related papers (2020-04-18T12:40:43Z)
- Transferring Dense Pose to Proximal Animal Classes [83.84439508978126]
We show that it is possible to transfer the knowledge existing in dense pose recognition for humans, as well as in more general object detectors and segmenters, to the problem of dense pose recognition in other classes.
We do this by establishing a DensePose model for the new animal which is also geometrically aligned to humans.
We also introduce two benchmark datasets labelled in the manner of DensePose for the class chimpanzee and use them to evaluate our approach.
arXiv Detail & Related papers (2020-02-28T21:43:53Z)