Bag of Tricks for Long-Tail Visual Recognition of Animal Species in
Camera Trap Images
- URL: http://arxiv.org/abs/2206.12458v1
- Date: Fri, 24 Jun 2022 18:30:26 GMT
- Title: Bag of Tricks for Long-Tail Visual Recognition of Animal Species in
Camera Trap Images
- Authors: Fagner Cunha, Eulanda M. dos Santos, Juan G. Colonna
- Abstract summary: We evaluate recently proposed techniques to address the long-tail visual recognition of animal species in camera trap images.
In general, the square-root sampling was the method that most improved the performance for minority classes by around 10%.
The proposed approach achieved the best trade-off between tail-class performance and the cost to head-class accuracy.
- Score: 2.294014185517203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camera traps are a wildlife-monitoring strategy that collects large
numbers of pictures. The number of images collected for each species usually
follows a long-tail distribution, i.e., a few classes have a large number of
instances while most species have only a small fraction. Although in most
cases these rare species are the classes of interest to ecologists, they are
often neglected when using deep learning models because these models require a
large number of images for training. In this work, we systematically
evaluate recently proposed techniques - namely, square-root re-sampling,
class-balanced focal loss, and balanced group softmax - to address the
long-tail visual recognition of animal species in camera trap images. To
achieve a more general conclusion, we evaluated the selected methods on four
families of computer vision models (ResNet, MobileNetV3, EfficientNetV2, and
Swin Transformer) and four camera trap datasets with different characteristics.
Initially, we prepared a robust baseline with the most recent training tricks
and then we applied the methods for improving long-tail recognition. Our
experiments show that the Swin Transformer can reach high performance for rare
classes without any additional method for handling imbalance, with an
overall accuracy of 88.76% for the WCS dataset and 94.97% for Snapshot
Serengeti, considering a location-based train/test split. In general,
square-root sampling was the method that most improved the performance for
minority classes, by around 10%, but at the cost of reducing the majority
classes' accuracy by at least 4%. These results motivated us to propose a
simple and effective approach using an ensemble combining square-root sampling
and the baseline. The proposed approach achieved the best trade-off between
tail-class performance and the cost to head-class accuracy.
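Two of the techniques named above can be sketched compactly. The following is a minimal illustration, not the authors' implementation: square-root re-sampling draws each class with probability proportional to the square root of its image count, and the class-balanced weighting (the term typically paired with focal loss in the class-balanced focal loss) derives per-class loss weights from the "effective number of samples". The class counts and the `beta` value are hypothetical, chosen only for illustration.

```python
import math

def sqrt_sampling_probs(class_counts):
    # Square-root re-sampling: sample class c with probability
    # proportional to sqrt(n_c), which flattens the long-tail
    # distribution relative to instance-balanced sampling.
    w = [math.sqrt(n) for n in class_counts]
    total = sum(w)
    return [x / total for x in w]

def class_balanced_weights(class_counts, beta=0.999):
    # Per-class loss weights from the "effective number of samples"
    # E_n = (1 - beta**n) / (1 - beta); each class is weighted by
    # 1 / E_n, then the weights are normalized to average to 1.
    inv = [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]
    scale = len(inv) / sum(inv)
    return [w * scale for w in inv]

# Hypothetical head / medium / tail class image counts.
counts = [10000, 500, 20]
p = sqrt_sampling_probs(counts)       # tail class sampled far above its natural frequency
w = class_balanced_weights(counts)    # tail class receives the largest loss weight
```

The ensemble the abstract proposes could then be realized, under the same caveat, by averaging the softmax outputs of the baseline model and the square-root-sampled model at inference time.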
Related papers
- Multimodal Foundation Models for Zero-shot Animal Species Recognition in
Camera Trap Images [57.96659470133514]
Motion-activated camera traps constitute an efficient tool for tracking and monitoring wildlife populations across the globe.
Supervised learning techniques have been successfully deployed to analyze such imagery; however, training them requires annotations from experts.
Reducing the reliance on costly labelled data has immense potential in developing large-scale wildlife tracking solutions with markedly less human labor.
arXiv Detail & Related papers (2023-11-02T08:32:00Z)
- PrototypeFormer: Learning to Explore Prototype Relationships for Few-shot Image Classification [19.93681871684493]
We propose our method called PrototypeFormer, which aims to significantly advance traditional few-shot image classification approaches.
We utilize a transformer architecture to build a prototype extraction module, aiming to extract class representations that are more discriminative for few-shot classification.
Despite its simplicity, the method performs remarkably well, with no bells and whistles.
arXiv Detail & Related papers (2023-10-05T12:56:34Z)
- LCReg: Long-Tailed Image Classification with Latent Categories based Recognition [81.5551335554507]
We propose the Latent Categories based long-tail Recognition (LCReg) method.
Our hypothesis is that common latent features shared by head and tail classes can be used to improve feature representation.
Specifically, we learn a set of class-agnostic latent features shared by both head and tail classes, and then use semantic data augmentation on the latent features to implicitly increase the diversity of the training sample.
arXiv Detail & Related papers (2023-09-13T02:03:17Z)
- Rare Wildlife Recognition with Self-Supervised Representation Learning [0.0]
We present a methodology to reduce the amount of required training data by resorting to self-supervised pretraining.
We show that a combination of MoCo, CLD, and geometric augmentations outperforms conventional models pretrained on ImageNet by a large margin.
arXiv Detail & Related papers (2022-10-29T17:57:38Z)
- CropMix: Sampling a Rich Input Distribution via Multi-Scale Cropping [97.05377757299672]
We present a simple method, CropMix, for producing a rich input distribution from the original dataset distribution.
CropMix can be seamlessly applied to virtually any training recipe and neural network architecture performing classification tasks.
We show that CropMix is of benefit to both contrastive learning and masked image modeling towards more powerful representations.
arXiv Detail & Related papers (2022-05-31T16:57:28Z)
- Two-phase training mitigates class imbalance for camera trap image classification with CNNs [17.905795249216805]
We use two-phase training to increase the performance for minority classes.
We find that two-phase training based on majority undersampling increases class-specific F1-scores by up to 3.0%.
We also find that two-phase training outperforms using only oversampling or undersampling by 6.1% in F1-score on average.
arXiv Detail & Related papers (2021-12-29T10:47:45Z)
- Self-Supervised Pretraining and Controlled Augmentation Improve Rare Wildlife Recognition in UAV Images [9.220908533011068]
We present a methodology to reduce the amount of required training data by resorting to self-supervised pretraining.
We show that a combination of MoCo, CLD, and geometric augmentations outperforms conventional models pre-trained on ImageNet by a large margin.
arXiv Detail & Related papers (2021-08-17T12:14:28Z)
- Few-Shot Learning with Part Discovery and Augmentation from Unlabeled Images [79.34600869202373]
We show that inductive bias can be learned from a flat collection of unlabeled images, and instantiated as transferable representations among seen and unseen classes.
Specifically, we propose a novel part-based self-supervised representation learning scheme to learn transferable representations.
Our method yields impressive results, outperforming the previous best unsupervised methods by 7.74% and 9.24%.
arXiv Detail & Related papers (2021-05-25T12:22:11Z)
- ResLT: Residual Learning for Long-tailed Recognition [64.19728932445523]
We propose a more fundamental perspective for long-tailed recognition, i.e., from the aspect of parameter space.
We design the effective residual fusion mechanism -- with one main branch optimized to recognize images from all classes, another two residual branches are gradually fused and optimized to enhance images from medium+tail classes and tail classes respectively.
We test our method on several benchmarks, i.e., long-tailed version of CIFAR-10, CIFAR-100, Places, ImageNet, and iNaturalist 2018.
arXiv Detail & Related papers (2021-01-26T08:43:50Z)
- Two-View Fine-grained Classification of Plant Species [66.75915278733197]
We propose a novel method based on a two-view leaf image representation and a hierarchical classification strategy for fine-grained recognition of plant species.
A deep metric based on Siamese convolutional neural networks is used to reduce the dependence on a large number of training samples and make the method scalable to new plant species.
arXiv Detail & Related papers (2020-05-18T21:57:47Z)
- Automatic Detection and Recognition of Individuals in Patterned Species [4.163860911052052]
We develop a framework for automatic detection and recognition of individuals in different patterned species.
We use the recently proposed Faster-RCNN object detection framework to efficiently detect animals in images.
We evaluate our recognition system on zebra and jaguar images to show generalization to other patterned species.
arXiv Detail & Related papers (2020-05-06T15:29:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.