Self-Supervised Pretraining and Controlled Augmentation Improve Rare
Wildlife Recognition in UAV Images
- URL: http://arxiv.org/abs/2108.07582v1
- Date: Tue, 17 Aug 2021 12:14:28 GMT
- Title: Self-Supervised Pretraining and Controlled Augmentation Improve Rare
Wildlife Recognition in UAV Images
- Authors: Xiaochen Zheng and Benjamin Kellenberger and Rui Gong and Irena
Hajnsek and Devis Tuia
- Abstract summary: We present a methodology to reduce the amount of required training data by resorting to self-supervised pretraining.
We show that a combination of MoCo, CLD, and geometric augmentations outperforms conventional models pre-trained on ImageNet by a large margin.
- Score: 9.220908533011068
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated animal censuses with aerial imagery are a vital ingredient towards
wildlife conservation. Recent models are generally based on deep learning and
thus require vast amounts of training data. Due to their scarcity and minuscule
size, annotating animals in aerial imagery is a highly tedious process. In this
project, we present a methodology to reduce the amount of required training
data by resorting to self-supervised pretraining. In detail, we examine a
combination of recent contrastive learning methodologies like Momentum Contrast
(MoCo) and Cross-Level Instance-Group Discrimination (CLD) to condition our
model on the aerial images without the requirement for labels. We show that a
combination of MoCo, CLD, and geometric augmentations outperforms conventional
models pre-trained on ImageNet by a large margin. Crucially, our method still
yields favorable results even if we reduce the number of training animals to
just 10%, at which point our best model scores double the recall of the
baseline at similar precision. This effectively allows reducing the number of
required annotations to a fraction while still being able to train
high-accuracy models in such highly challenging settings.
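The abstract names the building blocks (MoCo, CLD, geometric augmentations) but not the full recipe; the sketch below is a minimal, hedged illustration of MoCo-style contrastive pretraining with geometric augmentations on unlabelled aerial patches. The CLD term and the queue pointer bookkeeping are omitted, and the hyperparameters and dataset wiring are illustrative rather than taken from the paper.

```python
# Minimal sketch of MoCo-style pretraining with geometric augmentations on
# unlabelled aerial patches. The CLD term and queue bookkeeping from the paper
# are omitted; hyperparameters are illustrative only.
import copy
import torch
import torch.nn.functional as F
from torch import nn
from torchvision import models, transforms

# Geometric augmentations (crops, flips, rotations) preserve the appearance of
# tiny animals better than aggressive photometric distortions.
geometric_aug = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.5, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(degrees=90),
    transforms.ToTensor(),
])

class TwoCrops:
    """Dataset transform returning two independently augmented views of a patch."""
    def __init__(self, aug):
        self.aug = aug
    def __call__(self, img):
        return self.aug(img), self.aug(img)

def make_encoder(dim=128):
    backbone = models.resnet50(weights=None)            # trained from scratch on aerial data
    backbone.fc = nn.Linear(backbone.fc.in_features, dim)
    return backbone

q_enc = make_encoder()                                   # query encoder, trained by SGD
k_enc = copy.deepcopy(q_enc)                             # key encoder, momentum copy
for p in k_enc.parameters():
    p.requires_grad = False

queue = F.normalize(torch.randn(128, 4096), dim=0)       # memory bank of negative keys
opt = torch.optim.SGD(q_enc.parameters(), lr=0.03, momentum=0.9, weight_decay=1e-4)

def momentum_update(m=0.999):
    for pq, pk in zip(q_enc.parameters(), k_enc.parameters()):
        pk.data = m * pk.data + (1.0 - m) * pq.data

def moco_step(im_q, im_k, t=0.2):
    """One InfoNCE update on a pair of augmented views (im_q, im_k)."""
    q = F.normalize(q_enc(im_q), dim=1)
    with torch.no_grad():
        momentum_update()
        k = F.normalize(k_enc(im_k), dim=1)
    l_pos = torch.einsum("nc,nc->n", q, k).unsqueeze(-1)    # similarity to the positive key
    l_neg = torch.einsum("nc,ck->nk", q, queue)             # similarities to queued negatives
    logits = torch.cat([l_pos, l_neg], dim=1) / t
    labels = torch.zeros(logits.size(0), dtype=torch.long)  # the positive sits at index 0
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item(), k                                   # k would be enqueued, oldest keys dropped
```

In practice TwoCrops(geometric_aug) would be the transform of an unlabelled aerial-patch dataset, and the pretrained query encoder would then be fine-tuned on the small annotated subset.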
Related papers
- Multimodal Foundation Models for Zero-shot Animal Species Recognition in
Camera Trap Images [57.96659470133514]
Motion-activated camera traps constitute an efficient tool for tracking and monitoring wildlife populations across the globe.
Supervised learning techniques have been successfully deployed to analyze such imagery; however, training them requires annotations from experts.
Reducing the reliance on costly labelled data has immense potential in developing large-scale wildlife tracking solutions with markedly less human labor.
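The summary above does not name the foundation models involved; as a hedged illustration of zero-shot species recognition, a CLIP-style vision-language model can score a camera-trap image against species names without any task-specific labels. The species list and image path below are placeholders, not taken from the paper.

```python
# Hedged sketch of zero-shot species recognition with a CLIP-style model
# (pip install git+https://github.com/openai/CLIP.git). The species list and
# image path are illustrative placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

species = ["white-tailed deer", "red fox", "wild boar", "empty scene"]
prompts = clip.tokenize([f"a camera trap photo of a {s}" for s in species]).to(device)
image = preprocess(Image.open("camera_trap_crop.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(prompts)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

for name, p in zip(species, probs[0].tolist()):
    print(f"{name}: {p:.3f}")
```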
arXiv Detail & Related papers (2023-11-02T08:32:00Z)
- No Data Augmentation? Alternative Regularizations for Effective Training
on Small Datasets [0.0]
We study alternative regularization strategies to push the limits of supervised learning on small image classification datasets.
In particular, we employ an agnostic approach to select (semi-)optimal learning rate and weight decay pairs via the norm of the model parameters.
We reach a test accuracy of 66.5%, on par with the best state-of-the-art methods.
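The exact parameter-norm criterion is not spelled out in the summary above; the following sketch only illustrates the bookkeeping such a selection could rely on, running short proxy trainings over a small learning-rate/weight-decay grid and recording the final parameter norm. The model and data are random placeholders.

```python
# Generic illustration (not the paper's exact rule): track the final parameter
# norm of short proxy runs across a learning-rate / weight-decay grid.
import itertools
import torch
from torch import nn

def param_norm(model):
    return torch.sqrt(sum((p ** 2).sum() for p in model.parameters())).item()

def short_proxy_run(lr, wd, steps=200):
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # placeholder model
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=wd)
    for _ in range(steps):
        x = torch.randn(64, 3, 32, 32)                 # placeholder batch
        y = torch.randint(0, 10, (64,))
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    return param_norm(model)

grid = itertools.product([0.1, 0.03, 0.01], [1e-4, 5e-4, 1e-3])
norms = {(lr, wd): short_proxy_run(lr, wd) for lr, wd in grid}
for (lr, wd), n in sorted(norms.items(), key=lambda kv: kv[1]):
    print(f"lr={lr}, wd={wd}: final parameter norm = {n:.2f}")
```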
arXiv Detail & Related papers (2023-09-04T16:13:59Z)
- The effectiveness of MAE pre-pretraining for billion-scale pretraining [65.98338857597935]
We introduce an additional pre-pretraining stage that is simple and uses the self-supervised MAE technique to initialize the model.
We measure the effectiveness of pre-pretraining on 10 different visual recognition tasks spanning image classification, video recognition, object detection, low-shot classification and zero-shot recognition.
arXiv Detail & Related papers (2023-03-23T17:56:12Z)
- Rare Wildlife Recognition with Self-Supervised Representation Learning [0.0]
We present a methodology to reduce the amount of required training data by resorting to self-supervised pretraining.
We show that a combination of MoCo, CLD, and geometric augmentations outperforms conventional models pretrained on ImageNet by a large margin.
arXiv Detail & Related papers (2022-10-29T17:57:38Z)
- Bag of Tricks for Long-Tail Visual Recognition of Animal Species in
Camera Trap Images [2.294014185517203]
We evaluate recently proposed techniques to address the long-tail visual recognition of animal species in camera trap images.
In general, square-root sampling was the method that most improved performance for the minority classes, by around 10%.
The proposed approach achieved the best trade-off between the performance of the tail class and the cost of the head classes' accuracy.
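Square-root sampling itself is a standard long-tail trick rather than something specific to this paper; a common implementation draws each image with probability proportional to n_c^(-1/2) for its class, e.g. via PyTorch's WeightedRandomSampler. The label list below is a toy placeholder.

```python
# Sketch of square-root class-balanced sampling for a long-tailed dataset.
# "labels" stands in for the per-image class indices of a real dataset.
from collections import Counter
import torch
from torch.utils.data import WeightedRandomSampler

labels = [0] * 900 + [1] * 80 + [2] * 20                # toy long-tailed label list
counts = Counter(labels)

# Per-sample weight n_c^(-1/2) makes the class sampling probability
# proportional to sqrt(n_c), flattening the head and boosting the tail.
class_weight = {c: n ** -0.5 for c, n in counts.items()}
sample_weights = torch.tensor([class_weight[y] for y in labels], dtype=torch.double)

sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)
# DataLoader(dataset, batch_size=..., sampler=sampler) then yields batches whose
# class frequencies follow the square-root distribution.
```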
arXiv Detail & Related papers (2022-06-24T18:30:26Z)
- On Data Scaling in Masked Image Modeling [36.00347416479826]
Masked image modeling (MIM) is suspected to be unable to benefit from larger data.
We study data scales ranging from 10% of ImageNet-1K to the full ImageNet-22K, model sizes ranging from 49 million to 1 billion parameters, and training lengths ranging from 125K to 500K iterations.
We find that the validation loss in pre-training is a good indicator of how well the model performs when fine-tuned on multiple tasks.
arXiv Detail & Related papers (2022-06-09T17:58:24Z)
- Ensembling Off-the-shelf Models for GAN Training [55.34705213104182]
We find that pretrained computer vision models can significantly improve performance when used in an ensemble of discriminators.
We propose an effective selection mechanism, by probing the linear separability between real and fake samples in pretrained model embeddings.
Our method can improve GAN training in both limited data and large-scale settings.
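The selection procedure is only sketched in the summary; as a rough, hedged illustration, one can embed small batches of real and generated images with each candidate pretrained backbone and rank the backbones by how accurately a linear probe separates the two sets. The image tensors below are random placeholders for real and fake batches.

```python
# Hedged sketch: rank pretrained feature extractors by the linear separability
# of real vs. generated samples in their embedding space. Random tensors stand
# in for real images and generator outputs.
import torch
from torch import nn
from torchvision import models
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

real = torch.randn(64, 3, 224, 224)    # placeholder for real images
fake = torch.randn(64, 3, 224, 224)    # placeholder for generator samples

candidates = {
    "resnet50": models.resnet50(weights="IMAGENET1K_V2"),
    "vgg16": models.vgg16(weights="IMAGENET1K_V1"),
}

def embed(backbone, x):
    feat = nn.Sequential(*list(backbone.children())[:-1])  # drop the classifier head
    with torch.no_grad():
        return feat(x).flatten(1).numpy()

X = torch.cat([real, fake])
y = [1] * len(real) + [0] * len(fake)
for name, net in candidates.items():
    net.eval()
    Z = embed(net, X)
    # Higher held-out accuracy of the linear probe = more linearly separable
    # embeddings, i.e. a more useful backbone for the discriminator ensemble.
    acc = cross_val_score(LogisticRegression(max_iter=1000), Z, y, cv=3).mean()
    print(f"{name}: linear-probe accuracy {acc:.2f}")
```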
arXiv Detail & Related papers (2021-12-16T18:59:50Z)
- Zoo-Tuning: Adaptive Transfer from a Zoo of Models [82.9120546160422]
Zoo-Tuning learns to adaptively transfer the parameters of pretrained models to the target task.
We evaluate our approach on a variety of tasks, including reinforcement learning, image classification, and facial landmark detection.
arXiv Detail & Related papers (2021-06-29T14:09:45Z)
- Self-Damaging Contrastive Learning [92.34124578823977]
In reality, unlabeled data is commonly imbalanced and follows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning (SDCLR) to automatically balance representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z)
- Deep learning with self-supervision and uncertainty regularization to
count fish in underwater images [28.261323753321328]
Effective conservation actions require effective population monitoring.
Monitoring populations through image sampling has made data collection cheaper, wide-reaching and less intrusive.
Counting animals from such data is challenging, particularly when they are densely packed in noisy images.
Deep learning is the state-of-the-art method for many computer vision tasks, but it has yet to be properly explored for counting animals.
arXiv Detail & Related papers (2021-04-30T13:02:19Z)
- Background Splitting: Finding Rare Classes in a Sea of Background [55.03789745276442]
We focus on the real-world problem of training accurate deep models for image classification of a small number of rare categories.
In these scenarios, almost all images belong to the background category in the dataset (>95% of the dataset is background).
We demonstrate that both standard fine-tuning approaches and state-of-the-art approaches for training on imbalanced datasets do not produce accurate deep models in the presence of this extreme imbalance.
arXiv Detail & Related papers (2020-08-28T23:05:15Z)