Mastering Large Scale Multi-label Image Recognition with high efficiency over Camera trap images
- URL: http://arxiv.org/abs/2008.07828v1
- Date: Tue, 18 Aug 2020 09:51:34 GMT
- Title: Mastering Large Scale Multi-label Image Recognition with high efficiency over Camera trap images
- Authors: Miroslav Valan and Lukáš Picek
- Abstract summary: We propose an easy, accessible, lightweight, fast and efficient approach based on our winning submission to the "Hakuna Ma-data - Serengeti Wildlife Identification challenge". Our system achieved an accuracy of 97% and outperformed human-level performance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Camera traps are crucial in biodiversity-motivated studies; however,
annotating the large number of images in these data sets is a tedious and
time-consuming task. To speed up this process, Machine Learning approaches are
a reasonable asset. In this article we propose an easy, accessible,
lightweight, fast and efficient approach based on our winning submission to
the "Hakuna Ma-data - Serengeti Wildlife Identification challenge". Our system
achieved an accuracy of 97% and outperformed human-level performance. We
show that, given relatively large data sets, it is effective to look at each
image only once with little or no augmentation. By utilizing such a simple yet
effective baseline we were able to avoid over-fitting without extensive
regularization techniques and to train a top-scoring system on very limited
hardware featuring a single GPU (1080Ti) despite the large training set (6.7M
images and 6TB).
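The "look at each image only once" idea can be sketched as an epoch iterator that yields every image exactly once, unaugmented, before any repeat. This is an illustrative sketch, not code from the paper; `single_pass_epochs` and its parameters are hypothetical names.

```python
import random

def single_pass_epochs(image_paths, batch_size, num_passes=1, seed=0):
    """Yield minibatches of image paths such that each image is seen
    exactly `num_passes` times in total, with no augmentation applied.
    Sketch of a single-pass training schedule over a large data set."""
    rng = random.Random(seed)
    for _ in range(num_passes):
        order = list(image_paths)
        rng.shuffle(order)          # random order within the pass
        for i in range(0, len(order), batch_size):
            yield order[i:i + batch_size]
```

With 6.7M images, a single pass already provides millions of gradient steps, which is one way to read why heavy augmentation and regularization were unnecessary here.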
Related papers
- SeiT: Storage-Efficient Vision Training with Tokens Using 1% of Pixel Storage
We propose a storage-efficient training strategy for vision classifiers for large-scale datasets.
Our token storage only needs 1% of the original JPEG-compressed raw pixels.
Our experimental results on ImageNet-1k show that our method significantly outperforms other storage-efficient training methods with a large gap.
arXiv Detail & Related papers (2023-03-20T13:55:35Z)
- Bag of Tricks for Long-Tail Visual Recognition of Animal Species in Camera Trap Images
We evaluate recently proposed techniques to address the long-tail visual recognition of animal species in camera trap images.
In general, the square-root sampling was the method that most improved the performance for minority classes by around 10%.
The proposed approach achieved the best trade-off between the performance of the tail class and the cost of the head classes' accuracy.
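The square-root sampling mentioned above can be sketched as per-sample weights that make the probability of drawing class c proportional to sqrt(n_c) rather than n_c, softening the long tail. `sqrt_sampling_weights` is an illustrative name, not code from the paper.

```python
import math
from collections import Counter

def sqrt_sampling_weights(labels):
    """Per-sample sampling weights for square-root class sampling.
    Each sample of class c gets weight 1/sqrt(n_c), so the total
    mass of class c is n_c * 1/sqrt(n_c) = sqrt(n_c)."""
    counts = Counter(labels)
    return [1.0 / math.sqrt(counts[y]) for y in labels]
```

Such weights could be fed to a weighted sampler (e.g. PyTorch's `WeightedRandomSampler`) to build minibatches.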
arXiv Detail & Related papers (2022-06-24T18:30:26Z)
- Pure Noise to the Rescue of Insufficient Data: Improving Imbalanced Classification by Training on Random Noise Images
We present a surprisingly simple yet highly effective method to mitigate this limitation.
Unlike the common use of additive noise or adversarial noise for data augmentation, we propose directly training on pure random noise images.
We present a new Distribution-Aware Routing Batch Normalization layer (DAR-BN), which enables training on pure noise images in addition to natural images within the same network.
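The routing idea behind DAR-BN can be sketched as normalizing noise and natural activations with separate statistics, so that pure-noise images do not corrupt the natural-image estimates. This is a toy sketch on flat lists of floats; the real layer operates on convolutional feature maps inside a network.

```python
import statistics

def dar_bn(batch, is_noise):
    """Toy sketch of distribution-aware routing normalization:
    normalize each group (natural vs. pure noise) with its own
    mean and standard deviation, within the same 'layer'."""
    out = [0.0] * len(batch)
    for flag in (False, True):
        idx = [i for i, f in enumerate(is_noise) if f == flag]
        if not idx:
            continue
        vals = [batch[i] for i in idx]
        mu = statistics.fmean(vals)
        sd = statistics.pstdev(vals) or 1.0  # guard against zero variance
        for i in idx:
            out[i] = (batch[i] - mu) / sd
    return out
```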
arXiv Detail & Related papers (2021-12-16T11:51:35Z)
- Bridging the Gap between Events and Frames through Unsupervised Domain Adaptation
We propose a task transfer method that allows models to be trained directly with labeled images and unlabeled event data.
We leverage the generative event model to split event features into content and motion features.
Our approach unlocks the vast amount of existing image datasets for the training of event-based neural networks.
arXiv Detail & Related papers (2021-09-06T17:31:37Z)
- Memory Efficient Meta-Learning with Large Images
Meta-learning approaches to few-shot classification are computationally efficient at test time, requiring just a few optimization steps or a single forward pass to learn a new task, but meta-training itself is memory-intensive.
This limitation arises because a task's entire support set, which can contain up to 1000 images, must be processed before an optimization step can be taken.
We propose LITE, a general and memory efficient episodic training scheme that enables meta-training on large tasks composed of large images on a single GPU.
arXiv Detail & Related papers (2021-07-02T14:37:13Z)
- Few-Shot Learning with Part Discovery and Augmentation from Unlabeled Images
We show that inductive bias can be learned from a flat collection of unlabeled images, and instantiated as transferable representations among seen and unseen classes.
Specifically, we propose a novel part-based self-supervised representation learning scheme to learn transferable representations.
Our method yields impressive results, outperforming the previous best unsupervised methods by 7.74% and 9.24%.
arXiv Detail & Related papers (2021-05-25T12:22:11Z)
- Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning
We show that spatial augmentations such as cropping work well for videos too, but that previous implementations could not do this at a scale sufficient for it to work well.
To address this issue, we first introduce Feature Crop, a method to simulate such augmentations much more efficiently directly in feature space.
Second, we show that, as opposed to naive average pooling, the use of transformer-based attention improves performance significantly.
arXiv Detail & Related papers (2021-03-18T12:32:24Z)
- Machine learning with limited data
We study few-shot image classification, in which only very few labeled examples are available.
The first method augments image features by mixing the styles of the images.
The second method applies spatial attention to explore relations between image patches.
arXiv Detail & Related papers (2021-01-18T17:10:39Z)
- One of these (Few) Things is Not Like the Others
We propose a model which can both classify new images based on a small number of examples and recognize images which do not belong to any previously seen group.
We evaluate performance over a spectrum of model architectures, including setups small enough to be run on low powered devices.
arXiv Detail & Related papers (2020-05-22T21:49:35Z)
- Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation
In this work, we ask if we may leverage semi-supervised learning in unlabeled video sequences and extra images to improve the performance on urban scene segmentation.
We simply predict pseudo-labels for the unlabeled data and train subsequent models with both human-annotated and pseudo-labeled data.
Our Naive-Student model, trained with such simple yet effective iterative semi-supervised learning, attains state-of-the-art results at all three Cityscapes benchmarks.
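One round of the pseudo-labeling loop described above can be sketched as: predict labels for the unlabeled data, then train on the union of human-annotated and pseudo-labeled examples. `pseudo_label_round` and its arguments are illustrative names, not the paper's code.

```python
def pseudo_label_round(model_predict, labeled, unlabeled):
    """One iteration of a Naive-Student-style loop: generate
    pseudo-labels for unlabeled inputs with the current model, then
    return the combined training set for the next student model.
    `model_predict` is any callable mapping an input to a label."""
    pseudo = [(x, model_predict(x)) for x in unlabeled]
    return labeled + pseudo
```

Iterating this (retrain on the combined set, re-predict, repeat) is the "simple yet effective iterative semi-supervised learning" the summary refers to.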
arXiv Detail & Related papers (2020-05-20T18:00:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.