Hallucinating Saliency Maps for Fine-Grained Image Classification for
Limited Data Domains
- URL: http://arxiv.org/abs/2007.12562v3
- Date: Wed, 3 Feb 2021 10:29:57 GMT
- Title: Hallucinating Saliency Maps for Fine-Grained Image Classification for
Limited Data Domains
- Authors: Carola Figueroa-Flores, Bogdan Raducanu, David Berga, and Joost van de
Weijer
- Abstract summary: We propose an approach which does not require explicit saliency maps to improve image classification.
We show that our approach obtains results similar to the case in which the saliency maps are provided explicitly.
In addition, we show that our saliency estimation method, which is trained without any saliency ground-truth data, obtains competitive results on a real-image saliency benchmark (Toronto).
- Score: 27.91871214060683
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most saliency methods are evaluated on their ability to generate
saliency maps, and not on their functionality in a complete vision pipeline,
such as image classification. In this paper, we propose an approach which
does not require explicit saliency maps to improve image classification;
instead, the maps are learned implicitly during training of an end-to-end
image classification task. We show that our approach obtains results similar
to the case in which the saliency maps are provided explicitly. Combining
RGB data with saliency maps represents a significant advantage for object
recognition, especially when training data is limited. We validate our
method on several datasets for fine-grained classification tasks (Flowers,
Birds and Cars). In addition, we show that our saliency estimation method,
which is trained without any saliency ground-truth data, obtains competitive
results on a real-image saliency benchmark (Toronto) and outperforms deep
saliency models on a synthetic-image benchmark (SID4VAM).
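Below is a minimal sketch of the idea described in the abstract, assuming a PyTorch-style setup: a hallucinated, implicitly learned saliency map gates the RGB features inside an end-to-end classifier trained only with a classification loss. The module names and the multiplicative gating are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch: an RGB classifier whose features are gated by a
# hallucinated (implicitly learned) saliency map. Module names and the
# exact modulation scheme are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class SaliencyModulatedClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # Shared convolutional trunk up to the last residual block.
        self.trunk = nn.Sequential(*list(backbone.children())[:-2])
        # Hypothetical saliency branch: 1x1 conv + sigmoid -> one-channel map.
        self.saliency_head = nn.Conv2d(512, 1, kernel_size=1)
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, rgb: torch.Tensor):
        feats = self.trunk(rgb)                         # (B, 512, H, W)
        sal = torch.sigmoid(self.saliency_head(feats))  # (B, 1, H, W)
        gated = feats * sal                             # saliency gates RGB features
        pooled = F.adaptive_avg_pool2d(gated, 1).flatten(1)
        return self.classifier(pooled), sal


# Only the classification loss is used; the saliency map is learned implicitly.
model = SaliencyModulatedClassifier(num_classes=102)   # e.g. Flowers
logits, saliency = model(torch.randn(2, 3, 224, 224))
loss = F.cross_entropy(logits, torch.tensor([0, 1]))
```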
Related papers
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
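A hedged sketch of what feature-level augmentation can look like: deep features are perturbed with class-conditional Gaussian noise instead of transforming pixels. The per-class diagonal variance is a simplifying assumption, not the paper's learnable augmentation distribution.

```python
# Hedged sketch of feature-level augmentation: perturb deep features with
# class-conditional Gaussian noise rather than transforming pixels.
import torch


def augment_features(feats: torch.Tensor, labels: torch.Tensor,
                     class_var: torch.Tensor, strength: float = 0.5):
    """feats: (B, D) deep features; class_var: (C, D) per-class variance."""
    std = class_var[labels].sqrt()                  # (B, D)
    noise = torch.randn_like(feats) * std * strength
    return feats + noise                            # semantically plausible variants


feats = torch.randn(8, 256)
labels = torch.randint(0, 10, (8,))
class_var = torch.ones(10, 256) * 0.1               # illustrative variance estimate
aug = augment_features(feats, labels, class_var)
```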
arXiv Detail & Related papers (2023-09-01T11:15:50Z) - CSP: Self-Supervised Contrastive Spatial Pre-Training for
Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images.
We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images.
CSP significantly boosts the model performance with 10-34% relative improvement with various labeled training data sampling ratios.
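A minimal sketch of a dual-encoder contrastive objective between images and their geo-locations, in the spirit of CSP; the symmetric InfoNCE form and the temperature are assumptions for illustration, not CSP's exact losses.

```python
# Sketch of a dual-encoder contrastive objective between image embeddings
# and location embeddings; matching rows are the positive pairs.
import torch
import torch.nn.functional as F


def dual_encoder_infonce(img_emb: torch.Tensor, loc_emb: torch.Tensor,
                         temperature: float = 0.07):
    """img_emb, loc_emb: (B, D) outputs of the two encoders."""
    img_emb = F.normalize(img_emb, dim=-1)
    loc_emb = F.normalize(loc_emb, dim=-1)
    logits = img_emb @ loc_emb.t() / temperature    # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0))
    # Symmetric loss: image -> location and location -> image.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


loss = dual_encoder_infonce(torch.randn(16, 128), torch.randn(16, 128))
```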
arXiv Detail & Related papers (2023-05-01T23:11:18Z) - Saliency for free: Saliency prediction as a side-effect of object
recognition [4.609056834401648]
We show that saliency maps can be generated as a side-effect of training an object recognition deep neural network.
Such a network does not require any ground-truth saliency maps for training.
Extensive experiments carried out on both real and synthetic saliency datasets demonstrate that our approach is able to generate accurate saliency maps.
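An illustrative sketch of reading a saliency map off a trained classifier as a side-effect, here via a simple activation-energy map over the last convolutional features; the aggregation rule is an assumption for illustration, not the paper's exact derivation.

```python
# Hedged sketch: a saliency map obtained from a classifier's last
# convolutional activations, with no saliency ground truth involved.
import torch
import torch.nn.functional as F
import torchvision.models as models

backbone = models.resnet18(weights=None)
trunk = torch.nn.Sequential(*list(backbone.children())[:-2])

x = torch.randn(1, 3, 224, 224)
feats = trunk(x)                                    # (1, 512, 7, 7)
energy = feats.abs().mean(dim=1, keepdim=True)      # channel-wise activation energy
saliency = F.interpolate(energy, size=x.shape[-2:], mode="bilinear",
                         align_corners=False)
saliency = (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)
```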
arXiv Detail & Related papers (2021-07-20T17:17:28Z) - Topological Semantic Mapping by Consolidation of Deep Visual Features [0.0]
This work introduces a topological semantic mapping method that uses deep visual features extracted by a CNN (GoogLeNet) from 2D images captured from multiple views of the environment as the robot operates.
The experiments, performed using a real-world indoor dataset, showed that the method is able to consolidate the visual features of regions and use them to recognize objects and place categories as semantic properties.
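An illustrative sketch (not the paper's implementation) of consolidating deep visual features per topological node: each region keeps a running mean of the CNN features observed across views.

```python
# Illustrative sketch of per-region feature consolidation: each node of the
# topological map incrementally averages the CNN features of its views.
import torch


class RegionNode:
    def __init__(self, feat_dim: int = 1024):
        self.count = 0
        self.feature = torch.zeros(feat_dim)

    def consolidate(self, view_feature: torch.Tensor):
        self.count += 1
        # Incremental mean over all views observed in this region.
        self.feature += (view_feature - self.feature) / self.count


node = RegionNode()
for _ in range(5):                                  # five views of the same region
    node.consolidate(torch.randn(1024))
```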
arXiv Detail & Related papers (2021-06-24T01:10:03Z) - Grafit: Learning fine-grained image representations with coarse labels [114.17782143848315]
This paper tackles the problem of learning a finer representation than the one provided by training labels.
By jointly leveraging the coarse labels and the underlying fine-grained latent space, it significantly improves the accuracy of category-level retrieval methods.
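A hedged sketch of jointly using coarse labels and an instance-level contrastive term to shape a finer latent space, in the spirit of Grafit; the specific loss combination and weighting are illustrative assumptions.

```python
# Sketch: coarse-label cross-entropy plus an instance-level contrastive term
# over two augmentations of each image; the weighting lam is an assumption.
import torch
import torch.nn.functional as F


def grafit_style_loss(emb_a, emb_b, logits, coarse_labels, lam=0.5, temp=0.1):
    """emb_a, emb_b: (B, D) embeddings of two augmentations of each image."""
    ce = F.cross_entropy(logits, coarse_labels)     # supervision from coarse labels
    a = F.normalize(emb_a, dim=-1)
    b = F.normalize(emb_b, dim=-1)
    sim = a @ b.t() / temp                          # (B, B) instance similarities
    instance = F.cross_entropy(sim, torch.arange(a.size(0)))
    return ce + lam * instance


loss = grafit_style_loss(torch.randn(8, 128), torch.randn(8, 128),
                         torch.randn(8, 20), torch.randint(0, 20, (8,)))
```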
arXiv Detail & Related papers (2020-11-25T19:06:26Z) - Texture image classification based on a pseudo-parabolic diffusion model [0.0]
The proposed approach is tested on the classification of well established benchmark texture databases and on a practical task of plant species recognition.
The good performance can be justified to a large extent by the ability of the pseudo-parabolic operator to smooth possibly noisy details inside homogeneous regions of the image.
arXiv Detail & Related papers (2020-11-14T00:04:07Z) - Region Comparison Network for Interpretable Few-shot Image
Classification [97.97902360117368]
Few-shot image classification has been proposed to effectively use only a limited number of labeled examples to train models for new classes.
We propose a metric learning based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works.
We also present a new way to generalize the interpretability from the level of tasks to categories.
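A minimal sketch of region-level comparison for few-shot classification: cosine similarity between each spatial region of a query feature map and per-class prototypes, pooled into class scores. The prototype construction and the pooling are illustrative assumptions, not RCN's exact design.

```python
# Sketch: region-wise similarity between a query feature map and class
# prototypes; the per-region evidence maps are what make the decision
# interpretable.
import torch
import torch.nn.functional as F


def region_comparison_scores(query_feats, prototypes):
    """query_feats: (C, H, W); prototypes: (K, C), one per class."""
    c, h, w = query_feats.shape
    regions = F.normalize(query_feats.reshape(c, h * w), dim=0)   # (C, HW)
    protos = F.normalize(prototypes, dim=-1)                      # (K, C)
    sim = protos @ regions                                        # (K, HW) region evidence
    # Average region evidence into class scores; keep the maps for inspection.
    return sim.mean(dim=-1), sim.reshape(-1, h, w)


scores, evidence_maps = region_comparison_scores(torch.randn(64, 5, 5),
                                                 torch.randn(5, 64))
```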
arXiv Detail & Related papers (2020-09-08T07:29:05Z) - Background Splitting: Finding Rare Classes in a Sea of Background [55.03789745276442]
We focus on the real-world problem of training accurate deep models for image classification of a small number of rare categories.
In these scenarios, almost all images belong to the background category in the dataset (>95% of the dataset is background).
We demonstrate that both standard fine-tuning approaches and state-of-the-art approaches for training on imbalanced datasets do not produce accurate deep models in the presence of this extreme imbalance.
arXiv Detail & Related papers (2020-08-28T23:05:15Z) - SCAN: Learning to Classify Images without Labels [73.69513783788622]
We advocate a two-step approach where feature learning and clustering are decoupled.
A self-supervised task from representation learning is employed to obtain semantically meaningful features.
We obtain promising results on ImageNet, and outperform several semi-supervised learning methods in the low-data regime.
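A sketch of the clustering step in the spirit of SCAN: an image and its mined nearest neighbours (found with the self-supervised features) are encouraged to share a cluster assignment, with an entropy term to prevent collapse. Tensor shapes and the entropy weight are assumptions for illustration.

```python
# Sketch of a SCAN-style clustering objective: agree with mined neighbours,
# keep the average cluster assignment high-entropy to avoid collapse.
import torch


def scan_style_loss(probs, neighbor_probs, entropy_weight=5.0):
    """probs, neighbor_probs: (B, K) softmax cluster assignments."""
    # Consistency: an image and its neighbour should pick the same cluster.
    consistency = -torch.log((probs * neighbor_probs).sum(dim=1) + 1e-8).mean()
    # Entropy of the mean assignment keeps clusters balanced.
    mean_probs = probs.mean(dim=0)
    entropy = -(mean_probs * torch.log(mean_probs + 1e-8)).sum()
    return consistency - entropy_weight * entropy


probs = torch.softmax(torch.randn(32, 10), dim=1)
nbr_probs = torch.softmax(torch.randn(32, 10), dim=1)
loss = scan_style_loss(probs, nbr_probs)
```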
arXiv Detail & Related papers (2020-05-25T18:12:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.