HDF: Hybrid Deep Features for Scene Image Representation
- URL: http://arxiv.org/abs/2003.09773v1
- Date: Sun, 22 Mar 2020 01:05:08 GMT
- Title: HDF: Hybrid Deep Features for Scene Image Representation
- Authors: Chiranjibi Sitaula, Yong Xiang, Anish Basnet, Sunil Aryal and Xuequan Lu
- Abstract summary: We propose a novel type of features -- hybrid deep features, for scene images.
We exploit both object-based and scene-based features at two levels.
We show that our introduced features can produce state-of-the-art classification accuracies.
- Score: 16.252523139552174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is now common to use features extracted from pre-trained deep
learning models as image representations, and such features have achieved
promising classification performance. Existing methods usually consider either
object-based features or scene-based features only. However, both types of
features are important for complex images such as scene images, because they
complement each other. In this paper, we propose a novel type of features --
hybrid deep features, for scene images. Specifically, we exploit both
object-based and scene-based features at two levels: the part image level (i.e.,
parts of an image) and the whole image level (i.e., the whole image), which
yields four types of deep features in total. At the part image level, we also
propose two new slicing techniques to extract part-based features. Finally, we
aggregate the four types of deep features via the concatenation operator. We
demonstrate the effectiveness of our hybrid deep features on the scene image
classification task using three commonly used scene datasets (MIT-67, Scene-15,
and Event-8). Extensive comparisons show that the proposed features achieve
state-of-the-art classification accuracies that are more consistent and stable
across all datasets than those of existing features.
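The aggregation described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the grid slicing, the averaging of part-level features, and the stand-in `object_model`/`scene_model` callables (in practice, pre-trained CNNs such as an ImageNet model and a Places model) are all assumptions made for the sketch; the paper's two slicing techniques are not reproduced here.

```python
import numpy as np

def slice_into_parts(image, grid=(2, 2)):
    """Split an image array (H, W, C) into equal grid cells.
    A simple placeholder for the paper's two slicing techniques."""
    h, w = image.shape[0] // grid[0], image.shape[1] // grid[1]
    return [image[i * h:(i + 1) * h, j * w:(j + 1) * w]
            for i in range(grid[0]) for j in range(grid[1])]

def hybrid_deep_features(image, object_model, scene_model):
    """Build the four feature types -- {object, scene} x {whole, part} --
    and aggregate them via concatenation, as the abstract describes.
    Part-level vectors are averaged over parts here (an assumption)."""
    whole_obj = object_model(image)                # object features, whole image
    whole_scn = scene_model(image)                 # scene features, whole image
    parts = slice_into_parts(image)
    part_obj = np.mean([object_model(p) for p in parts], axis=0)
    part_scn = np.mean([scene_model(p) for p in parts], axis=0)
    return np.concatenate([whole_obj, whole_scn, part_obj, part_scn])

if __name__ == "__main__":
    # Dummy "models" returning fixed-size vectors, standing in for CNN outputs.
    obj = lambda img: np.ones(4)
    scn = lambda img: np.zeros(4)
    feat = hybrid_deep_features(np.zeros((224, 224, 3)), obj, scn)
    print(feat.shape)  # four 4-dim vectors concatenated -> (16,)
```

The concatenated vector would then feed a standard classifier (e.g., a linear SVM) for the scene classification experiments reported on MIT-67, Scene-15, and Event-8.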
Related papers
- Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing [47.421888361871254]
Scene text images contain not only style information (font, background) but also content information (character, texture).
Previous representation learning methods use tightly coupled features for all tasks, resulting in sub-optimal performance.
We propose a Disentangled Representation Learning framework (DARLING) aimed at disentangling these two types of features for improved adaptability.
arXiv Detail & Related papers (2024-05-07T15:00:11Z) - Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z) - Self-attention on Multi-Shifted Windows for Scene Segmentation [14.47974086177051]
We explore the effective use of self-attention within multi-scale image windows to learn descriptive visual features.
We propose three different strategies to aggregate these feature maps to decode the feature representation for dense prediction.
Our models achieve very promising performance on four public scene segmentation datasets.
arXiv Detail & Related papers (2022-07-10T07:36:36Z) - Knowledge Mining with Scene Text for Fine-Grained Recognition [53.74297368412834]
We propose an end-to-end trainable network that mines implicit contextual knowledge behind scene text image.
We employ KnowBert to retrieve relevant knowledge for semantic representation and combine it with image features for fine-grained classification.
Our method outperforms the state of the art by 3.72% mAP and 5.39% mAP on two benchmarks, respectively.
arXiv Detail & Related papers (2022-03-27T05:54:00Z) - Bridging Composite and Real: Towards End-to-end Deep Image Matting [88.79857806542006]
We study the roles of semantics and details for image matting.
We propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders.
Comprehensive empirical studies have demonstrated that GFM outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-30T10:57:13Z) - Saliency-driven Class Impressions for Feature Visualization of Deep Neural Networks [55.11806035788036]
It is advantageous to visualize the features considered to be essential for classification.
Existing visualization methods develop high confidence images consisting of both background and foreground features.
In this work, we propose a saliency-driven approach to visualize discriminative features that are considered most important for a given task.
arXiv Detail & Related papers (2020-07-31T06:11:06Z) - Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z) - Content and Context Features for Scene Image Representation [16.252523139552174]
We propose new techniques to compute content features and context features, and then fuse them together.
For content features, we design multi-scale deep features based on background and foreground information in images.
For context features, we use annotations of similar images available on the web to design a codebook of filter words.
arXiv Detail & Related papers (2020-06-05T03:19:13Z) - Scene Image Representation by Foreground, Background and Hybrid Features [17.754713956659188]
We propose to use hybrid features in addition to foreground and background features to represent scene images.
Our method produces the state-of-the-art classification performance.
arXiv Detail & Related papers (2020-06-05T01:55:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.