Content and Context Features for Scene Image Representation
- URL: http://arxiv.org/abs/2006.03217v3
- Date: Sat, 24 Apr 2021 05:37:31 GMT
- Title: Content and Context Features for Scene Image Representation
- Authors: Chiranjibi Sitaula, Sunil Aryal, Yong Xiang, Anish Basnet and Xuequan Lu
- Abstract summary: We propose new techniques to compute content features and context features, and then fuse them together.
For content features, we design multi-scale deep features based on background and foreground information in images.
For context features, we use annotations of similar images available on the web to design filter words (a codebook).
- Score: 16.252523139552174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing research in scene image classification has focused on either
content features (e.g., visual information) or context features (e.g.,
annotations). Since the two capture different, complementary information that is
useful for discriminating images of different classes, we expect that fusing
them will improve classification results. In this paper, we propose new
techniques to compute content features and context features, and then fuse them
together. For content features, we design multi-scale deep features based on
background and foreground information in images. For context features, we use
annotations of similar images available on the web to design filter words (a
codebook). Our experiments on three widely used benchmark scene datasets, using
a support vector machine classifier, reveal that our proposed context and
content features produce better results than existing context and content
features, respectively. The fusion of the two proposed feature types
significantly outperforms numerous state-of-the-art features.
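As a hedged illustration of the pipeline the abstract describes, the sketch below fuses pre-computed content and context features by concatenation and trains an SVM. The feature arrays are random stand-ins (the paper's deep-feature and codebook extraction steps are not reproduced), the dimensions are arbitrary assumptions, and only the scikit-learn calls (`SVC`, `train_test_split`) are real APIs.

```python
# Minimal sketch, not the authors' code: fuse content + context features,
# then classify with a support vector machine.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_images, n_classes = 300, 5
content = rng.normal(size=(n_images, 512))  # stand-in for multi-scale deep content features
context = rng.random(size=(n_images, 200))  # stand-in for codebook (filter-word) histograms
labels = rng.integers(0, n_classes, size=n_images)

# Fuse the two feature types by simple concatenation (assumed fusion scheme).
fused = np.concatenate([content, context], axis=1)

X_tr, X_te, y_tr, y_te = train_test_split(fused, labels, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```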
Related papers
- Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing [47.421888361871254]
Scene text images contain not only style information (font, background) but also content information (character, texture).
Previous representation learning methods use tightly coupled features for all tasks, resulting in sub-optimal performance.
We propose a Disentangled Representation Learning framework (DARLING) aimed at disentangling these two types of features for improved adaptability.
arXiv Detail & Related papers (2024-05-07T15:00:11Z)
- Language Guided Domain Generalized Medical Image Segmentation [68.93124785575739]
Single source domain generalization holds promise for more reliable and consistent image segmentation across real-world clinical settings.
We propose an approach that explicitly leverages textual information by incorporating a contrastive learning mechanism guided by the text encoder features.
Our approach achieves favorable performance against existing methods in literature.
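For intuition, here is a minimal, generic sketch of a text-guided contrastive objective of the kind this summary alludes to: an InfoNCE-style loss over cosine similarities between image features and per-class text embeddings. It is not the paper's actual formulation; all shapes and the temperature value are assumptions.

```python
# Generic text-guided contrastive loss (InfoNCE-style), illustrative only.
import numpy as np

def info_nce(img_feats, txt_feats, labels, tau=0.07):
    """img_feats: (N, d); txt_feats: (C, d), one embedding per class."""
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    txt = txt_feats / np.linalg.norm(txt_feats, axis=1, keepdims=True)
    logits = img @ txt.T / tau                   # (N, C) similarity scores
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(1)
loss = info_nce(rng.normal(size=(16, 64)), rng.normal(size=(4, 64)),
                rng.integers(0, 4, size=16))
print(f"contrastive loss: {loss:.3f}")
```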
arXiv Detail & Related papers (2024-04-01T17:48:15Z)
- Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation [59.78520153338878]
Text-to-image diffusion techniques have shown exceptional capability of producing high-quality images from text descriptions.
We propose a method built upon a state-of-the-art diffusion model, empowered by open-vocabulary to learn multi-scale textual-visual features for camouflaged object representations.
arXiv Detail & Related papers (2023-12-29T07:59:07Z)
- Knowledge Mining with Scene Text for Fine-Grained Recognition [53.74297368412834]
We propose an end-to-end trainable network that mines implicit contextual knowledge behind scene text image.
We employ KnowBert to retrieve relevant knowledge for semantic representation and combine it with image features for fine-grained classification.
Our method outperforms the state-of-the-art by 3.72% mAP and 5.39% mAP on the two evaluated benchmarks, respectively.
arXiv Detail & Related papers (2022-03-27T05:54:00Z)
- Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning [50.08729005865331]
This paper develops a plug-and-play hierarchical-topic-guided image paragraph generation framework.
To capture the correlations between the image and text at multiple levels of abstraction, we design a variational inference network.
To guide the paragraph generation, the learned hierarchical topics and visual features are integrated into the language model.
arXiv Detail & Related papers (2021-05-10T06:55:39Z)
- Semantic Disentangling Generalized Zero-Shot Learning [50.259058462272435]
Generalized Zero-Shot Learning (GZSL) aims to recognize images from both seen and unseen categories.
In this paper, we propose a novel feature disentangling approach based on an encoder-decoder architecture.
The proposed model aims to distill quality semantic-consistent representations that capture intrinsic features of seen images.
arXiv Detail & Related papers (2021-01-20T05:46:21Z)
- A Feature Analysis for Multimodal News Retrieval [9.269820020286382]
We consider five feature types for image and text and compare the performance of the retrieval system using different combinations.
Experimental results show that retrieval results can be improved when considering both visual and textual information.
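A minimal sketch of such a combination, under assumed feature shapes and an assumed weighted-sum scoring rule (not the paper's exact setup): rank database items by a blend of visual and textual cosine similarity.

```python
# Illustrative multimodal retrieval: blend visual and textual similarities.
import numpy as np

def retrieve(query_vis, query_txt, db_vis, db_txt, alpha=0.5, k=3):
    """Rank database items by a weighted sum of visual and textual cosine similarity."""
    def cos(q, db):
        q = q / np.linalg.norm(q)
        db = db / np.linalg.norm(db, axis=1, keepdims=True)
        return db @ q
    scores = alpha * cos(query_vis, db_vis) + (1 - alpha) * cos(query_txt, db_txt)
    return np.argsort(scores)[::-1][:k]   # indices of top-k items

rng = np.random.default_rng(3)
db_vis, db_txt = rng.normal(size=(100, 128)), rng.normal(size=(100, 64))
top = retrieve(rng.normal(size=128), rng.normal(size=64), db_vis, db_txt)
print("top-3 item indices:", top)
```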
arXiv Detail & Related papers (2020-07-13T14:09:29Z)
- Unsupervised segmentation via semantic-apparent feature fusion [21.75371777263847]
This research proposes an unsupervised foreground segmentation method based on semantic-apparent feature fusion (SAFF).
Key regions of the foreground object can be accurately localized via semantic features, while apparent features provide richer detail.
By fusing semantic and apparent features, and by cascading modules for intra-image adaptive feature weight learning and inter-image common feature learning, the method significantly exceeds baseline performance.
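As a rough, simplified illustration (my own, not the SAFF code): combine a semantic response map with an apparent (appearance) saliency map by a weighted sum and threshold the result into a foreground mask; the fixed weight stands in for the paper's adaptive weight learning.

```python
# Simplified semantic-apparent fusion for a foreground mask, illustrative only.
import numpy as np

rng = np.random.default_rng(4)
semantic = rng.random((64, 64))  # stand-in: CAM-like semantic response map
apparent = rng.random((64, 64))  # stand-in: low-level appearance saliency map

def normalize(m):
    return (m - m.min()) / (m.max() - m.min() + 1e-8)

w = 0.6  # fixed fusion weight; SAFF learns adaptive weights instead
fused = w * normalize(semantic) + (1 - w) * normalize(apparent)
mask = fused > fused.mean()      # simple thresholding into a binary mask
print("foreground fraction:", mask.mean())
```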
arXiv Detail & Related papers (2020-05-21T08:28:49Z)
- HDF: Hybrid Deep Features for Scene Image Representation [16.252523139552174]
We propose a novel type of features for scene images: hybrid deep features.
We exploit both object-based and scene-based features at two levels.
We show that our introduced features can produce state-of-the-art classification accuracies.
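A hedged sketch of the two-level hybrid idea, with random arrays standing in for feature maps from an object-centric and a scene-centric network (the actual networks and levels used by HDF are not reproduced here): pool each map globally and concatenate.

```python
# Illustrative two-network, two-level hybrid feature construction.
import numpy as np

def gap(fmap):
    """Global average pooling over the spatial dims of a (C, H, W) feature map."""
    return fmap.mean(axis=(1, 2))

rng = np.random.default_rng(2)
# Stand-ins for feature maps from two networks at two levels each.
object_l1, object_l2 = rng.normal(size=(256, 28, 28)), rng.normal(size=(512, 14, 14))
scene_l1, scene_l2 = rng.normal(size=(256, 28, 28)), rng.normal(size=(512, 14, 14))

hybrid = np.concatenate([gap(object_l1), gap(object_l2),
                         gap(scene_l1), gap(scene_l2)])
print("hybrid feature length:", hybrid.shape[0])  # 256+512+256+512 = 1536
```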
arXiv Detail & Related papers (2020-03-22T01:05:08Z)
- Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features [8.317191999275536]
The mere presence of text in an image provides strong guiding content that can be exploited to tackle a variety of computer vision tasks.
In this paper, we address fine-grained classification and image retrieval by leveraging textual information alongside visual cues to exploit the intrinsic relation between the two modalities.
arXiv Detail & Related papers (2020-01-14T12:06:12Z)