Scene Recognition with Objectness, Attribute and Category Learning
- URL: http://arxiv.org/abs/2207.10174v1
- Date: Wed, 20 Jul 2022 19:51:54 GMT
- Title: Scene Recognition with Objectness, Attribute and Category Learning
- Authors: Ji Zhang, Jean-Paul Ainam, Li-hui Zhao, Wenai Song, and Xin Wang
- Abstract summary: Scene classification has established itself as a challenging research problem.
Image recognition serves as a key pillar for the good performance of scene recognition.
We propose a Multi-task Attribute-Scene Recognition network which learns a category embedding and at the same time predicts scene attributes.
- Score: 8.581276116041401
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene classification has established itself as a challenging research
problem. Compared to images of individual objects, scene images could be much
more semantically complex and abstract. Their difference mainly lies in the
level of granularity of recognition. Yet, image recognition serves as a key
pillar for the good performance of scene recognition as the knowledge attained
from object images can be used for accurate recognition of scenes. The existing
scene recognition methods only take the category label of the scene into
consideration. However, we find that the contextual information that contains
detailed local descriptions is also beneficial in allowing the scene
recognition model to be more discriminative. In this paper, we aim to improve
scene recognition using attribute and category label information encoded in
objects. Based on the complementarity of attribute and category labels, we
propose a Multi-task Attribute-Scene Recognition (MASR) network which learns a
category embedding and at the same time predicts scene attributes. Attribute
acquisition and object annotation are tedious and time-consuming tasks. We
tackle the problem by proposing a partially supervised annotation strategy in
which human intervention is significantly reduced. The strategy provides a much
more cost-effective solution to real world scenarios, and requires considerably
less annotation effort. Moreover, we re-weight the attribute predictions
according to the level of importance indicated by the object detection scores.
Using the proposed method, we efficiently annotate attribute labels for four
large-scale datasets, and systematically investigate how scene and attribute
recognition benefit from each other. The experimental results demonstrate that
MASR learns a more discriminative representation and achieves competitive
recognition performance compared to the state-of-the-art methods.
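The multi-task idea in the abstract (a scene-category loss trained jointly with attribute prediction, where attribute predictions are re-weighted by object detection scores) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the MASR implementation: the function names, the choice of scaling each attribute probability by a single detection score, the binary cross-entropy attribute loss, and the `attr_weight` balancing factor are all hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def reweight_attributes(attr_logits, detection_scores):
    """Scale each attribute probability by the confidence of the detected
    object it is associated with (assumed one score per attribute)."""
    return [sigmoid(z) * s for z, s in zip(attr_logits, detection_scores)]


def cross_entropy(scene_logits, label):
    """Softmax cross-entropy for the scene category, computed stably."""
    m = max(scene_logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in scene_logits))
    return log_sum - scene_logits[label]


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over attributes, with clamping for safety."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)


def multitask_loss(scene_logits, scene_label, attr_logits, attr_targets,
                   detection_scores, attr_weight=0.5):
    """Joint objective: scene loss plus a weighted attribute loss
    (attr_weight is a hypothetical balancing hyperparameter)."""
    probs = reweight_attributes(attr_logits, detection_scores)
    scene_term = cross_entropy(scene_logits, scene_label)
    attr_term = binary_cross_entropy(probs, attr_targets)
    return scene_term + attr_weight * attr_term
```

In this sketch, an attribute tied to a low-confidence detection contributes a smaller predicted probability, so it influences the joint objective differently from one backed by a confident detection. Other readings of the abstract, such as weighting the per-attribute loss terms instead of the predictions, are equally plausible.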
Related papers
- Learning Scene Context Without Images [2.8184014933789365]
We introduce a novel approach to teach scene contextual knowledge to machines using an attention mechanism.
A distinctive aspect of the proposed approach is its reliance solely on labels from image datasets to teach scene context.
We show how scene-wide relationships among different objects can be learned using a self-attention mechanism.
arXiv Detail & Related papers (2023-11-18T07:27:25Z)
- Inter-object Discriminative Graph Modeling for Indoor Scene Recognition [5.712940060321454]
We propose to leverage discriminative object knowledge to enhance scene feature representations.
We construct a Discriminative Graph Network (DGN) in which pixel-level scene features are defined as nodes.
With the proposed IODP and DGN, we obtain state-of-the-art results on several widely used scene datasets.
arXiv Detail & Related papers (2023-11-10T08:07:16Z)
- EnTri: Ensemble Learning with Tri-level Representations for Explainable Scene Recognition [27.199124692225777]
Scene recognition based on deep-learning has made significant progress, but there are still limitations in its performance.
We propose EnTri, a framework that employs ensemble learning using a hierarchy of visual features.
EnTri has demonstrated superiority in terms of recognition accuracy, achieving competitive performance compared to state-of-the-art approaches.
arXiv Detail & Related papers (2023-07-23T22:11:23Z)
- Learning Dense Object Descriptors from Multiple Views for Low-shot Category Generalization [27.583517870047487]
We propose Deep Object Patch Encodings (DOPE), which can be trained from multiple views of object instances without any category or semantic object part labels.
To train DOPE, we assume access to sparse depths, foreground masks and known cameras, to obtain pixel-level correspondences between views of an object.
We find that DOPE can directly be used for low-shot classification of novel categories using local-part matching, and is competitive with and outperforms supervised and self-supervised learning baselines.
arXiv Detail & Related papers (2022-11-28T04:31:53Z)
- Attribute Prototype Network for Any-Shot Learning [113.50220968583353]
We argue that an image representation with integrated attribute localization ability would be beneficial for any-shot, i.e. zero-shot and few-shot, image classification tasks.
We propose a novel representation learning framework that jointly learns global and local features using only class-level attributes.
arXiv Detail & Related papers (2022-04-04T02:25:40Z)
- Knowledge Mining with Scene Text for Fine-Grained Recognition [53.74297368412834]
We propose an end-to-end trainable network that mines implicit contextual knowledge behind scene text image.
We employ KnowBert to retrieve relevant knowledge for semantic representation and combine it with image features for fine-grained classification.
Our method outperforms the state-of-the-art by 3.72% mAP and 5.39% mAP, respectively.
arXiv Detail & Related papers (2022-03-27T05:54:00Z)
- Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
arXiv Detail & Related papers (2021-12-21T17:10:21Z)
- Region-level Active Learning for Cluttered Scenes [60.93811392293329]
We introduce a new strategy that subsumes previous Image-level and Object-level approaches into a generalized, Region-level approach.
We show that this approach significantly decreases labeling effort and improves rare object search on realistic data with inherent class-imbalance and cluttered scenes.
arXiv Detail & Related papers (2021-08-20T14:02:38Z)
- Learning Object Detection from Captions via Textual Scene Attributes [70.90708863394902]
We argue that captions contain much richer information about the image, including attributes of objects and their relations.
We present a method that uses the attributes in this "textual scene graph" to train object detectors.
We empirically demonstrate that the resulting model achieves state-of-the-art results on several challenging object detection datasets.
arXiv Detail & Related papers (2020-09-30T10:59:20Z)
- CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning [78.3857991931479]
We present GROLLA, an evaluation framework for Grounded Language Learning with Attributes.
We also propose a new dataset CompGuessWhat?! as an instance of this framework for evaluating the quality of learned neural representations.
arXiv Detail & Related papers (2020-06-03T11:21:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.