Object-to-Scene: Learning to Transfer Object Knowledge to Indoor Scene Recognition
- URL: http://arxiv.org/abs/2108.00399v1
- Date: Sun, 1 Aug 2021 08:37:08 GMT
- Title: Object-to-Scene: Learning to Transfer Object Knowledge to Indoor Scene Recognition
- Authors: Bo Miao, Liguang Zhou, Ajmal Mian, Tin Lun Lam, Yangsheng Xu
- Abstract summary: We propose an Object-to-Scene (OTS) method, which extracts object features and learns object relations to recognize indoor scenes.
OTS outperforms the state-of-the-art methods by more than 2% on indoor scene recognition without using any additional streams.
- Score: 19.503027767462605
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Accurate perception of the surrounding scene helps robots make reasonable judgments and behave appropriately. Developing effective scene representation and recognition methods is therefore of significant importance in robotics. Currently, a large body of research focuses on developing novel auxiliary features and networks to improve indoor scene recognition, but few works focus on directly constructing object features and object relations for indoor scene recognition. In this paper, we analyze the weaknesses of current methods and propose an Object-to-Scene (OTS) method, which extracts object features and learns object relations to recognize indoor scenes. OTS first extracts object features using a segmentation network and the proposed object feature aggregation module (OFAM). Afterwards, the object relations are calculated and the scene representation is constructed using the proposed object attention module (OAM) and global relation aggregation module (GRAM). The results show that OTS successfully extracts object features and learns object relations from the segmentation network, and that it outperforms state-of-the-art methods by more than 2% on indoor scene recognition without using any additional streams. Code is publicly available at: https://github.com/FreeformRobotics/OTS.
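A minimal PyTorch-style sketch of the pipeline the abstract describes (segmentation features -> OFAM -> OAM -> GRAM). The module internals, shapes, and names below are illustrative assumptions rather than the authors' implementation; the real code is in the linked repository.

```python
# Illustrative sketch of the OTS pipeline from the abstract. All module
# internals are assumptions for exposition, not the authors' code; see
# https://github.com/FreeformRobotics/OTS for the real implementation.
import torch
import torch.nn as nn

class OTSSketch(nn.Module):
    def __init__(self, num_classes, feat_dim=256, num_heads=4):
        super().__init__()
        # OFAM stand-in: mask-weighted pooling of backbone features,
        # giving one feature vector per detected object.
        self.proj = nn.Linear(feat_dim, feat_dim)
        # OAM stand-in: self-attention over object features to model
        # pairwise object relations.
        self.oam = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        # GRAM stand-in: aggregate relation-aware object features into
        # a single scene representation, then classify.
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feats, masks):
        # feats: (B, C, H, W) features from the segmentation backbone
        # masks: (B, N, H, W) soft masks for N candidate objects
        w = masks.flatten(2)                             # (B, N, H*W)
        w = w / w.sum(-1, keepdim=True).clamp(min=1e-6)  # normalize masks
        obj = torch.einsum('bnp,bcp->bnc', w, feats.flatten(2))
        obj = self.proj(obj)                             # object features (OFAM)
        rel, _ = self.oam(obj, obj, obj)                 # object relations (OAM)
        scene = rel.mean(dim=1)                          # global aggregation (GRAM)
        return self.classifier(scene)                    # scene logits
```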
Related papers
- Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition [21.655278000690686]
We propose an end-to-end object-centric action recognition framework.
It simultaneously performs Detection And Interaction Reasoning in one stage.
We conduct experiments on two datasets, Something-Else and Ikea-Assembly.
arXiv Detail & Related papers (2024-04-18T05:06:12Z)
- Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability to localize active objects by learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
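A toy illustration of the PCA-based localization idea from the entry above: project each spatial feature vector onto the first principal component of the feature map and threshold the resulting heatmap. The shapes, sign fix, and thresholding rule are assumptions for exposition, not the paper's exact procedure.

```python
# Toy PCA-based object localization over a CNN feature map.
import numpy as np

def pca_localize(feats):
    """feats: (C, H, W) feature map -> boolean (H, W) foreground mask."""
    c, h, w = feats.shape
    x = feats.reshape(c, h * w).T          # (H*W, C): one vector per location
    x = x - x.mean(axis=0, keepdims=True)  # center the features
    # First principal component via SVD of the centered data matrix.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    heat = (x @ vt[0]).reshape(h, w)       # projection onto the 1st component
    if heat.mean() < 0:                    # fix the arbitrary sign of the PC
        heat = -heat
    return heat > heat.mean()              # simple threshold -> object region
```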
- SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition [35.4163266882568]
We introduce Self-Supervised Learning Over Sets (SOS) to pre-train a generic Objects In Contact (OIC) representation model.
Our OIC significantly boosts the performance of multiple state-of-the-art video classification models.
arXiv Detail & Related papers (2022-04-10T23:27:19Z)
- Bi-directional Object-context Prioritization Learning for Saliency Ranking [60.62461793691836]
Existing approaches focus on learning either object-object or object-scene relations.
We observe that spatial attention works concurrently with object-based attention in the human visual recognition system.
We propose a novel bi-directional method to unify spatial attention and object-based attention for saliency ranking.
arXiv Detail & Related papers (2022-03-17T16:16:03Z)
- Exploiting Scene Graphs for Human-Object Interaction Detection [81.49184987430333]
Human-Object Interaction (HOI) detection is a fundamental visual task aiming at localizing and recognizing interactions between humans and objects.
We propose a novel method, SG2HOI, that exploits this information through the scene graph for the Human-Object Interaction detection task.
Our method, SG2HOI, incorporates the SG information in two ways: (1) we embed a scene graph into a global context clue, serving as the scene-specific environmental context; and (2) we build a relation-aware message-passing module to gather relationships from objects' neighborhood and transfer them into interactions.
arXiv Detail & Related papers (2021-08-19T09:40:50Z)
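A schematic sketch of relation-aware message passing over a scene graph, in the spirit of the SG2HOI entry above: each object node aggregates messages built from its neighbours and the connecting relation embeddings. The message and update rules are illustrative assumptions, not the paper's exact formulation.

```python
# Schematic relation-aware message passing over a scene graph.
import torch
import torch.nn as nn

class RelationMessagePassing(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(3 * dim, dim)  # message from (subject, relation, object)
        self.upd = nn.GRUCell(dim, dim)     # node update from aggregated messages

    def forward(self, nodes, rels, edges):
        # nodes: (N, D) object features; rels: (E, D) relation embeddings
        # edges: (E, 2) long tensor of (subject, object) node indices
        src, dst = edges[:, 0], edges[:, 1]
        m = self.msg(torch.cat([nodes[src], rels, nodes[dst]], dim=-1))
        agg = torch.zeros_like(nodes).index_add_(0, dst, m)  # sum messages per node
        return self.upd(agg, nodes)         # relation-aware object features
```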
- Learning Open-World Object Proposals without Learning to Classify [110.30191531975804]
We propose a classification-free Object Localization Network (OLN) which estimates the objectness of each region purely by how well the location and shape of a region overlaps with any ground-truth object.
This simple strategy learns generalizable objectness and outperforms existing proposals on cross-category generalization.
arXiv Detail & Related papers (2021-08-15T14:36:02Z)
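A plain-Python sketch of the classification-free objectness target described in the OLN entry above: a region is scored by how well it overlaps any ground-truth box rather than by a class label. The IoU-only target and function names are simplifying assumptions (the paper combines several localization-quality cues).

```python
# Classification-free objectness: score a region by its best IoU with
# any ground-truth object, not by a class label (simplified sketch).
def iou(a, b):
    """Boxes as (x1, y1, x2, y2); returns intersection-over-union."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def objectness_target(proposal, gt_boxes):
    """Localization-quality training target: best IoU with any ground truth."""
    return max((iou(proposal, g) for g in gt_boxes), default=0.0)
```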
- BORM: Bayesian Object Relation Model for Indoor Scene Recognition [3.3274747298291216]
We propose to utilize meaningful object representations for indoor scene representation.
First, we utilize an improved object model (IOM) as a baseline that enriches the object knowledge by introducing a scene parsing algorithm pretrained on the ADE20K dataset with rich object categories related to the indoor scene.
To analyze the object co-occurrences and pairwise object relations, we formulate the IOM from a Bayesian perspective as the Bayesian object relation model (BORM).
arXiv Detail & Related papers (2021-08-01T08:31:18Z)
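A toy sketch of the object co-occurrence statistics behind the BORM entry above: count how often object pairs co-occur in images of each scene class and normalize to empirical pair frequencies. The paper's Bayesian formulation is richer than this illustration.

```python
# Empirical object pair co-occurrence per scene class (toy version).
from collections import Counter
from itertools import combinations

def cooccurrence(scene_images):
    """scene_images: dict scene_label -> list of per-image object-label sets.
    Returns dict scene_label -> {object pair: empirical frequency}."""
    stats = {}
    for scene, images in scene_images.items():
        counts = Counter()
        for objs in images:
            counts.update(combinations(sorted(objs), 2))
        total = sum(counts.values()) or 1   # guard against empty scenes
        stats[scene] = {pair: n / total for pair, n in counts.items()}
    return stats
```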
- A Deep Learning Approach to Object Affordance Segmentation [31.221897360610114]
We design an autoencoder that infers pixel-wise affordance labels in both videos and static images.
Our model surpasses the need for object labels and bounding boxes by using a soft-attention mechanism.
We show that our model achieves competitive results compared to strongly supervised methods on SOR3D-AFF.
arXiv Detail & Related papers (2020-04-18T15:34:41Z)
- Look-into-Object: Self-supervised Structure Modeling for Object Recognition [71.68524003173219]
We propose to "look into objects" (explicitly yet intrinsically model the object structure) by incorporating self-supervision.
We show the recognition backbone can be substantially enhanced for more robust representation learning.
Our approach achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft).
arXiv Detail & Related papers (2020-03-31T12:22:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.