BORM: Bayesian Object Relation Model for Indoor Scene Recognition
- URL: http://arxiv.org/abs/2108.00397v1
- Date: Sun, 1 Aug 2021 08:31:18 GMT
- Title: BORM: Bayesian Object Relation Model for Indoor Scene Recognition
- Authors: Liguang Zhou, Jun Cen, Xingchao Wang, Zhenglong Sun, Tin Lun Lam,
Yangsheng Xu
- Abstract summary: We propose to utilize meaningful object representations for indoor scene representation.
First, we utilize an improved object model (IOM) as a baseline that enriches the object knowledge by introducing a scene parsing algorithm pretrained on the ADE20K dataset with rich object categories related to the indoor scene.
To analyze the object co-occurrences and pairwise object relations, we formulate the IOM from a Bayesian perspective as the Bayesian object relation model (BORM).
- Score: 3.3274747298291216
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene recognition is a fundamental task in robotic perception. For human
beings, scene recognition is reasonable because they have abundant object
knowledge of the real world. The idea of transferring prior object knowledge
from humans to scene recognition is significant but still less exploited. In
this paper, we propose to utilize meaningful object representations for indoor
scene representation. First, we utilize an improved object model (IOM) as a
baseline that enriches the object knowledge by introducing a scene parsing
algorithm pretrained on the ADE20K dataset with rich object categories related
to the indoor scene. To analyze the object co-occurrences and pairwise object
relations, we formulate the IOM from a Bayesian perspective as the Bayesian
object relation model (BORM). Meanwhile, we incorporate the proposed BORM with
the PlacesCNN model as the combined Bayesian object relation model (CBORM) for
scene recognition, which significantly outperforms state-of-the-art methods on
the reduced Places365 dataset and the SUN RGB-D dataset without retraining,
showing the excellent generalization ability of the proposed method. Code can
be found at https://github.com/hszhoushen/borm.
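The abstract outlines the pipeline (a scene parser pretrained on ADE20K for object evidence, a Bayesian model over object co-occurrences and pairwise relations, and combination with PlacesCNN) without giving formulas. The sketch below is a minimal, hypothetical illustration of that general idea only; the function names, array shapes, naive-independence likelihood, and weighted-average fusion are assumptions made for this sketch and are not the paper's actual formulation (see the linked repository for the real implementation).

```python
import numpy as np

# Hypothetical sketch of a Bayesian object-relation scene classifier.
# All names, shapes, and the fusion rule below are illustrative assumptions.

N_OBJECTS = 150   # e.g. ADE20K object categories from the scene parser
N_SCENES = 7      # e.g. a reduced indoor-scene label set

def fit_cooccurrence(object_presence, scene_labels, eps=1e-6):
    """Estimate per-scene pairwise object co-occurrence frequencies.

    object_presence: (n_images, N_OBJECTS) binary vectors from a pretrained
                     scene-parsing model (1 if the object appears in the image).
    scene_labels:    (n_images,) integer scene labels.
    """
    cooc = np.full((N_SCENES, N_OBJECTS, N_OBJECTS), eps)
    counts = np.full(N_SCENES, eps)
    for x, y in zip(object_presence, scene_labels):
        cooc[y] += np.outer(x, x)          # pairwise (and single) occurrence counts
        counts[y] += 1.0
    cooc /= counts[:, None, None]          # relative frequency per scene class
    return cooc, counts / counts.sum()     # co-occurrence table and scene prior

def bayesian_scene_scores(x, cooc, prior):
    """Score scenes by log-prior plus log-likelihood of the observed object pairs."""
    pair = np.outer(x, x)
    loglik = (pair * np.log(cooc)).sum(axis=(1, 2))
    return loglik + np.log(prior)

def fuse_with_places(object_scores, places_softmax, alpha=0.5):
    """Simple late fusion with PlacesCNN probabilities (the actual scheme may differ)."""
    obj_prob = np.exp(object_scores - object_scores.max())
    obj_prob /= obj_prob.sum()
    return alpha * obj_prob + (1.0 - alpha) * places_softmax
```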
Related papers
- Interpretable Action Recognition on Hard to Classify Actions [11.641926922266347]
Humans recognise complex activities in video by recognising critical spatio-temporal relations among explicitly recognised objects and parts.
To mimic this we build on a model which uses positions of objects and hands, and their motions, to recognise the activity taking place.
To improve this model we focussed on three of the most confused classes (for this model) and identified that the lack of 3D information was the major problem.
A state-of-the-art object detection model was fine-tuned to determine the difference between "Container" and "NotContainer" in order to integrate object shape information into the existing object features.
arXiv Detail & Related papers (2024-09-19T21:23:44Z) - ICGNet: A Unified Approach for Instance-Centric Grasping [42.92991092305974]
We introduce an end-to-end architecture for object-centric grasping.
We show the effectiveness of the proposed method by extensively evaluating it against state-of-the-art methods on synthetic datasets.
arXiv Detail & Related papers (2024-01-18T12:41:41Z) - Localizing Active Objects from Egocentric Vision with Symbolic World
Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability on localizing the active objects by: learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z) - Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z) - An Object SLAM Framework for Association, Mapping, and High-Level Tasks [12.62957558651032]
We present a comprehensive object SLAM framework that focuses on object-based perception and object-oriented robot tasks.
A range of public datasets and real-world results have been used to evaluate the proposed object SLAM framework for its efficient performance.
arXiv Detail & Related papers (2023-05-12T08:10:14Z) - MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
We present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects.
Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
arXiv Detail & Related papers (2022-12-13T19:30:03Z) - SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric
Action Recognition [35.4163266882568]
We introduce Self-Supervised Learning Over Sets (SOS) to pre-train a generic Objects In Contact (OIC) representation model.
Our OIC significantly boosts the performance of multiple state-of-the-art video classification models.
arXiv Detail & Related papers (2022-04-10T23:27:19Z) - Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z) - Object Manipulation via Visual Target Localization [64.05939029132394]
Training agents to manipulate objects poses many challenges.
We propose an approach that explores the environment in search for target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible.
Our evaluations show a massive 3x improvement in success rate over a model that has access to the same sensory suite.
arXiv Detail & Related papers (2022-03-15T17:59:01Z) - Learning Open-World Object Proposals without Learning to Classify [110.30191531975804]
We propose a classification-free Object Localization Network (OLN) which estimates the objectness of each region purely by how well the location and shape of a region overlaps with any ground-truth object.
This simple strategy learns generalizable objectness and outperforms existing proposals on cross-category generalization.
arXiv Detail & Related papers (2021-08-15T14:36:02Z) - Object-to-Scene: Learning to Transfer Object Knowledge to Indoor Scene
Recognition [19.503027767462605]
We propose an Object-to-Scene (OTS) method, which extracts object features and learns object relations to recognize indoor scenes.
OTS outperforms the state-of-the-art methods by more than 2% on indoor scene recognition without using any additional streams.
arXiv Detail & Related papers (2021-08-01T08:37:08Z)