Leveraging commonsense for object localisation in partial scenes
- URL: http://arxiv.org/abs/2211.00562v1
- Date: Tue, 1 Nov 2022 16:17:07 GMT
- Title: Leveraging commonsense for object localisation in partial scenes
- Authors: Francesco Giuliari, Geri Skenderi, Marco Cristani, Alessio Del Bue and
  Yiming Wang
- Abstract summary: We propose a novel scene representation to facilitate geometric reasoning, the Directed Spatial Commonsense Graph (D-SCG).
We estimate the unknown position of the target object using a Graph Neural Network that implements a novel attentional message passing mechanism.
We evaluate our method using Partial ScanNet, improving the state-of-the-art by 5.9% in terms of localisation accuracy at an 8x faster training speed.
- Score: 36.47035776975184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   We propose an end-to-end solution to address the problem of object
localisation in partial scenes, where we aim to estimate the position of an
object in an unknown area given only a partial 3D scan of the scene. We propose
a novel scene representation to facilitate geometric reasoning, the Directed
Spatial Commonsense Graph (D-SCG), a spatial scene graph that is enriched with
additional concept nodes from a commonsense knowledge base. Specifically, the
nodes of D-SCG represent the scene objects and the edges are their relative
positions. Each object node is then connected via different commonsense
relationships to a set of concept nodes. With the proposed graph-based scene
representation, we estimate the unknown position of the target object using a
Graph Neural Network that implements a novel attentional message passing
mechanism. The network first predicts the relative positions between the target
object and each visible object by learning a rich representation of the objects
via aggregating both the object nodes and the concept nodes in D-SCG. These
relative positions are then merged to obtain the final position. We evaluate
our method using Partial ScanNet, improving the state-of-the-art by 5.9% in
terms of localisation accuracy at an 8x faster training speed.
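
The two-stage idea described in the abstract (a directed graph whose edges hold relative positions, followed by merging per-object predictions into one position) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class and function names are hypothetical, 2D coordinates are used only to keep the example short, the relation and concept strings are illustrative, and the merge rule is assumed to be a plain average because the abstract does not specify it. The learned Graph Neural Network is replaced by hand-written offsets.

```python
from dataclasses import dataclass, field
from itertools import permutations
from typing import Dict, List, Tuple

Vec = Tuple[float, float]


@dataclass
class DirectedSceneGraph:
    """Toy D-SCG-like container; every field name here is hypothetical."""

    # Visible object name -> known 2D position in the partial scan.
    positions: Dict[str, Vec]
    # Object name -> list of (relation, concept) pairs from a knowledge base.
    concepts: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)

    def relative_edges(self) -> Dict[Tuple[str, str], Vec]:
        """Directed edge (a, b) stores the position of b relative to a."""
        edges = {}
        for a, b in permutations(self.positions, 2):
            (ax, ay), (bx, by) = self.positions[a], self.positions[b]
            edges[(a, b)] = (bx - ax, by - ay)
        return edges


def merge_relative_predictions(
    positions: Dict[str, Vec], offsets: Dict[str, Vec]
) -> Vec:
    """Average the candidate target positions implied by each visible object.

    offsets[name] is the predicted position of the target relative to the
    visible object `name`. Averaging is an assumption of this sketch.
    """
    candidates = [
        (positions[name][0] + dx, positions[name][1] + dy)
        for name, (dx, dy) in offsets.items()
    ]
    n = len(candidates)
    return (
        sum(x for x, _ in candidates) / n,
        sum(y for _, y in candidates) / n,
    )


graph = DirectedSceneGraph(
    positions={"table": (0.0, 0.0), "chair": (1.0, 2.0)},
    concepts={"table": [("AtLocation", "kitchen")]},
)
edges = graph.relative_edges()

# Stand-in for the network output: target offset seen from each visible object.
predicted_offsets = {"table": (2.0, 0.0), "chair": (3.0, 0.0)}
target = merge_relative_predictions(graph.positions, predicted_offsets)
```

Here each visible object votes for a candidate position (its own position plus the predicted offset), and the votes are combined into one estimate. A weighted or robust combination would slot into `merge_relative_predictions` without changing the graph structure.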
 
      
Related papers
- ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting [54.92763171355442]
 ObjectGS is an object-aware framework that unifies 3D scene reconstruction with semantic understanding.
We show through experiments that ObjectGS outperforms state-of-the-art methods on open-vocabulary and panoptic segmentation tasks.
 arXiv  Detail & Related papers  (2025-07-21T10:06:23Z)
- Open-Vocabulary Octree-Graph for 3D Scene Understanding [54.11828083068082]
 Octree-Graph is a novel scene representation for open-vocabulary 3D scene understanding.
An adaptive-octree structure is developed that stores semantics and depicts the occupancy of an object adjustably according to its shape.
 arXiv  Detail & Related papers  (2024-11-25T10:14:10Z)
- Multiview Scene Graph [7.460438046915524]
 A proper scene representation is central to the pursuit of spatial intelligence.
We propose to build Multiview Scene Graphs (MSG) from unposed images.
MSG represents a scene topologically with interconnected place and object nodes.
 arXiv  Detail & Related papers  (2024-10-15T02:04:05Z)
- Inter-object Discriminative Graph Modeling for Indoor Scene Recognition [5.712940060321454]
 We propose to leverage discriminative object knowledge to enhance scene feature representations.
We construct a Discriminative Graph Network (DGN) in which pixel-level scene features are defined as nodes.
With the proposed IODP and DGN, we obtain state-of-the-art results on several widely used scene datasets.
 arXiv  Detail & Related papers  (2023-11-10T08:07:16Z)
- 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding [58.924180772480504]
 3D visual grounding aims to localize the target object in a 3D point cloud by a free-form language description.
We propose a relation-aware one-stage framework, named 3D Relative Position-aware Network (3DRP-Net).
 arXiv  Detail & Related papers  (2023-07-25T09:33:25Z)
- Learning Object Placement via Dual-path Graph Completion [28.346027247882354]
 Object placement aims to place a foreground object over a background image with a suitable location and size.
In this work, we treat object placement as a graph completion problem and propose a novel graph completion module (GCM).
The foreground object is encoded as a special node that should be inserted at a reasonable place in this graph.
 arXiv  Detail & Related papers  (2022-07-23T08:39:39Z)
- Spatial Commonsense Graph for Object Localisation in Partial Scenes [36.47035776975184]
 We solve object localisation in partial scenes, a new problem of estimating the unknown position of an object given a partial 3D scan of a scene.
The proposed solution is based on a novel scene graph model, the Spatial Commonsense Graph (SCG), where objects are the nodes and edges define pairwise distances between them.
The SCG is used to estimate the unknown position of the target object in two steps: first, we feed the SCG into a novel Proximity Prediction Network, a graph neural network that uses attention to perform distance prediction between the node representing the target object and the nodes representing the observed objects in the
 arXiv  Detail & Related papers  (2022-03-10T14:13:35Z)
- SIRI: Spatial Relation Induced Network For Spatial Description Resolution [64.38872296406211]
 We propose a novel spatial relation induced (SIRI) network for language-guided localization.
We show that our method is around 24% better than the state-of-the-art method in terms of accuracy, measured by an 80-pixel radius.
Our method also generalizes well on our proposed extended dataset collected using the same settings as Touchdown.
 arXiv  Detail & Related papers  (2020-10-27T14:04:05Z)
- Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions [94.17683799712397]
 We focus on scene graphs, a data structure that organizes the entities of a scene in a graph.
We propose a learned method that regresses a scene graph from the point cloud of a scene.
We show the application of our method in a domain-agnostic retrieval task, where graphs serve as an intermediate representation for 3D-3D and 2D-3D matching.
 arXiv  Detail & Related papers  (2020-04-08T12:25:25Z)
- GPS-Net: Graph Property Sensing Network for Scene Graph Generation [91.60326359082408]
 Scene graph generation (SGG) aims to detect objects in an image along with their pairwise relationships.
GPS-Net fully explores three properties for SGG: edge direction information, the difference in priority between nodes, and the long-tailed distribution of relationships.
GPS-Net achieves state-of-the-art performance on three popular databases: VG, OI, and VRD by significant gains under various settings and metrics.
 arXiv  Detail & Related papers  (2020-03-29T07:22:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     