Leveraging commonsense for object localisation in partial scenes
- URL: http://arxiv.org/abs/2211.00562v1
- Date: Tue, 1 Nov 2022 16:17:07 GMT
- Title: Leveraging commonsense for object localisation in partial scenes
- Authors: Francesco Giuliari, Geri Skenderi, Marco Cristani, Alessio Del Bue and
Yiming Wang
- Abstract summary: We propose a novel scene representation to facilitate the geometric reasoning, Directed Spatial Commonsense Graph (D-SCG)
We estimate the unknown position of the target object using a Graph Neural Network that implements a novel attentional message passing mechanism.
We evaluate our method using Partial ScanNet, improving the state-of-the-art by 5.9% in terms of localisation accuracy at an 8x faster training speed.
- Score: 36.47035776975184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an end-to-end solution to address the problem of object
localisation in partial scenes, where we aim to estimate the position of an
object in an unknown area given only a partial 3D scan of the scene. We propose
a novel scene representation to facilitate geometric reasoning, the Directed
Spatial Commonsense Graph (D-SCG), a spatial scene graph enriched with
additional concept nodes from a commonsense knowledge base. Specifically, the
nodes of D-SCG represent the scene objects and the edges are their relative
positions. Each object node is then connected via different commonsense
relationships to a set of concept nodes. With the proposed graph-based scene
representation, we estimate the unknown position of the target object using a
Graph Neural Network that implements a novel attentional message passing
mechanism. The network first predicts the relative positions between the target
object and each visible object by learning a rich representation of the objects
via aggregating both the object nodes and the concept nodes in D-SCG. These
relative positions are then merged to obtain the final position. We evaluate
our method using Partial ScanNet, improving the state-of-the-art by 5.9% in
terms of localisation accuracy at an 8x faster training speed.
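For intuition, below is a minimal, hypothetical sketch (plain Python, standard library only; the names DSCG and merge_relative_positions are illustrative and not from the paper) of the two ingredients the abstract describes: a graph with object nodes, commonsense concept nodes and relative-position edges, and the final step in which the relative positions predicted for each visible object are merged, here by simple averaging, into a single estimate of the target position. The attentional message-passing GNN that would actually predict those offsets is not reproduced.

```python
from dataclasses import dataclass, field

# Hypothetical, minimal D-SCG-like container: object nodes carry positions when
# observed, concept nodes come from a commonsense knowledge base, and directed
# edges between object nodes store relative positions (offsets).

@dataclass
class DSCG:
    object_nodes: dict = field(default_factory=dict)   # name -> (x, y, z) or None if unknown
    concept_nodes: set = field(default_factory=set)     # commonsense concepts, e.g. "sleep"
    relative_edges: dict = field(default_factory=dict)  # (src, dst) -> (dx, dy, dz)
    concept_edges: set = field(default_factory=set)     # (object, relation, concept)

    def add_object(self, name, position=None):
        self.object_nodes[name] = position

    def connect_concept(self, obj, relation, concept):
        self.concept_nodes.add(concept)
        self.concept_edges.add((obj, relation, concept))

    def add_relative_position(self, src, dst, offset):
        self.relative_edges[(src, dst)] = offset


def merge_relative_positions(graph: DSCG, target: str) -> tuple:
    """Average the positions implied by each (visible object -> target) offset.

    Stands in for the paper's final merging step; the offsets themselves would
    come from the attentional GNN, which is not sketched here.
    """
    candidates = []
    for (src, dst), (dx, dy, dz) in graph.relative_edges.items():
        if dst == target and graph.object_nodes.get(src) is not None:
            x, y, z = graph.object_nodes[src]
            candidates.append((x + dx, y + dy, z + dz))
    if not candidates:
        raise ValueError(f"no visible object is linked to '{target}'")
    n = len(candidates)
    return tuple(sum(c[i] for c in candidates) / n for i in range(3))


if __name__ == "__main__":
    g = DSCG()
    g.add_object("bed", (1.0, 2.0, 0.0))
    g.add_object("nightstand", (0.2, 2.1, 0.0))
    g.add_object("lamp")                      # target: position unknown
    g.connect_concept("bed", "UsedFor", "sleep")
    g.connect_concept("lamp", "AtLocation", "bedroom")
    # Offsets that the GNN would be expected to predict for the target object:
    g.add_relative_position("bed", "lamp", (-0.7, 0.1, 0.5))
    g.add_relative_position("nightstand", "lamp", (0.1, 0.0, 0.5))
    print(merge_relative_positions(g, "lamp"))  # -> (0.3, 2.1, 0.5)
```

Running the example averages the two candidate positions implied by the bed and the nightstand into one estimate for the lamp; in the paper the per-object predictions come from the learned network rather than fixed offsets.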
Related papers
- Multiview Scene Graph [7.460438046915524]
A proper scene representation is central to the pursuit of spatial intelligence.
We propose to build Multiview Scene Graphs (MSG) from unposed images.
MSG represents a scene topologically with interconnected place and object nodes.
arXiv Detail & Related papers (2024-10-15T02:04:05Z)
- Inter-object Discriminative Graph Modeling for Indoor Scene Recognition [5.712940060321454]
We propose to leverage discriminative object knowledge to enhance scene feature representations.
We construct a Discriminative Graph Network (DGN) in which pixel-level scene features are defined as nodes.
With the proposed IODP and DGN, we obtain state-of-the-art results on several widely used scene datasets.
arXiv Detail & Related papers (2023-11-10T08:07:16Z)
- 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding [58.924180772480504]
3D visual grounding aims to localize the target object in a 3D point cloud by a free-form language description.
We propose a relation-aware one-stage framework, named 3D Relative Position-aware Network (3DRP-Net).
arXiv Detail & Related papers (2023-07-25T09:33:25Z)
- Learning Object Placement via Dual-path Graph Completion [28.346027247882354]
Object placement aims to place a foreground object over a background image with a suitable location and size.
In this work, we treat object placement as a graph completion problem and propose a novel graph completion module (GCM).
The foreground object is encoded as a special node that should be inserted at a reasonable place in this graph.
arXiv Detail & Related papers (2022-07-23T08:39:39Z)
- Spatial Commonsense Graph for Object Localisation in Partial Scenes [36.47035776975184]
We solve object localisation in partial scenes, a new problem of estimating the unknown position of an object given a partial 3D scan of a scene.
The proposed solution is based on a novel scene graph model, the Spatial Commonsense Graph (SCG), where objects are the nodes and edges define pairwise distances between them.
The SCG is used to estimate the unknown position of the target object in two steps: first, we feed the SCG into a novel Proximity Prediction Network, a graph neural network that uses attention to perform distance prediction between the node representing the target object and the nodes representing the observed objects in the SCG.
arXiv Detail & Related papers (2022-03-10T14:13:35Z)
- Towards Part-Based Understanding of RGB-D Scans [43.4094489272776]
We propose the task of part-based scene understanding of real-world 3D environments.
From an RGB-D scan of a scene, we detect objects, and for each object predict its decomposition into geometric part masks.
We leverage an intermediary part graph representation to enable robust completion as well as building of part priors.
arXiv Detail & Related papers (2020-12-03T17:30:02Z)
- SIRI: Spatial Relation Induced Network For Spatial Description Resolution [64.38872296406211]
We propose a novel Spatial Relation Induced (SIRI) network for language-guided localization.
We show that our method is around 24% better than the state-of-the-art method in terms of accuracy, measured by an 80-pixel radius.
Our method also generalizes well on our proposed extended dataset collected using the same settings as Touchdown.
arXiv Detail & Related papers (2020-10-27T14:04:05Z)
- Learning Physical Graph Representations from Visual Scenes [56.7938395379406]
Physical Scene Graphs (PSGs) represent scenes as hierarchical graphs with nodes corresponding intuitively to object parts at different scales, and edges to physical connections between parts.
PSGNet augments standard CNNs by including: recurrent feedback connections to combine low and high-level image information; graph pooling and vectorization operations that convert spatially-uniform feature maps into object-centric graph structures.
We show that PSGNet outperforms alternative self-supervised scene representation algorithms at scene segmentation tasks.
arXiv Detail & Related papers (2020-06-22T16:10:26Z)
- Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions [94.17683799712397]
We focus on scene graphs, a data structure that organizes the entities of a scene in a graph.
We propose a learned method that regresses a scene graph from the point cloud of a scene.
We show the application of our method in a domain-agnostic retrieval task, where graphs serve as an intermediate representation for 3D-3D and 2D-3D matching.
arXiv Detail & Related papers (2020-04-08T12:25:25Z)
- GPS-Net: Graph Property Sensing Network for Scene Graph Generation [91.60326359082408]
Scene graph generation (SGG) aims to detect objects in an image along with their pairwise relationships.
GPS-Net fully explores three properties for SGG: edge direction information, the difference in priority between nodes, and the long-tailed distribution of relationships.
GPS-Net achieves state-of-the-art performance on three popular databases (VG, OI, and VRD), with significant gains under various settings and metrics.
arXiv Detail & Related papers (2020-03-29T07:22:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.