Spatial Commonsense Graph for Object Localisation in Partial Scenes
- URL: http://arxiv.org/abs/2203.05380v2
- Date: Mon, 14 Mar 2022 14:44:41 GMT
- Title: Spatial Commonsense Graph for Object Localisation in Partial Scenes
- Authors: Francesco Giuliari and Geri Skenderi and Marco Cristani and Yiming
Wang and Alessio Del Bue
- Abstract summary: We solve object localisation in partial scenes, a new problem of estimating the unknown position of an object given a partial 3D scan of a scene.
The proposed solution is based on a novel scene graph model, the Spatial Commonsense Graph (SCG), where objects are the nodes and edges define pairwise distances between them.
The SCG is used to estimate the unknown position of the target object in two steps: first, we feed the SCG into a novel Proximity Prediction Network, a graph neural network that uses attention to perform distance prediction between the node representing the target object and the nodes representing the observed objects in the
- Score: 36.47035776975184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We solve object localisation in partial scenes, a new problem of estimating
the unknown position of an object (e.g. where is the bag?) given a partial 3D
scan of a scene. The proposed solution is based on a novel scene graph model,
the Spatial Commonsense Graph (SCG), where objects are the nodes and edges
define pairwise distances between them, enriched by concept nodes and
relationships from a commonsense knowledge base. This allows SCG to better
generalise its spatial inference over unknown 3D scenes. The SCG is used to
estimate the unknown position of the target object in two steps: first, we feed
the SCG into a novel Proximity Prediction Network, a graph neural network that
uses attention to perform distance prediction between the node representing the
target object and the nodes representing the observed objects in the SCG;
second, we propose a Localisation Module based on circular intersection to
estimate the object position using all the predicted pairwise distances in
order to be independent of any reference system. We create a new dataset of
partially reconstructed scenes to benchmark our method and baselines for object
localisation in partial scenes, where our proposed method achieves the best
localisation performance.
Related papers
- Multiview Scene Graph [7.460438046915524]
A proper scene representation is central to the pursuit of spatial intelligence.
We propose to build Multiview Scene Graphs (MSG) from unposed images.
MSG represents a scene topologically with interconnected place and object nodes.
arXiv Detail & Related papers (2024-10-15T02:04:05Z) - GOReloc: Graph-based Object-Level Relocalization for Visual SLAM [17.608119427712236]
This article introduces a novel method for object-level relocalization of robotic systems.
It determines the pose of a camera sensor by robustly associating the object detections in the current frame with 3D objects in a lightweight object-level map.
arXiv Detail & Related papers (2024-08-15T03:54:33Z) - 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding [58.924180772480504]
3D visual grounding aims to localize the target object in a 3D point cloud by a free-form language description.
We propose a relation-aware one-stage framework, named 3D Relative Position-aware Network (3-Net)
arXiv Detail & Related papers (2023-07-25T09:33:25Z) - Leveraging commonsense for object localisation in partial scenes [36.47035776975184]
We propose a novel scene representation to facilitate the geometric reasoning, Directed Spatial Commonsense Graph (D-SCG)
We estimate the unknown position of the target object using a Graph Neural Network that implements a novel attentional message passing mechanism.
We evaluate our method using Partial ScanNet, improving the state-of-the-art by 5.9% in terms of the localisation accuracy at a 8x faster training speed.
arXiv Detail & Related papers (2022-11-01T16:17:07Z) - Neural Correspondence Field for Object Pose Estimation [67.96767010122633]
We propose a method for estimating the 6DoF pose of a rigid object with an available 3D model from a single RGB image.
Unlike classical correspondence-based methods which predict 3D object coordinates at pixels of the input image, the proposed method predicts 3D object coordinates at 3D query points sampled in the camera frustum.
arXiv Detail & Related papers (2022-07-30T01:48:23Z) - Location-Sensitive Visual Recognition with Cross-IOU Loss [177.86369890708457]
This paper proposes a unified solution named location-sensitive network (LSNet) for object detection, instance segmentation, and pose estimation.
Based on a deep neural network as the backbone, LSNet predicts an anchor point and a set of landmarks which together define the shape of the target object.
arXiv Detail & Related papers (2021-04-11T02:17:14Z) - HDNet: Human Depth Estimation for Multi-Person Camera-Space Localization [83.57863764231655]
We propose the Human Depth Estimation Network (HDNet), an end-to-end framework for absolute root joint localization.
A skeleton-based Graph Neural Network (GNN) is utilized to propagate features among joints.
We evaluate our HDNet on the root joint localization and root-relative 3D pose estimation tasks with two benchmark datasets.
arXiv Detail & Related papers (2020-07-17T12:44:23Z) - Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking [34.40019455462043]
We propose a joint spatial-temporal optimization-based stereo 3D object tracking method.
From the network, we detect corresponding 2D bounding boxes on adjacent images and regress an initial 3D bounding box.
Dense object cues that associating to the object centroid are then predicted using a region-based network.
arXiv Detail & Related papers (2020-04-20T13:59:46Z) - GPS-Net: Graph Property Sensing Network for Scene Graph Generation [91.60326359082408]
Scene graph generation (SGG) aims to detect objects in an image along with their pairwise relationships.
GPS-Net fully explores three properties for SGG: edge direction information, the difference in priority between nodes, and the long-tailed distribution of relationships.
GPS-Net achieves state-of-the-art performance on three popular databases: VG, OI, and VRD by significant gains under various settings and metrics.
arXiv Detail & Related papers (2020-03-29T07:22:31Z) - Object as Hotspots: An Anchor-Free 3D Object Detection Approach via
Firing of Hotspots [37.16690737208046]
We argue for an approach opposite to existing methods using object-level anchors.
Inspired by compositional models, we propose an object as composition of its interior non-empty voxels, termed hotspots.
Based on OHS, we propose an anchor-free detection head with a novel ground truth assignment strategy.
arXiv Detail & Related papers (2019-12-30T03:02:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.