Mono-hydra: Real-time 3D scene graph construction from monocular camera input with IMU
- URL: http://arxiv.org/abs/2308.05515v1
- Date: Thu, 10 Aug 2023 11:58:38 GMT
- Title: Mono-hydra: Real-time 3D scene graph construction from monocular camera input with IMU
- Authors: U.V.B.L. Udugama, G. Vosselman, F. Nex
- Abstract summary: The ability of robots to autonomously navigate through 3D environments depends on their comprehension of spatial concepts.
3D scene graphs have emerged as a robust tool for representing the environment as a layered graph of concepts and their relationships.
This paper puts forth a real-time spatial perception system Mono-Hydra, combining a monocular camera and an IMU sensor setup, focusing on indoor scenarios.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The ability of robots to autonomously navigate through 3D environments
depends on their comprehension of spatial concepts, ranging from low-level
geometry to high-level semantics, such as objects, places, and buildings. To
enable such comprehension, 3D scene graphs have emerged as a robust tool for
representing the environment as a layered graph of concepts and their
relationships. However, building these representations using monocular vision
systems in real-time remains a difficult task that has not been explored in
depth. This paper puts forth a real-time spatial perception system Mono-Hydra,
combining a monocular camera and an IMU sensor setup, focusing on indoor
scenarios. However, the proposed approach is adaptable to outdoor applications,
offering flexibility in its potential uses. The system employs a suite of deep
learning algorithms to derive depth and semantics. It uses a robocentric
visual-inertial odometry (VIO) algorithm based on square-root information,
thereby ensuring consistent visual odometry with an IMU and a monocular camera.
This system achieves sub-20 cm error in real-time processing at 15 fps,
enabling real-time 3D scene graph construction using a laptop GPU (NVIDIA
3080). This enhances decision-making efficiency and effectiveness in simple
camera setups, augmenting robotic system agility. We make Mono-Hydra publicly
available at: https://github.com/UAV-Centre-ITC/Mono_Hydra
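The abstract describes the environment representation as a layered graph of concepts (objects, places, buildings) connected by relationships. As an illustration only, the sketch below shows one minimal way such a layered scene graph could be modeled in Python; the class names, layer definitions, and fields are hypothetical and are not taken from the Mono-Hydra codebase.

```python
from dataclasses import dataclass, field

# Hypothetical layer indices, ordered from low-level geometry to high-level semantics.
# The actual Mono-Hydra layer definitions may differ; this is an illustrative sketch only.
LAYER_MESH, LAYER_OBJECTS, LAYER_PLACES, LAYER_ROOMS, LAYER_BUILDING = range(5)

@dataclass
class SceneNode:
    node_id: int
    layer: int                              # which layer of the hierarchy this node lives in
    label: str                              # semantic label, e.g. "chair", "corridor"
    position: tuple[float, float, float]    # 3D position in the world frame (meters)

@dataclass
class SceneGraph:
    nodes: dict[int, SceneNode] = field(default_factory=dict)
    intra_layer_edges: set[tuple[int, int]] = field(default_factory=set)  # e.g. place-to-place traversability
    inter_layer_edges: set[tuple[int, int]] = field(default_factory=set)  # e.g. object contained in a room

    def add_node(self, node: SceneNode) -> None:
        self.nodes[node.node_id] = node

    def connect(self, a: int, b: int) -> None:
        """Add an edge, classified by whether the endpoints share a layer."""
        same_layer = self.nodes[a].layer == self.nodes[b].layer
        edges = self.intra_layer_edges if same_layer else self.inter_layer_edges
        edges.add((min(a, b), max(a, b)))

# Toy usage: an object node attached to the room it was observed in.
graph = SceneGraph()
graph.add_node(SceneNode(1, LAYER_OBJECTS, "chair", (1.2, 0.4, 0.0)))
graph.add_node(SceneNode(2, LAYER_ROOMS, "office", (1.0, 0.0, 0.0)))
graph.connect(1, 2)
print(len(graph.inter_layer_edges))  # -> 1
```

In a full pipeline, nodes in the lower layers would be populated from the VIO trajectory and per-frame depth and semantic predictions, while higher layers (rooms, buildings) would be inferred from the lower ones.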
Related papers
- EmbodiedSAM: Online Segment Any 3D Thing in Real Time [61.2321497708998]
Embodied tasks require the agent to fully understand 3D scenes simultaneously with its exploration.
An online, real-time, fine-grained and highly-generalized 3D perception model is desperately needed.
arXiv Detail & Related papers (2024-08-21T17:57:06Z)
- SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction [77.15924044466976]
We propose SelfOcc to explore a self-supervised way to learn 3D occupancy using only video sequences.
We first transform the images into the 3D space (e.g., bird's eye view) to obtain 3D representation of the scene.
We can then render 2D images of previous and future frames as self-supervision signals to learn the 3D representations.
arXiv Detail & Related papers (2023-11-21T17:59:14Z)
- SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection [19.75965521357068]
We propose a novel approach called SOGDet (Semantic-Occupancy Guided Multi-view 3D Object Detection) to improve the accuracy of 3D object detection.
Our results show that SOGDet consistently enhances the performance of three baseline methods in terms of nuScenes Detection Score (NDS) and mean Average Precision (mAP).
This indicates that the combination of 3D object detection and 3D semantic occupancy leads to a more comprehensive perception of the 3D environment, thereby helping to build more robust autonomous driving systems.
arXiv Detail & Related papers (2023-08-26T07:38:21Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving [98.74706005223685]
3D scene understanding plays a vital role in vision-based autonomous driving.
We propose a SurroundOcc method to predict the 3D occupancy with multi-camera images.
arXiv Detail & Related papers (2023-03-16T17:59:08Z)
- Semi-Perspective Decoupled Heatmaps for 3D Robot Pose Estimation from Depth Maps [66.24554680709417]
Knowing the exact 3D location of workers and robots in a collaborative environment enables several real applications.
We propose a non-invasive framework based on depth devices and deep neural networks to estimate the 3D pose of robots from an external camera.
arXiv Detail & Related papers (2022-07-06T08:52:12Z)
- Learning Ego 3D Representation as Ray Tracing [42.400505280851114]
We present a novel end-to-end architecture for ego 3D representation learning from unconstrained camera views.
Inspired by the ray tracing principle, we design a polarized grid of "imaginary eyes" as the learnable ego 3D representation.
We show that our model outperforms all state-of-the-art alternatives significantly.
arXiv Detail & Related papers (2022-06-08T17:55:50Z)
- Learning Optical Flow, Depth, and Scene Flow without Real-World Labels [33.586124995327225]
Self-supervised monocular depth estimation enables robots to learn 3D perception from raw video streams.
We propose DRAFT, a new method capable of jointly learning depth, optical flow, and scene flow.
arXiv Detail & Related papers (2022-03-28T20:52:12Z)
- Unsupervised Learning of Visual 3D Keypoints for Control [104.92063943162896]
Learning sensorimotor control policies from high-dimensional images crucially relies on the quality of the underlying visual representations.
We propose a framework to learn such a 3D geometric structure directly from images in an end-to-end unsupervised manner.
These discovered 3D keypoints tend to meaningfully capture robot joints as well as object movements in a consistent manner across both time and 3D space.
arXiv Detail & Related papers (2021-06-14T17:59:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.