StereoVoxelNet: Real-Time Obstacle Detection Based on Occupancy Voxels
from a Stereo Camera Using Deep Neural Networks
- URL: http://arxiv.org/abs/2209.08459v1
- Date: Sun, 18 Sep 2022 03:32:38 GMT
- Title: StereoVoxelNet: Real-Time Obstacle Detection Based on Occupancy Voxels
from a Stereo Camera Using Deep Neural Networks
- Authors: Hongyu Li, Zhengang Li, Neset Unver Akmandor, Huaizu Jiang, Yanzhi
Wang, Taskin Padir
- Abstract summary: Obstacle detection is a safety-critical problem in robot navigation, where stereo matching is a popular vision-based approach.
This paper proposes a computationally efficient method that leverages a deep neural network to detect occupancy from stereo images directly.
Our approach detects obstacles accurately within a range of 32 meters and achieves better IoU (Intersection over Union) and CD (Chamfer Distance) scores at only 2% of the computation cost of a state-of-the-art stereo model.
- Score: 32.7826524859756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Obstacle detection is a safety-critical problem in robot navigation, where
stereo matching is a popular vision-based approach. While deep neural networks
have shown impressive results in computer vision, most of the previous obstacle
detection works only leverage traditional stereo matching techniques to meet
the computational constraints for real-time feedback. This paper proposes a
computationally efficient method that leverages a deep neural network to detect
occupancy from stereo images directly. Instead of learning the point cloud
correspondence from the stereo data, our approach extracts the compact obstacle
distribution based on volumetric representations. In addition, we prune the
computation of safety-irrelevant spaces in a coarse-to-fine manner based on
octrees generated by the decoder. As a result, we achieve real-time performance
on the onboard computer (NVIDIA Jetson TX2). Our approach detects obstacles
accurately within a range of 32 meters and achieves better IoU (Intersection over
Union) and CD (Chamfer Distance) scores with only 2% of the computation cost of
the state-of-the-art stereo model. Furthermore, we validate our method's
robustness and real-world feasibility through autonomous navigation experiments
with a real robot. Hence, our work contributes toward closing the gap between
stereo-based systems in robot perception and state-of-the-art stereo models
in computer vision. To counter the scarcity of high-quality real-world indoor
stereo datasets, we collect a 1.36-hour stereo dataset with a Jackal robot,
which is used to fine-tune our model. The dataset, the code, and more
visualizations are available at https://lhy.xyz/stereovoxelnet/
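The key efficiency idea in the abstract is the coarse-to-fine, octree-based pruning of safety-irrelevant space. The PyTorch sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation: the class name, channel counts, per-level 3D convolution heads, and the occupancy threshold are assumptions made for clarity; the real architecture is available at the project page above.

```python
# Minimal, illustrative sketch (NOT the authors' code) of coarse-to-fine
# occupancy decoding with octree-style pruning. Shapes, layer choices, and
# the threshold are hypothetical assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoarseToFineOccupancyDecoder(nn.Module):
    """Predict occupancy voxels level by level, refining only occupied cells."""

    def __init__(self, feat_ch=32, levels=3, threshold=0.5):
        super().__init__()
        self.levels = levels
        self.threshold = threshold
        # One lightweight 3D prediction head per resolution level (hypothetical).
        self.heads = nn.ModuleList(
            nn.Conv3d(feat_ch, 1, kernel_size=3, padding=1) for _ in range(levels)
        )

    def forward(self, feat):
        # feat: (B, C, D, H, W) feature volume built from the stereo pair.
        outputs = []
        active = torch.ones_like(feat[:, :1])  # every coarse cell starts active
        for level, head in enumerate(self.heads):
            occ = torch.sigmoid(head(feat)) * active  # pruned cells stay at zero
            outputs.append(occ)
            if level == self.levels - 1:
                break
            # Octree-style pruning: only cells predicted occupied are subdivided
            # and re-examined at the next (2x finer) resolution.
            active = F.interpolate((occ > self.threshold).float(),
                                   scale_factor=2, mode="nearest")
            feat = F.interpolate(feat, scale_factor=2,
                                 mode="trilinear", align_corners=False)
        return outputs  # occupancy grids from coarse to fine


def voxel_iou(pred, target, threshold=0.5):
    """Standard IoU between predicted and ground-truth occupancy grids."""
    p, t = pred > threshold, target > 0.5
    intersection = (p & t).sum().float()
    union = (p | t).sum().clamp(min=1).float()
    return intersection / union
```

At each level, only the cells that the coarser level marked as occupied are expanded, so empty (safety-irrelevant) regions cost almost nothing to decode; under these assumptions, that is the property that makes real-time inference on an embedded board plausible.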
Related papers
- Left-right Discrepancy for Adversarial Attack on Stereo Networks [8.420135490466851]
We introduce a novel adversarial attack approach that generates perturbation noise specifically designed to maximize the discrepancy between left and right image features.
Experiments demonstrate the superior capability of our method to induce larger prediction errors in stereo neural networks.
arXiv Detail & Related papers (2024-01-14T02:30:38Z) - MoSS: Monocular Shape Sensing for Continuum Robots [11.377027568901038]
This paper proposes the first eye-to-hand monocular approach to continuum robot shape sensing.
MoSSNet eliminates the cost of stereo matching and reduces requirements on sensing hardware.
A two-segment tendon-driven continuum robot is used for data collection and testing.
arXiv Detail & Related papers (2023-03-02T01:14:32Z) - Neural Scene Representation for Locomotion on Structured Terrain [56.48607865960868]
We propose a learning-based method to reconstruct the local terrain for a mobile robot traversing urban environments.
Using a stream of depth measurements from the onboard cameras and the robot's trajectory, the method estimates the topography in the robot's vicinity.
We propose a 3D reconstruction model that faithfully reconstructs the scene, despite the noisy measurements and large amounts of missing data coming from the blind spots of the camera arrangement.
arXiv Detail & Related papers (2022-06-16T10:45:17Z) - Revisiting Domain Generalized Stereo Matching Networks from a Feature
Consistency Perspective [65.37571681370096]
We propose simple pixel-wise contrastive learning across the viewpoints.
A stereo selective whitening loss is introduced to better preserve the stereo feature consistency across domains.
Our method achieves superior performance over several state-of-the-art networks.
arXiv Detail & Related papers (2022-03-21T11:21:41Z) - Self-Supervised Depth Completion for Active Stereo [55.79929735390945]
Active stereo systems are widely used in the robotics industry due to their low cost and high-quality depth maps.
However, these depth sensors suffer from stereo artefacts and do not provide dense depth estimates.
We present the first self-supervised depth completion method for active stereo systems that predicts accurate dense depth maps.
arXiv Detail & Related papers (2021-10-07T07:33:52Z) - StereoSpike: Depth Learning with a Spiking Neural Network [0.0]
We present an end-to-end neuromorphic approach to depth estimation.
We use a Spiking Neural Network (SNN) with a slightly modified U-Net-like encoder-decoder architecture, which we named StereoSpike.
We demonstrate that this architecture generalizes very well, even better than its non-spiking counterparts.
arXiv Detail & Related papers (2021-09-28T14:11:36Z) - Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z) - Instantaneous Stereo Depth Estimation of Real-World Stimuli with a
Neuromorphic Stereo-Vision Setup [4.28479274054892]
Spiking Neural Network (SNN) architectures for stereo vision have the potential of simplifying the stereo-matching problem.
We validate a brain-inspired event-based stereo-matching architecture implemented on a mixed-signal neuromorphic processor with real-world data.
arXiv Detail & Related papers (2021-04-06T14:31:23Z) - Reversing the cycle: self-supervised deep stereo through enhanced
monocular distillation [51.714092199995044]
In many fields, self-supervised learning solutions are rapidly evolving and closing the gap with supervised approaches.
We propose a novel self-supervised paradigm reversing the link between the two.
In order to train deep stereo networks, we distill knowledge through a monocular completion network.
arXiv Detail & Related papers (2020-08-17T07:40:22Z) - Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for
Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties.
Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates.
The robustness of our method is validated on complex quadruped robot dynamics, and the approach can be generally applied to most robotic platforms.
arXiv Detail & Related papers (2020-07-28T07:34:30Z) - Stereo RGB and Deeper LIDAR Based Network for 3D Object Detection [40.34710686994996]
3D object detection is an emerging task in autonomous driving scenarios.
Previous works process 3D point clouds using either projection-based or voxel-based models.
We propose the Stereo RGB and Deeper LIDAR framework which can utilize semantic and spatial information simultaneously.
arXiv Detail & Related papers (2020-06-09T11:19:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.