StereoSpike: Depth Learning with a Spiking Neural Network
- URL: http://arxiv.org/abs/2109.13751v1
- Date: Tue, 28 Sep 2021 14:11:36 GMT
- Title: StereoSpike: Depth Learning with a Spiking Neural Network
- Authors: Ulysse Rançon, Javier Cuadrado-Anibarro, Benoit R. Cottereau and Timothée Masquelier
- Abstract summary: We present an end-to-end neuromorphic approach to depth estimation.
We use a Spiking Neural Network (SNN) with a slightly modified U-Net-like encoder-decoder architecture, which we named StereoSpike.
We demonstrate that this architecture generalizes very well, even better than its non-spiking counterparts.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Depth estimation is an important computer vision task, useful in particular
for navigation in autonomous vehicles, or for object manipulation in robotics.
Here we solved it using an end-to-end neuromorphic approach, combining two
event-based cameras and a Spiking Neural Network (SNN) with a slightly modified
U-Net-like encoder-decoder architecture, which we named StereoSpike. More
specifically, we used the Multi Vehicle Stereo Event Camera Dataset (MVSEC). It
provides a depth ground-truth, which was used to train StereoSpike in a
supervised manner, using surrogate gradient descent. We propose a novel readout
paradigm to obtain a dense analog prediction -- the depth of each pixel -- from
the spikes of the decoder. We demonstrate that this architecture generalizes
very well, even better than its non-spiking counterparts, leading to
state-of-the-art test accuracy. To the best of our knowledge, it is the first
time that such a large-scale regression problem is solved by a fully spiking
network. Finally, we show that low firing rates (<10%) can be obtained via
regularization, with a minimal cost in accuracy. This means that StereoSpike
could be efficiently implemented on neuromorphic chips, opening the door for
low-power and real-time embedded systems.
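The abstract combines three ingredients: spiking neurons trained with surrogate gradients, a dense analog readout that converts decoder spikes into a per-pixel depth map, and a firing-rate penalty that keeps activity sparse. Below is a minimal PyTorch sketch of those ideas; the layer sizes, neuron constants, boxcar surrogate, spike-count readout and regularization weight are illustrative assumptions, not StereoSpike's published implementation.

```python
import torch
import torch.nn as nn


class SpikeFn(torch.autograd.Function):
    """Heaviside spike with a boxcar surrogate gradient (assumed shape;
    the paper only states that surrogate gradient descent is used)."""

    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        # Let gradients pass only near the firing threshold.
        return grad_out * (v.abs() < 0.5).float()


class LIFConv(nn.Module):
    """One convolutional layer followed by leaky integrate-and-fire neurons."""

    def __init__(self, in_ch, out_ch, beta=0.9, thresh=1.0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
        self.beta, self.thresh = beta, thresh

    def forward(self, x, mem):
        mem = self.beta * mem + self.conv(x)      # leaky integration of input current
        spk = SpikeFn.apply(mem - self.thresh)    # binary spikes
        mem = mem - spk * self.thresh             # soft reset after firing
        return spk, mem


class DepthReadout(nn.Module):
    """Dense analog readout: a 1x1 convolution over spike counts accumulated
    across the simulation window, giving one depth value per pixel (a stand-in
    for the paper's readout paradigm)."""

    def __init__(self, ch):
        super().__init__()
        self.head = nn.Conv2d(ch, 1, 1)

    def forward(self, spike_counts):
        return self.head(spike_counts)


if __name__ == "__main__":
    layer, readout = LIFConv(2, 16), DepthReadout(16)
    x = torch.rand(1, 2, 64, 64)                  # toy event frame (ON/OFF channels)
    mem = torch.zeros(1, 16, 64, 64)
    counts = torch.zeros(1, 16, 64, 64)
    for _ in range(5):                            # a few simulation timesteps
        spk, mem = layer(x, mem)
        counts = counts + spk                     # accumulate decoder spikes
    depth = readout(counts)                       # dense per-pixel prediction
    target = torch.rand(1, 1, 64, 64)             # dummy ground-truth depth
    # Supervised loss plus a firing-rate penalty that encourages sparse activity.
    loss = (depth - target).pow(2).mean() + 1e-3 * counts.mean()
    loss.backward()                               # surrogate gradients reach the conv weights
    print(depth.shape, float(counts.mean()))
```

Stacking such LIFConv blocks into a U-Net-like encoder-decoder and reading depth from the accumulated decoder spikes mirrors, at a sketch level, the pipeline the abstract describes; the 1e-3 rate-penalty weight is a placeholder for whatever trade-off between firing rate and accuracy one wants to tune.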
Related papers
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- Lightweight Monocular Depth Estimation with an Edge Guided Network [34.03711454383413]
We present a novel lightweight Edge Guided Depth Estimation Network (EGD-Net)
In particular, we start out with a lightweight encoder-decoder architecture and embed an edge guidance branch.
In order to aggregate the context information and edge attention features, we design a transformer-based feature aggregation module.
arXiv Detail & Related papers (2022-09-29T14:45:47Z)
- StereoVoxelNet: Real-Time Obstacle Detection Based on Occupancy Voxels from a Stereo Camera Using Deep Neural Networks [32.7826524859756]
Obstacle detection is a safety-critical problem in robot navigation, where stereo matching is a popular vision-based approach.
This paper proposes a computationally efficient method that leverages a deep neural network to detect occupancy from stereo images directly.
Our approach detects obstacles accurately in the range of 32 meters and achieves better IoU (Intersection over Union) and CD (Chamfer Distance) scores with only 2% of the computation cost of the state-of-the-art stereo model.
arXiv Detail & Related papers (2022-09-18T03:32:38Z)
- CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded Systems [0.0]
A Convolutional Neural Network (CNN) is a class of Deep Neural Network (DNN) widely used in the analysis of visual images captured by an image sensor.
In this paper, we propose a neoteric variant of deep convolutional neural network architecture to ameliorate the performance of existing CNN architectures for real-time inference on embedded systems.
arXiv Detail & Related papers (2021-12-01T18:20:52Z)
- DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding [71.73405116189531]
We propose a neural vocoder that extracts F0 and timbre/aperiodicity encoding from the input speech that emulates those defined in conventional vocoders.
As the deep neural analyzer is learnable, it is expected to be more accurate for signal reconstruction and manipulation, and generalizable from speech to singing.
arXiv Detail & Related papers (2021-10-13T01:39:57Z)
- Instantaneous Stereo Depth Estimation of Real-World Stimuli with a Neuromorphic Stereo-Vision Setup [4.28479274054892]
Spiking Neural Network (SNN) architectures for stereo vision have the potential of simplifying the stereo-matching problem.
We validate a brain-inspired event-based stereo-matching architecture implemented on a mixed-signal neuromorphic processor with real-world data.
arXiv Detail & Related papers (2021-04-06T14:31:23Z)
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Networks (SANs), a new module enabling monodepth networks to perform both the tasks of depth prediction and completion.
arXiv Detail & Related papers (2021-03-30T21:22:26Z)
- PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss [49.66736599668501]
We propose a self-supervised single-view pixel-level accurate depth estimation network, called PLADE-Net.
Our method shows unprecedented accuracy levels, exceeding 95% in terms of the $\delta^1$ metric on the KITTI dataset.
arXiv Detail & Related papers (2021-03-12T15:54:46Z)
- Hierarchical Neural Architecture Search for Deep Stereo Matching [131.94481111956853]
We propose the first end-to-end hierarchical NAS framework for deep stereo matching.
Our framework incorporates task-specific human knowledge into the neural architecture search framework.
It ranks first in accuracy on the KITTI stereo 2012, KITTI stereo 2015 and Middlebury benchmarks, as well as on the SceneFlow dataset.
arXiv Detail & Related papers (2020-10-26T11:57:37Z)
- MiniNet: An extremely lightweight convolutional neural network for real-time unsupervised monocular depth estimation [22.495019810166397]
We propose a powerful new network with a recurrent module to achieve the capability of a deep network.
We maintain an extremely lightweight size for real-time high performance unsupervised monocular depth prediction from video sequences.
Our new model can run at a speed of about 110 frames per second (fps) on a single GPU, 37 fps on a single CPU, and 2 fps on a Raspberry Pi 3.
arXiv Detail & Related papers (2020-06-27T12:13:22Z)
- Depthwise Non-local Module for Fast Salient Object Detection Using a Single Thread [136.2224792151324]
We propose a new deep learning algorithm for fast salient object detection.
The proposed algorithm achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.
arXiv Detail & Related papers (2020-01-22T15:23:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.