DepthNet Nano: A Highly Compact Self-Normalizing Neural Network for
Monocular Depth Estimation
- URL: http://arxiv.org/abs/2004.08008v1
- Date: Fri, 17 Apr 2020 00:41:35 GMT
- Title: DepthNet Nano: A Highly Compact Self-Normalizing Neural Network for
Monocular Depth Estimation
- Authors: Linda Wang, Mahmoud Famouri, and Alexander Wong
- Abstract summary: DepthNet Nano is a compact deep neural network for monocular depth estimation designed using a human machine collaborative design strategy.
The proposed DepthNet Nano possesses a highly efficient network architecture, while still achieving comparable performance with state-of-the-art networks.
- Score: 76.90627702089357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth estimation is an active area of research in the field of computer
vision, and has garnered significant interest due to its rising demand in a
large number of applications ranging from robotics and unmanned aerial vehicles
to autonomous vehicles. A particularly challenging problem in this area is
monocular depth estimation, where the goal is to infer depth from a single
image. An effective strategy that has shown considerable promise in recent
years for tackling this problem is the utilization of deep convolutional neural
networks. Despite these successes, the memory and computational requirements of
such networks have made widespread deployment in embedded scenarios very
challenging. In this study, we introduce DepthNet Nano, a highly compact
self-normalizing network for monocular depth estimation designed using a
human-machine collaborative design strategy, where principled network design
prototyping based on encoder-decoder design principles is coupled with
machine-driven design exploration. The result is a compact deep neural network
with highly customized macroarchitecture and microarchitecture designs, as well
as self-normalizing characteristics, that are highly tailored for the task of
embedded depth estimation. The proposed DepthNet Nano possesses a highly
efficient network architecture (e.g., 24× smaller and 42× fewer MAC operations
than Alhashim et al. on KITTI), while still achieving comparable performance
with state-of-the-art networks on the NYU-Depth V2 and KITTI datasets.
Furthermore, experiments on inference speed and energy efficiency on a Jetson
AGX Xavier embedded module further illustrate the efficacy of DepthNet Nano at
different resolutions and power budgets (e.g., ~14 FPS and >0.46
images/sec/watt at 384 × 1280 at a 30W power budget on KITTI).
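The "self-normalizing" property in the abstract typically refers to SELU activations (Klambauer et al., "Self-Normalizing Neural Networks"), which drive activations toward zero mean and unit variance without batch normalization. As an illustrative sketch — not the paper's actual code — the SELU function and the quoted energy-efficiency figure can be checked as follows:

```python
import math

# SELU constants from Klambauer et al., "Self-Normalizing Neural Networks".
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def selu(x: float) -> float:
    """Scaled exponential linear unit: the activation behind
    self-normalizing networks."""
    return LAMBDA * x if x > 0 else LAMBDA * ALPHA * (math.exp(x) - 1.0)

# Identity-like (scaled by LAMBDA) for positive inputs, saturating for
# very negative inputs at -LAMBDA * ALPHA ~= -1.7581.
print(selu(1.0))     # ~1.0507
print(selu(-100.0))  # ~-1.7581

# Sanity-check the abstract's efficiency figure: ~14 FPS at a 30W budget
# gives 14 / 30 ~= 0.467 images/sec/watt, consistent with ">0.46".
print(14 / 30)
```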
Related papers
- Optimized Deployment of Deep Neural Networks for Visual Pose Estimation
on Nano-drones [9.806742394395322]
Miniaturized unmanned aerial vehicles (UAVs) are gaining popularity due to their small size, enabling new tasks such as indoor navigation or people monitoring.
This work proposes a new automatic optimization pipeline for visual pose estimation tasks using Deep Neural Networks (DNNs)
Our results improve on the state of the art, reducing inference latency by up to 3.22x at iso-error.
arXiv Detail & Related papers (2024-02-23T11:35:57Z)
- Quantization-aware Neural Architectural Search for Intrusion Detection [5.010685611319813]
We present a design methodology that automatically trains and evolves quantized neural network (NN) models that are a thousand times smaller than state-of-the-art NNs.
The number of LUTs utilized by this network when deployed to an FPGA is between 2.3x and 8.5x smaller with performance comparable to prior work.
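The summary above does not spell out the quantization scheme; as a generic illustration (an assumption, not this paper's method), uniform affine quantization maps floats to small integers, which is what makes such networks LUT-friendly when deployed to an FPGA:

```python
def quantize(x: float, scale: float, zero_point: int, bits: int = 8) -> int:
    """Uniform affine quantization: clamp(round(x / scale) + zero_point)."""
    qmin, qmax = 0, (1 << bits) - 1
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Approximate reconstruction of the original float."""
    return scale * (q - zero_point)

# Example: map the range [-1, 1] onto unsigned 8-bit integers.
scale, zp = 2.0 / 255, 128
q = quantize(0.5, scale, zp)
x_hat = dequantize(q, scale, zp)  # recovers 0.5 to within one step (scale)
```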
arXiv Detail & Related papers (2023-11-07T18:35:29Z)
- Computation-efficient Deep Learning for Computer Vision: A Survey [121.84121397440337]
Deep learning models have reached or even exceeded human-level performance in a range of visual perception tasks.
Deep learning models usually demand significant computational resources, leading to impractical power consumption, latency, or carbon emissions in real-world scenarios.
New research focus is computationally efficient deep learning, which strives to achieve satisfactory performance while minimizing the computational cost during inference.
arXiv Detail & Related papers (2023-08-27T03:55:28Z)
- Fast GraspNeXt: A Fast Self-Attention Neural Network Architecture for
Multi-task Learning in Computer Vision Tasks for Robotic Grasping on the Edge [80.88063189896718]
High architectural and computational complexity can result in poor suitability for deployment on embedded devices.
Fast GraspNeXt is a fast self-attention neural network architecture tailored for embedded multi-task learning in computer vision tasks for robotic grasping.
arXiv Detail & Related papers (2023-04-21T18:07:14Z)
- DDCNet: Deep Dilated Convolutional Neural Network for Dense Prediction [0.0]
A larger effective receptive field (ERF) and a higher resolution of spatial features within a network are essential for providing higher-resolution dense estimates.
We present a systemic approach to design network architectures that can provide a larger receptive field while maintaining a higher spatial feature resolution.
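As background to the dilated-convolution idea (an illustrative formula, not DDCNet's actual configuration): for a stack of stride-1 convolutions with kernel size k and per-layer dilations d_i, the receptive field grows as 1 + Σ (k − 1)·d_i, so dilation enlarges the field without downsampling the feature map:

```python
def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field of stacked stride-1 convolutions:
    RF = 1 + sum((k - 1) * d) over the dilation d of each layer."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Three plain 3x3 convs see a 7x7 window ...
print(receptive_field(3, [1, 1, 1]))  # 7
# ... while exponentially increasing dilations see 15x15 with the same
# parameter count and no loss of spatial resolution.
print(receptive_field(3, [1, 2, 4]))  # 15
```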
arXiv Detail & Related papers (2021-07-09T23:15:34Z)
- Hierarchical Neural Architecture Search for Deep Stereo Matching [131.94481111956853]
We propose the first end-to-end hierarchical NAS framework for deep stereo matching.
Our framework incorporates task-specific human knowledge into the neural architecture search framework.
It is ranked first in accuracy on the KITTI stereo 2012, KITTI stereo 2015, and Middlebury benchmarks, as well as first on the SceneFlow dataset.
arXiv Detail & Related papers (2020-10-26T11:57:37Z)
- On Deep Learning Techniques to Boost Monocular Depth Estimation for
Autonomous Navigation [1.9007546108571112]
Inferring the depth of images is a fundamental inverse problem within the field of Computer Vision.
We propose a new lightweight and fast supervised CNN architecture combined with novel feature extraction models.
We also introduce an efficient surface normals module, jointly with a simple geometric 2.5D loss function, to solve single image depth estimation (SIDE) problems.
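The surface-normals idea can be sketched generically (a standard finite-difference construction, not necessarily the paper's module): treating depth as a height field z(x, y), a per-pixel normal is the normalized vector (−∂z/∂x, −∂z/∂y, 1):

```python
import math

def normals_from_depth(depth):
    """Per-pixel surface normals for the interior pixels of a depth map,
    using central finite differences on the height field z(x, y)."""
    h, w = len(depth), len(depth[0])
    normals = {}
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dzdx = (depth[y][x + 1] - depth[y][x - 1]) / 2.0
            dzdy = (depth[y + 1][x] - depth[y - 1][x]) / 2.0
            n = (-dzdx, -dzdy, 1.0)
            norm = math.sqrt(sum(c * c for c in n))
            normals[(x, y)] = tuple(c / norm for c in n)
    return normals

# A constant-depth (fronto-parallel) plane yields normals pointing
# straight back along +z toward the camera.
flat = [[5.0] * 4 for _ in range(4)]
print(normals_from_depth(flat)[(1, 1)])
```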
arXiv Detail & Related papers (2020-10-13T18:37:38Z)
- AttendNets: Tiny Deep Image Recognition Neural Networks for the Edge via
Visual Attention Condensers [81.17461895644003]
We introduce AttendNets, low-precision, highly compact deep neural networks tailored for on-device image recognition.
AttendNets possess deep self-attention architectures based on visual attention condensers.
Results show AttendNets have significantly lower architectural and computational complexity when compared to several deep neural networks.
arXiv Detail & Related papers (2020-09-30T01:53:17Z)
- EmotionNet Nano: An Efficient Deep Convolutional Neural Network Design
for Real-time Facial Expression Recognition [75.74756992992147]
This study proposes EmotionNet Nano, an efficient deep convolutional neural network created through a human-machine collaborative design strategy.
Two different variants of EmotionNet Nano are presented, each with a different trade-off between architectural and computational complexity and accuracy.
We demonstrate that the proposed EmotionNet Nano networks achieved real-time inference speeds (e.g., >25 FPS and >70 FPS at 15W and 30W, respectively) and high energy efficiency.
arXiv Detail & Related papers (2020-06-29T00:48:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.