LiteDepth: Digging into Fast and Accurate Depth Estimation on Mobile
Devices
- URL: http://arxiv.org/abs/2209.00961v1
- Date: Fri, 2 Sep 2022 11:38:28 GMT
- Title: LiteDepth: Digging into Fast and Accurate Depth Estimation on Mobile
Devices
- Authors: Zhenyu Li, Zehui Chen, Jialei Xu, Xianming Liu, Junjun Jiang
- Abstract summary: We develop an end-to-end learning-based model with a tiny weight size (1.4 MB) and a short inference time (27 FPS on Raspberry Pi 4).
We propose a simple yet effective data augmentation strategy, called R2 crop, to boost the model performance.
Notably, our solution named LiteDepth ranks 2nd in the MAI&AIM2022 Monocular Depth Estimation Challenge, with an si-RMSE of 0.311, an RMSE of 3.79, and an inference time of 37 ms on the Raspberry Pi 4.
- Score: 45.84356762066717
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monocular depth estimation is an essential task in the computer vision
community. While many successful methods have obtained excellent results,
most of them are computationally expensive and not applicable for real-time
on-device inference. In this paper, we aim to address more practical
applications of monocular depth estimation, where the solution should consider
not only the precision but also the inference time on mobile devices. To this
end, we first develop an end-to-end learning-based model with a tiny weight
size (1.4 MB) and a short inference time (27 FPS on Raspberry Pi 4). Then, we
propose a simple yet effective data augmentation strategy, called R2 crop, to
boost the model performance. Moreover, we observe that a simple lightweight
model trained with a single loss term suffers from a performance bottleneck.
To alleviate this issue, we adopt multiple loss terms to provide
sufficient constraints during the training stage. Furthermore, with a simple
dynamic re-weight strategy, we can avoid the time-consuming hyper-parameter
choice of loss terms. Finally, we adopt the structure-aware distillation to
further improve the model performance. Notably, our solution named LiteDepth
ranks 2nd in the MAI&AIM2022 Monocular Depth Estimation Challenge, with an
si-RMSE of 0.311, an RMSE of 3.79, and an inference time of 37 ms on the
Raspberry Pi 4. Ours is the fastest solution in the challenge.
Codes and models will be released at
https://github.com/zhyever/LiteDepth.
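
As a rough illustration of two of the techniques above: the abstract does not spell out the R2 crop, so the Python/PyTorch sketch below assumes it denotes a crop of random size taken at a random position, applied jointly to the image and its depth label. The function name and the ratio bound are invented for the example.

    import random
    import torch

    def r2_crop(image: torch.Tensor, depth: torch.Tensor, min_ratio: float = 0.5):
        """Hypothetical R2 crop: a randomly sized crop at a random position,
        applied identically to the image (C, H, W) and depth map (1, H, W)."""
        _, h, w = image.shape
        ch = random.randint(int(h * min_ratio), h)  # random crop height
        cw = random.randint(int(w * min_ratio), w)  # random crop width
        top = random.randint(0, h - ch)             # random vertical offset
        left = random.randint(0, w - cw)            # random horizontal offset
        return (image[:, top:top + ch, left:left + cw],
                depth[:, top:top + ch, left:left + cw])

Likewise, the "simple dynamic re-weight strategy" for the multiple loss terms is not specified in the abstract; one plausible reading is learned homoscedastic-uncertainty weighting in the style of Kendall et al. (2018), sketched here purely as an assumption rather than as the authors' actual scheme.

    class DynamicLossWeighting(torch.nn.Module):
        """Weights each loss term by a learned uncertainty, so no manual
        per-term coefficients need to be tuned."""
        def __init__(self, num_losses: int):
            super().__init__()
            self.log_vars = torch.nn.Parameter(torch.zeros(num_losses))

        def forward(self, losses):
            # total = sum_i exp(-s_i) * L_i + s_i, with each s_i learned jointly.
            total = torch.zeros(())
            for log_var, loss in zip(self.log_vars, losses):
                total = total + torch.exp(-log_var) * loss + log_var
            return total

In training, the optimizer would update the log-variances together with the network weights, letting the balance between the loss terms shift as training progresses instead of being fixed by hand-tuned hyper-parameters.
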
Related papers
- Truncated Consistency Models [57.50243901368328]
Training consistency models requires learning to map all intermediate points along probability flow (PF) ODE trajectories to their corresponding endpoints.
We empirically find that this training paradigm limits the one-step generation performance of consistency models.
We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
arXiv Detail & Related papers (2024-10-18T22:38:08Z) - LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights [2.8461446020965435]
We introduce LD-Pruner, a novel performance-preserving structured pruning method for compressing Latent Diffusion Models.
We demonstrate the effectiveness of our approach on three tasks: text-to-image (T2I) generation, unconditional image generation (UIG), and unconditional audio generation (UAG).
arXiv Detail & Related papers (2024-04-18T06:35:37Z) - RainAI -- Precipitation Nowcasting from Satellite Data [5.869633234882028]
This paper presents a solution to the Weather4cast 2023 competition.
The goal is to forecast high-resolution precipitation with an 8-hour lead time using lower-resolution satellite radiance images.
We propose a simple yet effective method for feature learning using a 2D U-Net model.
arXiv Detail & Related papers (2023-11-30T09:49:16Z) - SqueezeLLM: Dense-and-Sparse Quantization [80.32162537942138]
The main bottleneck for single-batch generative inference with LLMs is memory bandwidth rather than compute.
We introduce SqueezeLLM, a post-training quantization framework that enables lossless compression to ultra-low precisions of up to 3-bit.
Our framework incorporates two novel ideas: (i) sensitivity-based non-uniform quantization, which searches for the optimal bit-precision assignment based on second-order information; and (ii) the Dense-and-Sparse decomposition that stores outliers and sensitive weight values in an efficient sparse format (a toy sketch of this split appears after this list).
arXiv Detail & Related papers (2023-06-13T08:57:54Z) - Gradient-Free Structured Pruning with Unlabeled Data [57.999191898036706]
We propose a gradient-free structured pruning framework that uses only unlabeled data.
Up to 40% of the original FLOP count can be reduced with less than a 4% accuracy loss across all tasks considered.
arXiv Detail & Related papers (2023-03-07T19:12:31Z) - Pushing the Limits of Asynchronous Graph-based Object Detection with
Event Cameras [62.70541164894224]
We introduce several architecture choices which allow us to scale the depth and complexity of such models while maintaining low computation.
Our method runs 3.7 times faster than a dense graph neural network, taking only 8.4 ms per forward pass.
arXiv Detail & Related papers (2022-11-22T15:14:20Z) - BEVDetNet: Bird's Eye View LiDAR Point Cloud based Real-time 3D Object
Detection for Autonomous Driving [6.389322215324224]
We propose a novel semantic segmentation architecture as a single unified model for object center detection using key points, box predictions and orientation prediction.
The proposed architecture can be trivially extended to include semantic segmentation classes like road without any additional computation.
The model is 5x faster than other top-accuracy models, with a minimal accuracy degradation of 2% in Average Precision at IoU=0.5 on the KITTI dataset.
arXiv Detail & Related papers (2021-04-21T22:06:39Z) - Enabling Retrain-free Deep Neural Network Pruning using Surrogate
Lagrangian Relaxation [2.691929135895278]
We develop a systematic weight-pruning optimization approach based on Surrogate Lagrangian Relaxation (SLR).
SLR achieves a higher compression rate than state-of-the-art methods under the same accuracy requirement.
Given a limited budget of retraining epochs, our approach quickly recovers the model accuracy.
arXiv Detail & Related papers (2020-12-18T07:17:30Z) - 2nd Place Scheme on Action Recognition Track of ECCV 2020 VIPriors
Challenges: An Efficient Optical Flow Stream Guided Framework [57.847010327319964]
We propose a data-efficient framework that can train the model from scratch on small datasets.
Specifically, by introducing a 3D central difference convolution operation, we propose a novel C3D neural network-based two-stream framework.
We show that our method can achieve promising results even without a model pre-trained on a large-scale dataset.
arXiv Detail & Related papers (2020-08-10T09:50:28Z)
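
For the Dense-and-Sparse decomposition referenced in the SqueezeLLM entry above, the following toy sketch splits a weight matrix into a full-precision sparse outlier part and a dense remainder that low-bit quantization would then act on. The function name, outlier fraction, and thresholding rule are illustrative assumptions, not the paper's implementation.

    import torch

    def dense_and_sparse_split(weight: torch.Tensor, outlier_frac: float = 0.005):
        """Keep the largest-magnitude weights exact in a sparse tensor;
        the dense remainder is the candidate for low-bit quantization."""
        k = max(1, int(weight.numel() * outlier_frac))
        # Threshold at the k-th largest absolute value.
        thresh = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
        outlier_mask = weight.abs() >= thresh
        sparse_part = (weight * outlier_mask).to_sparse()  # outliers, full precision
        dense_part = weight * (~outlier_mask)              # to be quantized, e.g. to 3-bit
        return dense_part, sparse_part
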