LiteDepth: Digging into Fast and Accurate Depth Estimation on Mobile
Devices
- URL: http://arxiv.org/abs/2209.00961v1
- Date: Fri, 2 Sep 2022 11:38:28 GMT
- Title: LiteDepth: Digging into Fast and Accurate Depth Estimation on Mobile
Devices
- Authors: Zhenyu Li, Zehui Chen, Jialei Xu, Xianming Liu, Junjun Jiang
- Abstract summary: We develop an end-to-end learning-based model with a tiny weight size (1.4 MB) and a short inference time (27 FPS on Raspberry Pi 4).
We propose a simple yet effective data augmentation strategy, called R2 crop, to boost the model performance.
Notably, our solution named LiteDepth ranks 2nd in the MAI&AIM2022 Monocular Depth Estimation Challenge, with an si-RMSE of 0.311, an RMSE of 3.79, and an inference time of 37 ms on the Raspberry Pi 4.
- Score: 45.84356762066717
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monocular depth estimation is an essential task in the computer vision
community. While many successful methods have obtained excellent results,
most of them are computationally expensive and not applicable for real-time
on-device inference. In this paper, we aim to address more practical
applications of monocular depth estimation, where the solution should consider
not only the precision but also the inference time on mobile devices. To this
end, we first develop an end-to-end learning-based model with a tiny weight
size (1.4 MB) and a short inference time (27 FPS on Raspberry Pi 4). Then, we
propose a simple yet effective data augmentation strategy, called R2 crop, to
boost the model performance. Moreover, we observe that a simple lightweight
model trained with a single loss term suffers from a performance bottleneck.
To alleviate this issue, we adopt multiple loss terms to provide
sufficient constraints during the training stage. Furthermore, with a simple
dynamic re-weight strategy, we can avoid the time-consuming hyper-parameter
choice of loss terms. Finally, we adopt the structure-aware distillation to
further improve the model performance. Notably, our solution named LiteDepth
ranks 2nd in the MAI&AIM2022 Monocular Depth Estimation Challenge, with an
si-RMSE of 0.311, an RMSE of 3.79, and an inference time of 37 ms on the
Raspberry Pi 4. Ours is the fastest solution in the challenge.
Codes and models will be released at
https://github.com/zhyever/LiteDepth.
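
As a rough illustration of two of the techniques above: the abstract does not spell out the R2 crop, so the Python/PyTorch sketch below assumes it denotes a crop of random size taken at a random position, applied jointly to the image and its depth label. The function name and the ratio bound are invented for the example.

    import random
    import torch

    def r2_crop(image: torch.Tensor, depth: torch.Tensor, min_ratio: float = 0.5):
        """Hypothetical R2 crop: a randomly sized crop at a random position,
        applied identically to the image (C, H, W) and depth map (1, H, W)."""
        _, h, w = image.shape
        ch = random.randint(int(h * min_ratio), h)  # random crop height
        cw = random.randint(int(w * min_ratio), w)  # random crop width
        top = random.randint(0, h - ch)             # random vertical offset
        left = random.randint(0, w - cw)            # random horizontal offset
        return (image[:, top:top + ch, left:left + cw],
                depth[:, top:top + ch, left:left + cw])

Likewise, the "simple dynamic re-weight strategy" for the multiple loss terms is not specified in the abstract; one plausible reading is learned homoscedastic-uncertainty weighting in the style of Kendall et al. (2018), sketched here purely as an assumption rather than as the authors' actual scheme.

    class DynamicLossWeighting(torch.nn.Module):
        """Weights each loss term by a learned uncertainty, so no manual
        per-term coefficients need to be tuned."""
        def __init__(self, num_losses: int):
            super().__init__()
            self.log_vars = torch.nn.Parameter(torch.zeros(num_losses))

        def forward(self, losses):
            # total = sum_i exp(-s_i) * L_i + s_i, with each s_i learned jointly.
            total = torch.zeros(())
            for log_var, loss in zip(self.log_vars, losses):
                total = total + torch.exp(-log_var) * loss + log_var
            return total

In training, the optimizer would update the log-variances together with the network weights, letting the balance between the loss terms shift as training progresses instead of being fixed by hand-tuned hyper-parameters.
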
Related papers
- Truncated Consistency Models [57.50243901368328]
Training consistency models requires learning to map all intermediate points along probability flow (PF) ODE trajectories to their corresponding endpoints.
We empirically find that this training paradigm limits the one-step generation performance of consistency models.
We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
arXiv Detail & Related papers (2024-10-18T22:38:08Z) - LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights [2.8461446020965435]
We introduce LD-Pruner, a novel performance-preserving structured pruning method for compressing Latent Diffusion Models.
We demonstrate the effectiveness of our approach on three tasks: text-to-image (T2I) generation, unconditional image generation (UIG), and unconditional audio generation (UAG).
arXiv Detail & Related papers (2024-04-18T06:35:37Z) - RainAI -- Precipitation Nowcasting from Satellite Data [5.869633234882028]
This paper presents a solution to the Weather4cast 2023 competition.
The goal is to forecast high-resolution precipitation with an 8-hour lead time using lower-resolution satellite radiance images.
We propose a simple yet effective method for feature learning using a 2D U-Net model.
arXiv Detail & Related papers (2023-11-30T09:49:16Z) - SqueezeLLM: Dense-and-Sparse Quantization [80.32162537942138]
The main bottleneck for single-batch generative inference with LLMs is memory bandwidth rather than compute.
We introduce SqueezeLLM, a post-training quantization framework that enables lossless compression to ultra-low precisions of up to 3-bit.
Our framework incorporates two novel ideas: (i) sensitivity-based non-uniform quantization, which searches for the optimal bit-precision assignment based on second-order information; and (ii) the Dense-and-Sparse decomposition that stores outliers and sensitive weight values in an efficient sparse format (a toy sketch of this split appears after this list).
arXiv Detail & Related papers (2023-06-13T08:57:54Z) - Gradient-Free Structured Pruning with Unlabeled Data [57.999191898036706]
We propose a gradient-free structured pruning framework that uses only unlabeled data.
Up to 40% of the original FLOP count can be reduced with less than a 4% accuracy loss across all tasks considered.
arXiv Detail & Related papers (2023-03-07T19:12:31Z) - Pushing the Limits of Asynchronous Graph-based Object Detection with
Event Cameras [62.70541164894224]
We introduce several architecture choices which allow us to scale the depth and complexity of such models while maintaining low computation.
Our method runs 3.7 times faster than a dense graph neural network, taking only 8.4 ms per forward pass.
arXiv Detail & Related papers (2022-11-22T15:14:20Z) - BEVDetNet: Bird's Eye View LiDAR Point Cloud based Real-time 3D Object
Detection for Autonomous Driving [6.389322215324224]
We propose a novel semantic segmentation architecture as a single unified model for object center detection using key points, box predictions and orientation prediction.
The proposed architecture can be trivially extended to include semantic segmentation classes like road without any additional computation.
The model is 5x faster than other top-accuracy models, with a minimal accuracy degradation of 2% in Average Precision at IoU=0.5 on the KITTI dataset.
arXiv Detail & Related papers (2021-04-21T22:06:39Z) - Enabling Retrain-free Deep Neural Network Pruning using Surrogate
Lagrangian Relaxation [2.691929135895278]
We develop a systematic weight-pruning optimization approach based on Surrogate Lagrangian Relaxation (SLR).
SLR achieves a higher compression rate than state-of-the-art methods under the same accuracy requirement.
Given a limited budget of retraining epochs, our approach quickly recovers the model accuracy.
arXiv Detail & Related papers (2020-12-18T07:17:30Z) - 2nd Place Scheme on Action Recognition Track of ECCV 2020 VIPriors
Challenges: An Efficient Optical Flow Stream Guided Framework [57.847010327319964]
We propose a data-efficient framework that can train the model from scratch on small datasets.
Specifically, by introducing a 3D central difference convolution operation, we propose a novel C3D neural network-based two-stream framework.
We show that our method can achieve promising results even without a model pre-trained on a large-scale dataset.
arXiv Detail & Related papers (2020-08-10T09:50:28Z)
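
For the Dense-and-Sparse decomposition referenced in the SqueezeLLM entry above, the following toy sketch splits a weight matrix into a full-precision sparse outlier part and a dense remainder that low-bit quantization would then act on. The function name, outlier fraction, and thresholding rule are illustrative assumptions, not the paper's implementation.

    import torch

    def dense_and_sparse_split(weight: torch.Tensor, outlier_frac: float = 0.005):
        """Keep the largest-magnitude weights exact in a sparse tensor;
        the dense remainder is the candidate for low-bit quantization."""
        k = max(1, int(weight.numel() * outlier_frac))
        # Threshold at the k-th largest absolute value.
        thresh = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
        outlier_mask = weight.abs() >= thresh
        sparse_part = (weight * outlier_mask).to_sparse()  # outliers, full precision
        dense_part = weight * (~outlier_mask)              # to be quantized, e.g. to 3-bit
        return dense_part, sparse_part
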