Related papers: Efficient Depth Estimation for Unstable Stereo Camera Systems on AR Glasses

Efficient Depth Estimation for Unstable Stereo Camera Systems on AR Glasses

URL: http://arxiv.org/abs/2411.10013v1
Date: Fri, 15 Nov 2024 07:43:45 GMT
Title: Efficient Depth Estimation for Unstable Stereo Camera Systems on AR Glasses
Authors: Yongfan Liu, Hyoukjun Kwon,
Abstract summary: We develop hardware-friendly alternatives to the costly cost volume and preprocessing. For online stereo rectification (preprocessing), we introduce homograhy matrix prediction network with a rectification positional encoding (RPE) Our MultiHeadDepth, which includes optimized cost volume, provides 11.8-30.3% improvements in accuracy and 22.9-25.2% reduction in latency. Our HomoDepth, which includes optimized preprocessing (Homograhpy + RPE), can process unrectified images and reduce the end-to-end latency by 44.5%.
Score: 1.086544864007391
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Stereo depth estimation is a fundamental component in augmented reality (AR) applications. Although AR applications require very low latency for their real-time applications, traditional depth estimation models often rely on time-consuming preprocessing steps such as rectification to achieve high accuracy. Also, non standard ML operator based algorithms such as cost volume also require significant latency, which is aggravated on compute resource-constrained mobile platforms. Therefore, we develop hardware-friendly alternatives to the costly cost volume and preprocessing and design two new models based on them, MultiHeadDepth and HomoDepth. Our approaches for cost volume is replacing it with a new group-pointwise convolution-based operator and approximation of consine similarity based on layernorm and dot product. For online stereo rectification (preprocessing), we introduce homograhy matrix prediction network with a rectification positional encoding (RPE), which delivers both low latency and robustness to unrectified images, which eliminates the needs for preprocessing. Our MultiHeadDepth, which includes optimized cost volume, provides 11.8-30.3% improvements in accuracy and 22.9-25.2% reduction in latency compared to a state-of-the-art depth estimation model for AR glasses from industry. Our HomoDepth, which includes optimized preprocessing (Homograhpy + RPE) upon MultiHeadDepth, can process unrectified images and reduce the end-to-end latency by 44.5%. We adopt a multi-task learning framework to handle misaligned stereo inputs on HomoDepth, which reduces theAbsRel error by 10.0-24.3%. The results demonstrate the efficacy of our approaches in achieving both high model performance with low latency, which makes a step forward toward practical depth estimation on future AR devices.

Related papers

Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline [64.42938561167402]
We propose an online 3D reconstruction method using 3D Gaussian-based SLAM, combined with a feed-forward recurrent prediction module.<n>This approach replaces slow test-time optimization with fast network inference, significantly improving tracking speed.<n>Our method achieves performance on par with the state-of-the-art SplaTAM, while reducing tracking time by more than 90%.
arXiv Detail & Related papers (2025-08-06T16:16:58Z)
MLRU++: Multiscale Lightweight Residual UNETR++ with Attention for Efficient 3D Medical Image Segmentation [3.014234061484863]
Multiscale Lightweight Residual UNETR++ architecture designed to balance segmentation accuracy and computational efficiency.<n>Experiments on four publicly available benchmark datasets demonstrate that MLRU++ achieves state-of-the-art performance.<n>Results suggest that MLRU++ offers a practical and high-performing solution for 3D medical image segmentation tasks.
arXiv Detail & Related papers (2025-07-22T00:30:44Z)
Motion-Aware Adaptive Pixel Pruning for Efficient Local Motion Deblurring [87.56382172827526]
We propose a trainable mask predictor that identifies blurred regions in the image.<n>We also develop an intra-frame motion analyzer that translates relative pixel displacements into motion trajectories.<n>Our method is trained end-to-end using a combination of reconstruction loss, reblur loss, and mask loss guided by annotated blur masks.
arXiv Detail & Related papers (2025-07-10T12:38:27Z)
High-Frequency Prior-Driven Adaptive Masking for Accelerating Image Super-Resolution [87.56382172827526]
High-frequency regions are most critical for reconstruction.<n>We propose a training-free adaptive masking module for acceleration.<n>Our method reduces FLOPs by 24--43% for state-of-the-art models.
arXiv Detail & Related papers (2025-05-11T13:18:03Z)
Second-order Optimization of Gaussian Splats with Importance Sampling [51.95046424364725]
3D Gaussian Splatting (3DGS) is widely used for novel view rendering due to its high quality and fast inference time. We propose a novel second-order optimization strategy based on Levenberg-Marquardt (LM) and Conjugate Gradient (CG) Our method achieves a $3times$ speedup over standard LM and outperforms Adam by $6times$ when the Gaussian count is low.
arXiv Detail & Related papers (2025-04-17T12:52:08Z)
Image Coding for Machines via Feature-Preserving Rate-Distortion Optimization [27.97760974010369]
We show an approach to reduce the effect of compression on a task loss using the distance between features as a distortion metric. We simplify the RDO formulation to make the distortion term computable using block-based encoders. We show up to 10% bit-rate savings for the same computer vision accuracy compared to RDO based on SSE.
arXiv Detail & Related papers (2025-04-03T02:11:26Z)
Efficient Density Control for 3D Gaussian Splatting [3.6379656024631215]
3D Gaussian Splatting (3DGS) has demonstrated outstanding performance in novel view synthesis. We propose two key innovations: (1) Long-Axis Split, which precisely controls the position, shape, and opacity of child Gaussians; and (2) Recovery-Aware Pruning, which leverages differences in recovery speed after resetting opacity to prune overfitted Gaussians.
arXiv Detail & Related papers (2024-11-15T12:12:56Z)
ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks. We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation. Our purelytemporalal architecture framework, named ALOcc, achieves an optimal tradeoff between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z)
Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks [10.229120811024162]
deep neural networks (DNNs) pose significant challenges to their deployment on edge devices. Common approaches to address this issue are pruning and mixed-precision quantization. We propose a novel methodology to apply them jointly via a lightweight gradient-based search.
arXiv Detail & Related papers (2024-07-01T08:07:02Z)
Taming 3DGS: High-Quality Radiance Fields with Limited Resources [50.92437599516609]
3D Gaussian Splatting (3DGS) has transformed novel-view synthesis with its fast, interpretable, and high-fidelity rendering. We tackle the challenges of training and rendering 3DGS models on a budget. We derive faster, numerically equivalent solutions for gradient computation and attribute updates.
arXiv Detail & Related papers (2024-06-21T20:44:23Z)
Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner [84.97253871387028]
A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed. We propose a timestep aligner that helps find a more accurate integral direction for a particular interval at the minimum cost. Experiments show that our plug-in design can be trained efficiently and boost the inference performance of various state-of-the-art acceleration methods.
arXiv Detail & Related papers (2023-10-14T02:19:07Z)
Surrogate Lagrangian Relaxation: A Path To Retrain-free Deep Neural Network Pruning [9.33753001494221]
Network pruning is a widely used technique to reduce computation cost and model size for deep neural networks. In this paper, we develop a systematic weight-pruning optimization approach based on Surrogate Lagrangian relaxation.
arXiv Detail & Related papers (2023-04-08T22:48:30Z)
Post-Processing Temporal Action Detection [134.26292288193298]
Temporal Action Detection (TAD) methods typically take a pre-processing step in converting an input varying-length video into a fixed-length snippet representation sequence. This pre-processing step would temporally downsample the video, reducing the inference resolution and hampering the detection performance in the original temporal resolution. We introduce a novel model-agnostic post-processing method without model redesign and retraining.
arXiv Detail & Related papers (2022-11-27T19:50:37Z)
LiteDepth: Digging into Fast and Accurate Depth Estimation on Mobile Devices [45.84356762066717]
We develop an end-to-end learning-based model with a tiny weight size (1.4MB) and a short inference time (27FPS on Raspberry Pi 4. We propose a simple yet effective data augmentation strategy, called R2 crop, to boost the model performance. Notably, our solution named LiteDepth ranks 2nd in the MAI&AIM2022 Monocular Depth Estimation Challenge, with a si-RMSE of 0.311, an RMSE of 3.79, and the inference time is 37$ms$ tested on the Raspberry Pi 4.
arXiv Detail & Related papers (2022-09-02T11:38:28Z)
BDIS: Bayesian Dense Inverse Searching Method for Real-Time Stereo Surgical Image Matching [2.990820994368054]
This paper proposes the first CPU-level real-time prior-free stereo matching algorithm for general MIS tasks. We achieve an average 17 Hz on 640*480 images with a single-core CPU (i5-9400) for surgical images. It has similar or higher accuracy and fewer outliers than the baseline ELAS in MIS, while it is 4-5 times faster.
arXiv Detail & Related papers (2022-05-06T10:50:49Z)
Learning to Fit Morphable Models [12.469605679847085]
We build upon recent advances in learned optimization and propose an update rule inspired by the classic Levenberg-Marquardt algorithm. We show the effectiveness of the proposed neural on the problems of 3D body surface estimation from a head-mounted device and face fitting from 2D landmarks.
arXiv Detail & Related papers (2021-11-29T18:59:53Z)
FasterPose: A Faster Simple Baseline for Human Pose Estimation [65.8413964785972]
We propose a design paradigm for cost-effective network with LR representation for efficient pose estimation, named FasterPose. We study the training behavior of FasterPose, and formulate a novel regressive cross-entropy (RCE) loss function for accelerating the convergence. Compared with the previously dominant network of pose estimation, our method reduces 58% of the FLOPs and simultaneously gains 1.3% improvement of accuracy.
arXiv Detail & Related papers (2021-07-07T13:39:08Z)
SADet: Learning An Efficient and Accurate Pedestrian Detector [68.66857832440897]
This paper proposes a series of systematic optimization strategies for the detection pipeline of one-stage detector. It forms a single shot anchor-based detector (SADet) for efficient and accurate pedestrian detection. Though structurally simple, it presents state-of-the-art result and real-time speed of $20$ FPS for VGA-resolution images.
arXiv Detail & Related papers (2020-07-26T12:32:38Z)
AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation. Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
APQ: Joint Search for Network Architecture, Pruning and Quantization Policy [49.3037538647714]
We present APQ for efficient deep learning inference on resource-constrained hardware. Unlike previous methods that separately search the neural architecture, pruning policy, and quantization policy, we optimize them in a joint manner. With the same accuracy, APQ reduces the latency/energy by 2x/1.3x over MobileNetV2+HAQ.
arXiv Detail & Related papers (2020-06-15T16:09:17Z)
A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework [56.57225686288006]
Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices. Previous pruning methods mainly focus on reducing the model size and/or improving performance without considering the privacy of user data. We propose a privacy-preserving-oriented pruning and mobile acceleration framework that does not require the private training dataset.
arXiv Detail & Related papers (2020-03-13T23:52:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.