Efficient stereo matching on embedded GPUs with zero-means cross
correlation
- URL: http://arxiv.org/abs/2212.00476v1
- Date: Thu, 1 Dec 2022 13:03:38 GMT
- Title: Efficient stereo matching on embedded GPUs with zero-means cross
correlation
- Authors: Qiong Chang, Aolong Zha, Weimin Wang, Xin Liu, Masaki Onishi, Lei Lei,
Meng Joo Er, Tsutomu Maruyama
- Abstract summary: We propose a novel acceleration approach for the zero-means normalized cross correlation (ZNCC) matching cost calculation algorithm on a Jetson Tx2 embedded GPU.
In our method for accelerating ZNCC, target images are scanned in a zigzag fashion to efficiently reuse one pixel's computation for its neighboring pixels.
Our system shows a real-time processing speed of 32 fps on a Jetson Tx2 GPU for 1,280x384 pixel images with a maximum disparity of 128.
- Score: 8.446808526407738
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mobile stereo-matching systems have become an important part of many
applications, such as automated-driving vehicles and autonomous robots.
Accurate stereo-matching methods usually lead to high computational complexity;
however, mobile platforms have only limited hardware resources to keep their
power consumption low; this makes it difficult to maintain both an acceptable
processing speed and accuracy on mobile platforms. To resolve this trade-off,
we herein propose a novel acceleration approach for the well-known zero-means
normalized cross correlation (ZNCC) matching cost calculation algorithm on a
Jetson Tx2 embedded GPU. In our method for accelerating ZNCC, target images are
scanned in a zigzag fashion to efficiently reuse one pixel's computation for
its neighboring pixels; this reduces the amount of data transmission and
increases the utilization of on-chip registers, thus increasing the processing
speed. As a result, our method is 2X faster than the traditional image scanning
method, and 26% faster than the latest NCC method. By combining this technique
with the domain transformation (DT) algorithm, our system shows a real-time
processing speed of 32 fps on a Jetson Tx2 GPU for 1,280x384 pixel images with
a maximum disparity of 128. Additionally, the evaluation results on the KITTI
2015 benchmark show that our combined system is more accurate than the same
algorithm combined with census by 7.26%, while maintaining almost the same
processing speed.
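For readers unfamiliar with the matching cost named in the abstract, the following is a minimal CPU sketch of ZNCC and a winner-takes-all disparity search along one rectified scanline. It is not the authors' zigzag-scanned GPU implementation; the function names `zncc` and `best_disparity`, the 1-D window, and the choice to return 0 for a flat window are all assumptions made for illustration.

```python
import math


def zncc(a, b):
    """Zero-mean normalized cross correlation of two equal-sized windows.

    Windows are flat lists of intensities. The result lies in [-1, 1] and is
    unchanged by a positive gain or an offset applied to either window.
    """
    if len(a) != len(b) or not a:
        raise ValueError("windows must be non-empty and the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # Assumption: a flat (zero-variance) window has no defined correlation,
    # so this sketch reports 0.0 instead of dividing by zero.
    return num / den if den > 0 else 0.0


def best_disparity(left_row, right_row, x, half_win, max_disp):
    """Pick the disparity d in [0, max_disp] that maximizes ZNCC at pixel x.

    Assumes rectified rows, so left pixel x is compared with right pixel x - d.
    The caller must keep the reference window inside left_row.
    """
    ref = left_row[x - half_win : x + half_win + 1]
    best_d, best_score = 0, -2.0  # -2.0 is below any possible ZNCC value
    for d in range(max_disp + 1):
        lo = x - d - half_win
        if lo < 0:  # candidate window would leave the image
            break
        cand = right_row[lo : lo + 2 * half_win + 1]
        score = zncc(ref, cand)
        if score > best_score:
            best_d, best_score = d, score
    return best_d
```

Because the mean is subtracted and the result is normalized, a left image that is brighter or has higher contrast than the right image still matches at the correct shift. The sketch recomputes every window sum from scratch; the reuse of one pixel's computation for its neighbors, which the abstract describes, is precisely the redundancy it leaves in place.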
Related papers
- EDCSSM: Edge Detection with Convolutional State Space Model [3.649463841174485]
Edge detection in images is the foundation of many complex tasks in computer graphics.
Due to the feature loss caused by multi-layer convolution and pooling architectures, learning-based edge detection models often produce thick edges.
This paper presents an edge detection algorithm which effectively addresses the aforementioned issues.
arXiv Detail & Related papers (2024-09-03T05:13:25Z)
- SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications [98.90623605283564]
We introduce a novel efficient additive attention mechanism that effectively replaces the quadratic matrix multiplication operations with linear element-wise multiplications.
We build a series of models called "SwiftFormer" which achieves state-of-the-art performance in terms of both accuracy and mobile inference speed.
Our small variant achieves 78.5% top-1 ImageNet-1K accuracy with only 0.8 ms latency on iPhone 14, which is more accurate and 2x faster compared to MobileViT-v2.
arXiv Detail & Related papers (2023-03-27T17:59:58Z)
- CoordFill: Efficient High-Resolution Image Inpainting via Parameterized Coordinate Querying [52.91778151771145]
In this paper, we try to break the limitations for the first time thanks to the recent development of continuous implicit representation.
Experiments show that the proposed method achieves real-time performance on the 2048$\times$2048 images using a single GTX 2080 Ti GPU.
arXiv Detail & Related papers (2023-03-15T11:13:51Z)
- Rapid Person Re-Identification via Sub-space Consistency Regularization [51.76876061721556]
Person Re-Identification (ReID) matches pedestrians across disjoint cameras.
Existing ReID methods adopting real-value feature descriptors have achieved high accuracy, but they are low in efficiency due to the slow Euclidean distance computation.
We propose a novel Sub-space Consistency Regularization (SCR) algorithm that can speed up the ReID procedure by 0.25 times.
arXiv Detail & Related papers (2022-07-13T02:44:05Z)
- UHD Image Deblurring via Multi-scale Cubic-Mixer [12.402054374952485]
Transformer-based algorithms are making a splash in the domain of image deblurring.
These algorithms depend on the self-attention mechanism with CNN stem to model long range dependencies between tokens.
arXiv Detail & Related papers (2022-06-08T05:04:43Z)
- Parallel Discrete Convolutions on Adaptive Particle Representations of Images [2.362412515574206]
We present data structures and algorithms for native implementations of discrete convolution operators over Adaptive Particle Representations.
The APR is a content-adaptive image representation that locally adapts the sampling resolution to the image signal.
We show that APR convolution naturally leads to scale-adaptive algorithms that efficiently parallelize on multi-core CPU and GPU architectures.
arXiv Detail & Related papers (2021-12-07T09:40:05Z)
- CNNs for JPEGs: A Study in Computational Cost [49.97673761305336]
Convolutional neural networks (CNNs) have achieved astonishing advances over the past decade.
CNNs are capable of learning robust representations of the data directly from the RGB pixels.
Deep learning methods capable of learning directly from the compressed domain have been gaining attention in recent years.
arXiv Detail & Related papers (2020-12-26T15:00:10Z)
- Displacement-Invariant Cost Computation for Efficient Stereo Matching [122.94051630000934]
Deep learning methods have dominated stereo matching leaderboards by yielding unprecedented disparity accuracy.
But their inference time is typically slow, on the order of seconds for a pair of 540p images.
We propose a displacement-invariant cost module to compute the matching costs without needing a 4D feature volume.
arXiv Detail & Related papers (2020-12-01T23:58:16Z)
- Faster Mean-shift: GPU-accelerated clustering for cosine embedding-based cell segmentation and tracking [12.60841328582138]
We propose a novel Faster Mean-shift algorithm, which tackles the computational bottleneck of embedding based cell segmentation and tracking.
The proposed Faster Mean-shift algorithm achieved 7-10 times speedup compared to the state-of-the-art embedding based cell instance segmentation and tracking algorithm.
Our Faster Mean-shift algorithm also achieved the highest computational speed compared to other GPU benchmarks with optimized memory consumption.
arXiv Detail & Related papers (2020-07-28T14:52:51Z)
- Efficient Video Semantic Segmentation with Labels Propagation and Refinement [138.55845680523908]
This paper tackles the problem of real-time semantic segmentation of high definition videos using a hybrid GPU / CPU approach.
We propose an Efficient Video Segmentation (EVS) pipeline that combines: (i) On the CPU, a very fast optical flow method that is used to exploit the temporal aspect of the video and propagate semantic information from one frame to the next.
On the popular Cityscapes dataset with high resolution frames (2048 x 1024), the proposed operating points range from 80 to 1000 Hz on a single GPU and CPU.
arXiv Detail & Related papers (2019-12-26T11:45:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.