Faster than FAST: GPU-Accelerated Frontend for High-Speed VIO
- URL: http://arxiv.org/abs/2003.13493v3
- Date: Mon, 3 Aug 2020 09:22:13 GMT
- Title: Faster than FAST: GPU-Accelerated Frontend for High-Speed VIO
- Authors: Balazs Nagy, Philipp Foehn, Davide Scaramuzza
- Abstract summary: This work focuses on the applicability of efficient low-level, GPU hardware-specific instructions to improve on existing computer vision algorithms.
In particular, non-maxima suppression and the subsequent feature selection are prominent contributors to the overall image processing latency.
- Score: 46.20949184826173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent introduction of powerful embedded graphics processing units (GPUs)
has allowed for unforeseen improvements in real-time computer vision
applications. It has enabled algorithms to run onboard, well above the standard
video rates, yielding not only higher information processing capability, but
also reduced latency. This work focuses on the applicability of efficient
low-level, GPU hardware-specific instructions to improve on existing computer
vision algorithms in the field of visual-inertial odometry (VIO). While most
steps of a VIO pipeline work on visual features, they rely on image data for
detection and tracking, both of which are well suited for
parallelization. In particular, non-maxima suppression and the subsequent feature
selection are prominent contributors to the overall image processing latency.
Our work first revisits the problem of non-maxima suppression for feature
detection specifically on GPUs, and proposes a solution that selects local
response maxima, imposes spatial feature distribution, and extracts features
simultaneously. Our second contribution introduces an enhanced FAST feature
detector that applies the aforementioned non-maxima suppression method.
Finally, we compare our method to other state-of-the-art CPU and GPU
implementations, where we always outperform all of them in feature tracking and
detection, resulting in over 1000fps throughput on an embedded Jetson TX2
platform. Additionally, we demonstrate our work integrated in a VIO pipeline
achieving a metric state estimation at ~200fps.
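The abstract describes a suppression step that selects local response maxima, enforces a spatial feature distribution, and extracts features in one pass. The following is a minimal sequential sketch of that idea in plain Python, not the authors' GPU implementation: it assumes the image is split into square grid cells and that at most one feature is kept per cell. The function name `select_features` and the parameters `cell`, `radius`, and `threshold` are hypothetical, chosen for illustration only.

```python
def select_features(response, cell=4, radius=1, threshold=0.0):
    """Return at most one (row, col, score) tuple per grid cell.

    Illustrative assumptions (not taken from the paper):
    - `response` is a 2D list of corner scores.
    - A pixel qualifies only if its score exceeds `threshold` and is
      strictly greater than every neighbor within `radius`, so pixels
      on a flat plateau are never selected.
    - Each `cell` x `cell` block keeps its strongest qualifying pixel,
      which spreads the features over the image.
    """
    rows, cols = len(response), len(response[0])

    def is_local_max(r, c):
        score = response[r][c]
        for nr in range(max(0, r - radius), min(rows, r + radius + 1)):
            for nc in range(max(0, c - radius), min(cols, c + radius + 1)):
                if (nr, nc) != (r, c) and response[nr][nc] >= score:
                    return False
        return True

    features = []
    for r0 in range(0, rows, cell):          # iterate over grid cells
        for c0 in range(0, cols, cell):
            best = None
            for r in range(r0, min(r0 + cell, rows)):
                for c in range(c0, min(c0 + cell, cols)):
                    score = response[r][c]
                    if score <= threshold or not is_local_max(r, c):
                        continue
                    if best is None or score > best[2]:
                        best = (r, c, score)
            if best is not None:             # extraction happens in the same pass
                features.append(best)
    return features


# Usage: an 8x8 response map split into four 4x4 cells.
grid = [[0.0] * 8 for _ in range(8)]
grid[1][1] = 5.0
grid[1][2] = 3.0   # suppressed by its stronger neighbor at (1, 1)
grid[2][6] = 4.0
grid[6][5] = 2.0
grid[7][7] = 1.0   # local maximum, but weaker than (6, 5) in the same cell
print(select_features(grid, cell=4))
# -> [(1, 1, 5.0), (2, 6, 4.0), (6, 5, 2.0)]
```

On a GPU, the per-cell loop is the part that would map naturally to parallel threads, since each cell can be reduced independently. How the paper actually does this with hardware-specific instructions is not specified in the abstract.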
Related papers
- FULL-W2V: Fully Exploiting Data Reuse for W2V on GPU-Accelerated Systems [5.572152653851948]
FULL-W2V exploits the opportunities for data reuse in the W2V algorithm to reduce access to low memory levels and improve temporal locality.
Our prototype implementation achieves 2.97X speedup when ported from Nvidia Pascal P100 to Volta V100 cards, and outperforms the state-of-the-art by 5.72X on V100 cards with the same embedding quality.
arXiv Detail & Related papers (2023-12-12T21:22:07Z)
- High Performance Computing Applied to Logistic Regression: A CPU and GPU Implementation Comparison [0.0]
We present a versatile GPU-based parallel version of Logistic Regression (LR).
Our implementation is a direct translation of the parallel Gradient Descent Logistic Regression algorithm proposed by X. Zou et al.
Our method is particularly advantageous for real-time prediction applications like image recognition, spam detection, and fraud detection.
arXiv Detail & Related papers (2023-08-19T14:49:37Z)
- EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction [67.11722682878722]
This work presents EfficientViT, a new family of high-resolution vision models with novel multi-scale linear attention.
Our multi-scale linear attention achieves the global receptive field and multi-scale learning.
EfficientViT delivers remarkable performance gains over previous state-of-the-art models.
arXiv Detail & Related papers (2022-05-29T20:07:23Z)
- ETAD: A Unified Framework for Efficient Temporal Action Detection [70.21104995731085]
Untrimmed video understanding, such as temporal action detection (TAD), often demands substantial computing resources.
We build a unified framework for efficient end-to-end temporal action detection (ETAD).
ETAD achieves state-of-the-art performance on both THUMOS-14 and ActivityNet-1.3.
arXiv Detail & Related papers (2022-05-14T21:16:21Z)
- Providing Meaningful Data Summarizations Using Examplar-based Clustering in Industry 4.0 [67.80123919697971]
We show that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms.
We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how the resulting summaries help steer this process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z)
- Motion Vector Extrapolation for Video Object Detection [0.0]
MOVEX enables low latency video object detection on common CPU based systems.
We show that our approach significantly reduces the baseline latency of any given object detector.
Further latency reduction, up to 25x lower than the original latency, can be achieved with minimal accuracy loss.
arXiv Detail & Related papers (2021-04-18T17:26:37Z)
- GPU-Accelerated Primal Learning for Extremely Fast Large-Scale Classification [10.66048003460524]
One of the most efficient methods to solve L2-regularized primal problems, such as logistic regression and linear support vector machine (SVM) classification, is the widely used trust region Newton algorithm, TRON.
We show that using judicious GPU-optimization principles, TRON training time for different losses and feature representations may be drastically reduced.
arXiv Detail & Related papers (2020-08-08T03:40:27Z)
- Heterogeneous CPU+GPU Stochastic Gradient Descent Algorithms [1.3249453757295084]
We study training algorithms for deep learning on heterogeneous CPU+GPU architectures.
Our two-fold objective -- maximize convergence rate and resource utilization simultaneously -- makes the problem challenging.
We show that the implementation of these algorithms achieves both faster convergence and higher resource utilization on several real datasets.
arXiv Detail & Related papers (2020-04-19T05:21:20Z)
- MPLP++: Fast, Parallel Dual Block-Coordinate Ascent for Dense Graphical Models [96.1052289276254]
This work introduces a new MAP-solver, based on the popular Dual Block-Coordinate Ascent principle.
Surprisingly, by making a small change to a low-performing solver, we derive the new solver MPLP++, which outperforms all existing solvers by a large margin.
arXiv Detail & Related papers (2020-04-16T16:20:53Z)
- Efficient Video Semantic Segmentation with Labels Propagation and Refinement [138.55845680523908]
This paper tackles the problem of real-time semantic segmentation of high definition videos using a hybrid GPU / CPU approach.
We propose an Efficient Video Segmentation (EVS) pipeline that combines: (i) on the CPU, a very fast optical flow method that is used to exploit the temporal aspect of the video and propagate semantic information from one frame to the next.
On the popular Cityscapes dataset with high resolution frames (2048 x 1024), the proposed operating points range from 80 to 1000 Hz on a single GPU and CPU.
arXiv Detail & Related papers (2019-12-26T11:45:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.