Related papers: Hardware-Aware Feature Extraction Quantisation for Real-Time Visual Odometry on FPGA Platforms

Hardware-Aware Feature Extraction Quantisation for Real-Time Visual Odometry on FPGA Platforms

URL: http://arxiv.org/abs/2507.07903v1
Date: Thu, 10 Jul 2025 16:37:20 GMT
Title: Hardware-Aware Feature Extraction Quantisation for Real-Time Visual Odometry on FPGA Platforms
Authors: Mateusz Wasala, Mateusz Smolarczyk, Michal Danilowicz, Tomasz Kryjak,
Abstract summary: We propose an embedded implementation of an unsupervised architecture capable of detecting and describing feature points.<n>We implemented the solution on an FPGA System-on-Chip (SoC) platform, specifically the AMD/Xilinx Zynq UltraScale+.<n>This allowed us to process 640 x 480 pixel images at up to 54 fps, outperforming state-of-the-art solutions in the field.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Accurate position estimation is essential for modern navigation systems deployed in autonomous platforms, including ground vehicles, marine vessels, and aerial drones. In this context, Visual Simultaneous Localisation and Mapping (VSLAM) - which includes Visual Odometry - relies heavily on the reliable extraction of salient feature points from the visual input data. In this work, we propose an embedded implementation of an unsupervised architecture capable of detecting and describing feature points. It is based on a quantised SuperPoint convolutional neural network. Our objective is to minimise the computational demands of the model while preserving high detection quality, thus facilitating efficient deployment on platforms with limited resources, such as mobile or embedded systems. We implemented the solution on an FPGA System-on-Chip (SoC) platform, specifically the AMD/Xilinx Zynq UltraScale+, where we evaluated the performance of Deep Learning Processing Units (DPUs) and we also used the Brevitas library and the FINN framework to perform model quantisation and hardware-aware optimisation. This allowed us to process 640 x 480 pixel images at up to 54 fps on an FPGA platform, outperforming state-of-the-art solutions in the field. We conducted experiments on the TUM dataset to demonstrate and discuss the impact of different quantisation techniques on the accuracy and performance of the model in a visual odometry task.

Related papers

LEVIO: Lightweight Embedded Visual Inertial Odometry for Resource-Constrained Devices [18.91672527573445]
This work presents LEVIO, a fully featured VIO pipeline optimized for ultra-low-power compute platforms.<n>LEVIO incorporates established VIO components such as Oriented FAST and Rotated BRIEF (ORB) feature tracking and bundle adjustment.<n>The paper proposes and details the algorithmic design choices and the hardware-software co-optimization approach, and presents real-time performance on resource-constrained hardware.
arXiv Detail & Related papers (2026-02-03T09:20:57Z)
Few-Shot Video Object Segmentation in X-Ray Angiography Using Local Matching and Spatio-Temporal Consistency Loss [13.850743997507488]
We introduce a novel FSVOS model that employs a local matching strategy to restrict the search space to the most neighboring pixels.<n>Specifically, we implement non-parametric sampling mechanism that enables dynamically varying sampling regions.<n>This work offers enhanced potential for a wide range of clinical applications.
arXiv Detail & Related papers (2026-01-02T21:26:28Z)
SlimEdge: Lightweight Distributed DNN Deployment on Constrained Hardware [0.6219985442687116]
Deep distributed networks (DNNs) have become central to modern computer vision.<n>Our method integrates a structured model pruning with a multi-objective optimization to tailor network capacity to heterogeneous device constraints.<n>Results show that the resulting models satisfy user-specified bounds on accuracy and memory footprint while reducing inference latency by factors ranging from 1.2x to 5.0x.
arXiv Detail & Related papers (2025-12-11T04:02:21Z)
QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge [55.75103034526652]
We propose QuartDepth which adopts post-training quantization to quantize MDE models with hardware accelerations for ASICs.<n>Our approach involves quantizing both weights and activations to 4-bit precision, reducing the model size and computation cost.<n>We design a flexible and programmable hardware accelerator by supporting kernel fusion and customized instruction programmability.
arXiv Detail & Related papers (2025-03-20T21:03:10Z)
Real-Time Semantic Segmentation of Aerial Images Using an Embedded U-Net: A Comparison of CPU, GPU, and FPGA Workflows [0.0]
This study introduces a lightweight U-Net model optimized for real-time semantic segmentation of aerial images.<n>We maintain the accuracy of the U-Net on a real-world dataset while significantly reducing the model's parameters and Multiply-Accumulate (MAC) operations by a factor of 16.
arXiv Detail & Related papers (2025-03-07T08:33:28Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer [54.713778961605115]
Vision Transformer (ViT) has become one of the most prevailing fundamental backbone networks in the computer vision community. We propose a novel non-uniform quantizer, dubbed the Adaptive Logarithm AdaLog (AdaLog) quantizer.
arXiv Detail & Related papers (2024-07-17T18:38:48Z)
Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation [2.3636539018632616]
This work empirically investigates the optimization of complex deep learning models to analyze their functionality on an embedded device. It evaluates the effectiveness of the optimized models in terms of their inference speed for image classification and video action detection.
arXiv Detail & Related papers (2024-06-25T17:34:52Z)
Enhancing Dropout-based Bayesian Neural Networks with Multi-Exit on FPGA [20.629635991749808]
This paper proposes an algorithm and hardware co-design framework that can generate field-programmable gate array (FPGA)-based accelerators for efficient BayesNNs. At the algorithm level, we propose novel multi-exit dropout-based BayesNNs with reduced computational and memory overheads. At the hardware level, this paper introduces a transformation framework that can generate FPGA-based accelerators for the proposed efficient BayesNNs.
arXiv Detail & Related papers (2024-06-20T17:08:42Z)
Model-aware reinforcement learning for high-performance Bayesian experimental design in quantum metrology [0.4999814847776097]
Quantum sensors offer control flexibility during estimation by allowing manipulation by the experimenter across various parameters.<n>We introduce a versatile procedure capable of optimizing a wide range of problems in quantum metrology, estimation, and hypothesis testing.<n>We combine model-aware reinforcement learning (RL) with Bayesian estimation based on particle filtering.
arXiv Detail & Related papers (2023-12-28T12:04:15Z)
Deep Learning Computer Vision Algorithms for Real-time UAVs On-board Camera Image Processing [77.34726150561087]
This paper describes how advanced deep learning based computer vision algorithms are applied to enable real-time on-board sensor processing for small UAVs. All algorithms have been developed using state-of-the-art image processing methods based on deep neural networks.
arXiv Detail & Related papers (2022-11-02T11:10:42Z)
Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark [11.575901540758574]
We present our development experience for the Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms. We use the open-source hls4ml and FINN perJ, which aim to democratize AI- hardware codesign of optimized neural networks on FPGAs. The solutions are deployed on system-on-chip (Pynq-Z2) and pure FPGA (Arty A7-100T) platforms.
arXiv Detail & Related papers (2022-06-23T15:57:17Z)
Deep Learning for Real Time Satellite Pose Estimation on Low Power Edge TPU [58.720142291102135]
In this paper we propose a pose estimation software exploiting neural network architectures. We show how low power machine learning accelerators could enable Artificial Intelligence exploitation in space.
arXiv Detail & Related papers (2022-04-07T08:53:18Z)
PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution vision tasks with transformers, it directly translates the image feature map into the object result. Recent transformer-based image recognition model andTT show consistent efficiency gain.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.