Low Latency Visual Inertial Odometry with On-Sensor Accelerated Optical Flow for Resource-Constrained UAVs
- URL: http://arxiv.org/abs/2406.13345v1
- Date: Wed, 19 Jun 2024 08:51:19 GMT
- Title: Low Latency Visual Inertial Odometry with On-Sensor Accelerated Optical Flow for Resource-Constrained UAVs
- Authors: Jonas Kühne, Michele Magno, Luca Benini
- Abstract summary: On-sensor hardware acceleration is a promising approach to enable low-latency Visual Inertial Odometry (VIO).
This paper assesses the speed-up in a VIO sensor system exploiting a compact Optical Flow (OF) sensor consisting of a global shutter camera and an Application Specific Integrated Circuit (ASIC).
By replacing the feature tracking logic of the VINS-Mono pipeline with data from this OF camera, we demonstrate a 49.4% reduction in latency and a 53.7% reduction in the compute load of the VIO pipeline over the original VINS-Mono implementation.
- Score: 13.037162115493393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Inertial Odometry (VIO) is the task of estimating the movement trajectory of an agent from an onboard camera stream fused with additional Inertial Measurement Unit (IMU) measurements. A crucial subtask within VIO is the tracking of features, which can be achieved through Optical Flow (OF). As the calculation of OF is a resource-demanding task in terms of computational load and memory footprint, which needs to be executed at low latency, especially in robotic applications, OF estimation is today performed on powerful CPUs or GPUs. This restricts its use in a broad spectrum of applications where the deployment of such powerful, power-hungry processors is unfeasible due to constraints related to cost, size, and power consumption. On-sensor hardware acceleration is a promising approach to enable low latency VIO even on resource-constrained devices such as nano drones. This paper assesses the speed-up in a VIO sensor system exploiting a compact OF sensor consisting of a global shutter camera and an Application Specific Integrated Circuit (ASIC). By replacing the feature tracking logic of the VINS-Mono pipeline with data from this OF camera, we demonstrate a 49.4% reduction in latency and a 53.7% reduction of compute load of the VIO pipeline over the original VINS-Mono implementation, allowing VINS-Mono operation up to 50 FPS instead of 20 FPS on the quad-core ARM Cortex-A72 processor of a Raspberry Pi Compute Module 4.
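The division of labor the abstract describes can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the paper's implementation: `sensor.read_tracks()` and its (id, x, y, dx, dy) packet layout are hypothetical placeholders for whatever the OF camera/ASIC actually delivers.

```python
# Sketch: a VIO front end that consumes feature tracks already computed by an
# on-sensor OF ASIC instead of running KLT on the host CPU.
# `sensor.read_tracks()` and the packet layout are hypothetical.
import time

class OFSensorFrontend:
    """Feeds on-sensor optical-flow tracks to a VIO back end (e.g. an
    estimator like VINS-Mono's), replacing CPU-side feature tracking."""

    def __init__(self, sensor):
        self.sensor = sensor      # global-shutter camera + OF ASIC
        self.tracks = {}          # feature id -> latest pixel position

    def poll(self):
        """Convert one sensor packet into the (feature id, position) list
        the estimator expects from its feature tracker."""
        features = []
        for fid, x, y, dx, dy in self.sensor.read_tracks():
            self.tracks[fid] = (x + dx, y + dy)   # displacement computed on-sensor
            features.append((fid, self.tracks[fid]))
        return time.monotonic(), features
```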
Related papers
- On-Sensor Convolutional Neural Networks with Early-Exits [3.916521228619074]
We introduce, for the first time in the literature, the optimized design and implementation of Depth-First CNNs operating on the Intelligent Sensor Processing Unit (ISPU) within an Inertial Measurement Unit (IMU) by STMicroelectronics.
Our approach partitions the CNN between the ISPU and the microcontroller (MCU) and employs an Early-Exit mechanism to stop the computations on the IMU when enough confidence about the results is achieved.
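The early-exit mechanism can be sketched in a few lines of Python. The partitioning, the `ispu_stage`/`mcu_stage` names, and the 0.9 threshold are illustrative assumptions, not the paper's actual design.

```python
# Hedged sketch of partitioned early-exit inference: run the first CNN stage
# on the sensor-side processor and skip the MCU stage when confidence is high.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_infer(x, ispu_stage, mcu_stage, threshold=0.9):
    feats, exit_logits = ispu_stage(x)        # runs on the IMU's ISPU
    probs = softmax(exit_logits)
    if probs.max() >= threshold:              # confident: stop on-sensor
        return int(probs.argmax()), "ispu"
    logits = mcu_stage(feats)                 # otherwise hand off to the MCU
    return int(softmax(logits).argmax()), "mcu"
```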
arXiv Detail & Related papers (2025-03-21T08:31:07Z)
- A Performance Analysis of You Only Look Once Models for Deployment on Constrained Computational Edge Devices in Drone Applications [0.0]
This study evaluates the deployment of object detection models on resource-constrained edge devices and in cloud environments.
The NVIDIA Jetson Orin Nano, Orin NX, and Raspberry Pi 5 (RPI5) devices have been tested to measure their detection accuracy, inference speed, and energy consumption.
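A study of this kind implies a timing loop along the following lines. This is a generic sketch, not the paper's harness; `model` and `frames` are stand-ins, and energy measurement would need an external power meter.

```python
# Illustrative per-frame benchmark: derive average latency and FPS on an
# edge board after a warm-up phase.
import time

def benchmark(model, frames, warmup=10):
    for f in frames[:warmup]:
        model(f)                               # warm caches before timing
    t0 = time.perf_counter()
    for f in frames[warmup:]:
        model(f)
    dt = time.perf_counter() - t0
    n = len(frames) - warmup
    return {"latency_ms": 1000 * dt / n, "fps": n / dt}
```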
arXiv Detail & Related papers (2025-02-06T17:22:01Z)
- SpiDR: A Reconfigurable Digital Compute-in-Memory Spiking Neural Network Accelerator for Event-based Perception [8.968583287058959]
Spiking Neural Networks (SNNs) offer an efficient method for processing the asynchronous temporal data generated by Dynamic Vision Sensors (DVS).
Existing SNN accelerators suffer from limitations in adaptability to diverse neuron models, bit precisions and network sizes.
We propose SpiDR, a scalable and reconfigurable digital compute-in-memory (CIM) SNN accelerator with a set of key features.
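For readers unfamiliar with the workload, a leaky integrate-and-fire (LIF) update is the kind of neuron model such an accelerator evaluates over DVS event streams. The parameters below are illustrative, not SpiDR's supported configurations.

```python
# Minimal LIF timestep: leak the membrane, integrate weighted input spikes,
# fire where the threshold is crossed, and reset fired neurons.
import numpy as np

def lif_step(v, spikes_in, weights, leak=0.9, v_th=1.0):
    v = leak * v + weights @ spikes_in        # decay then accumulate input
    fired = v >= v_th
    v[fired] = 0.0                            # reset neurons that spiked
    return v, fired.astype(np.float32)
```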
arXiv Detail & Related papers (2024-11-05T06:59:02Z)
- SATAY: A Streaming Architecture Toolflow for Accelerating YOLO Models on FPGA Devices [48.47320494918925]
This work tackles the challenges of deploying state-of-the-art object detection models onto FPGA devices for ultra-low latency applications.
We employ a streaming architecture design for our YOLO accelerators, implementing the complete model on-chip in a deeply pipelined fashion.
We introduce novel hardware components to support the operations of YOLO models in a dataflow manner, and off-chip memory buffering to address the limited on-chip memory resources.
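A software analogy may help picture the streaming (dataflow) style: each stage consumes values as they arrive and forwards results immediately, so the whole model is pipelined rather than executed layer by layer from memory. The stages below are toy stand-ins for YOLO layers, not the toolflow's output.

```python
# Dataflow pipelining sketched with Python generators.
def conv_stage(stream):
    for x in stream:
        yield x * 0.5                  # placeholder for a convolution

def activation_stage(stream):
    for x in stream:
        yield max(x, 0.0)              # ReLU-like nonlinearity

pixels = iter([1.0, -2.0, 3.0])        # pixels streaming in from a camera
pipeline = activation_stage(conv_stage(pixels))
print(list(pipeline))                  # [0.5, 0.0, 1.5]
```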
arXiv Detail & Related papers (2023-09-04T13:15:01Z)
- Speck: A Smart event-based Vision Sensor with a low latency 327K Neuron Convolutional Neuronal Network Processing Pipeline [5.8859061623552975]
We present a smart vision sensor System on Chip (SoC), featuring an event-based camera and a low-power asynchronous spiking Convolutional Neural Network (sCNN) computing architecture embedded on a single chip.
By combining both sensor and processing on a single die, we can lower unit production costs significantly.
We present the asynchronous architecture, the individual blocks, and the sCNN processing principle, and benchmark it against other sCNN-capable processors.
arXiv Detail & Related papers (2023-04-13T19:28:57Z)
- Optical flow estimation from event-based cameras and spiking neural networks [0.4899818550820575]
Event-based sensors are an excellent fit for Spiking Neural Networks (SNNs).
We propose a U-Net-like SNN which, after supervised training, is able to make dense optical flow estimations.
Thanks to separable convolutions, we have been able to develop a light model that can nonetheless yield reasonably accurate optical flow estimates.
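The parameter saving from separable convolutions is simple arithmetic, sketched below with illustrative channel counts (not the paper's layer sizes).

```python
# Parameter count: standard 2D convolution vs. depthwise-separable.
def standard_params(c_in, c_out, k):
    return c_in * c_out * k * k

def separable_params(c_in, c_out, k):
    return c_in * k * k + c_in * c_out      # depthwise + 1x1 pointwise

c_in, c_out, k = 64, 128, 3
print(standard_params(c_in, c_out, k))      # 73728
print(separable_params(c_in, c_out, k))     # 8768 -> roughly 8.4x fewer
```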
arXiv Detail & Related papers (2023-02-13T16:17:54Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its software, full-precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
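One way to picture a latency- and accuracy-aware reward is the sketch below. The linear form and the weight `lam` are assumptions for illustration, not the paper's exact reward function.

```python
# Toy reward: credit accuracy, penalize latency beyond a deadline.
def reward(accuracy, latency_ms, deadline_ms=50.0, lam=0.5):
    overrun = max(0.0, latency_ms - deadline_ms) / deadline_ms
    return accuracy - lam * overrun

print(reward(accuracy=0.92, latency_ms=40.0))   # 0.92 (within deadline)
print(reward(accuracy=0.95, latency_ms=80.0))   # 0.65 (0.95 - 0.5 * 0.6)
```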
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- hARMS: A Hardware Acceleration Architecture for Real-Time Event-Based Optical Flow [0.0]
Event-based vision sensors produce asynchronous event streams with high temporal resolution based on changes in the visual scene.
Existing solutions for calculating optical flow from event data fail to capture the true direction of motion due to the aperture problem.
We present a hardware realization of the fARMS algorithm allowing for real-time computation of true flow on low-power, embedded platforms.
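The aperture problem the summary mentions can be made concrete: a local brightness-constancy constraint (Ix*u + Iy*v + It = 0) only determines the flow component along the image gradient, the "normal flow". The sketch below uses illustrative numbers and is not the fARMS algorithm itself.

```python
# Normal flow from a single brightness-constancy constraint.
import numpy as np

def normal_flow(Ix, Iy, It):
    g = np.array([Ix, Iy])
    return -It * g / (g @ g)          # flow component along the gradient

# A vertical edge (Iy = 0): any vertical motion v is locally invisible.
print(normal_flow(Ix=1.0, Iy=0.0, It=-0.5))   # [0.5 0.] -- only u recovered
```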
arXiv Detail & Related papers (2021-12-13T16:27:17Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have large numbers of parameters and incur heavy computational costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
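The generic coarse-to-fine scheme can be sketched as follows: estimate flow at a low resolution, upsample it (scaling the vectors), and refine at each finer level. `estimate_residual` stands in for a learned decoder; this is the general pattern, not FastFlowNet's specific architecture.

```python
# Coarse-to-fine optical flow skeleton over an image pyramid.
import numpy as np

def coarse_to_fine(pyramid, estimate_residual):
    """pyramid: list of (img1, img2) pairs, coarsest to finest, each level
    doubling the resolution of the previous one."""
    h, w = pyramid[0][0].shape
    flow = np.zeros((2, h, w))
    for level, (img1, img2) in enumerate(pyramid):
        if level > 0:
            # Upsample 2x and rescale vectors for the finer resolution.
            flow = 2.0 * flow.repeat(2, axis=1).repeat(2, axis=2)
        flow = flow + estimate_residual(img1, img2, flow)   # refine here
    return flow
```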
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference [82.1584439276834]
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks.
We present EdgeBERT, an in-depth algorithm-hardware co-design for latency-aware energy optimization for multi-task NLP.
arXiv Detail & Related papers (2020-11-28T19:21:47Z)
- EBBINNOT: A Hardware Efficient Hybrid Event-Frame Tracker for Stationary Dynamic Vision Sensors [5.674895233111088]
This paper presents a hybrid event-frame approach for detecting and tracking objects recorded by a stationary neuromorphic sensor.
To exploit the background removal property of a static DVS, we propose an event-based binary image creation scheme that signals the presence or absence of events within a frame duration.
This is the first time a stationary DVS based traffic monitoring solution is extensively compared to simultaneously recorded RGB frame-based methods.
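The binary image creation described above is straightforward to sketch: within one frame interval, mark a pixel 1 if it produced at least one event, else 0. The (t, x, y, polarity) event tuple is a common DVS convention assumed here, not necessarily the paper's exact format.

```python
# Binary event frame: presence/absence of events per pixel in one interval.
import numpy as np

def binary_event_frame(events, height, width, t_start, t_end):
    frame = np.zeros((height, width), dtype=np.uint8)
    for t, x, y, _pol in events:
        if t_start <= t < t_end:
            frame[y, x] = 1               # event presence; polarity ignored
    return frame

events = [(0.001, 3, 2, 1), (0.002, 3, 2, -1), (0.030, 7, 5, 1)]
print(binary_event_frame(events, 8, 8, 0.0, 0.020).sum())   # 1 active pixel
```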
arXiv Detail & Related papers (2020-05-31T03:01:35Z)
- Near-chip Dynamic Vision Filtering for Low-Bandwidth Pedestrian Detection [99.94079901071163]
This paper presents a novel end-to-end system for pedestrian detection using Dynamic Vision Sensors (DVSs).
We target applications where multiple sensors transmit data to a local processing unit, which executes a detection algorithm.
Our detector is able to perform a detection every 450 ms, with an overall testing F1 score of 83%.
arXiv Detail & Related papers (2020-04-03T17:36:26Z)