Towards Low-Latency Event-based Obstacle Avoidance on a FPGA-Drone
- URL: http://arxiv.org/abs/2504.10400v1
- Date: Mon, 14 Apr 2025 16:51:10 GMT
- Title: Towards Low-Latency Event-based Obstacle Avoidance on a FPGA-Drone
- Authors: Pietro Bonazzi, Christian Vogt, Michael Jost, Lyes Khacef, Federico Paredes-Vallés, Michele Magno
- Abstract summary: This work quantitatively evaluates the performance of event-based vision systems (EVS) against conventional RGB-based models for action prediction in collision avoidance on an FPGA accelerator. The EVS model achieves a significantly higher effective frame rate (1 kHz) and lower temporal (-20 ms) and spatial prediction errors (-20 mm) compared to the RGB-based model. These results underscore the advantages of event-based vision for real-time collision avoidance and demonstrate its potential for deployment in resource-constrained environments.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work quantitatively evaluates the performance of event-based vision systems (EVS) against conventional RGB-based models for action prediction in collision avoidance on an FPGA accelerator. Our experiments demonstrate that the EVS model achieves a significantly higher effective frame rate (1 kHz) and lower temporal (-20 ms) and spatial prediction errors (-20 mm) compared to the RGB-based model, particularly when tested on out-of-distribution data. The EVS model also exhibits superior robustness in selecting optimal evasion maneuvers. In particular, in distinguishing between movement and stationary states, it achieves a 59 percentage point advantage in precision (78% vs. 19%) and a substantially higher F1 score (0.73 vs. 0.06), highlighting the susceptibility of the RGB model to overfitting. Further analysis across different combinations of spatial classes confirms the consistent performance of the EVS model on both test data sets. Finally, we evaluated the system end-to-end and achieved a latency of approximately 2.14 ms, with event aggregation (1 ms) and inference on the processing unit (0.94 ms) accounting for the largest components. These results underscore the advantages of event-based vision for real-time collision avoidance and demonstrate its potential for deployment in resource-constrained environments.
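To make the latency budget concrete: the 1 ms aggregation window is what yields the 1 kHz effective frame rate, and together with the 0.94 ms inference it dominates the ~2.14 ms end-to-end figure. Below is a minimal sketch of such an aggregation step, assuming events arrive as (timestamp, x, y, polarity) rows; the array layout and resolution are our assumptions, not the paper's implementation.

```python
import numpy as np

# Minimal sketch (assumed layout): events are an (N, 4) array of
# [t_us, x, y, polarity], binned into a 2-channel count frame per
# 1 ms window, i.e. a 1 kHz effective frame rate.
def aggregate_events(events: np.ndarray, t_start_us: int,
                     window_us: int = 1000, h: int = 128, w: int = 128) -> np.ndarray:
    t, x, y, p = events.T
    mask = (t >= t_start_us) & (t < t_start_us + window_us)
    frame = np.zeros((2, h, w), dtype=np.float32)
    # Accumulate per-pixel event counts, one channel per polarity (0 or 1).
    np.add.at(frame, (p[mask].astype(int), y[mask].astype(int), x[mask].astype(int)), 1.0)
    return frame
```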
Related papers
- HGO-YOLO: Advancing Anomaly Behavior Detection with Hierarchical Features and Lightweight Optimized Detection
This study proposes a model called HGO-YOLO, which integrates the HGNetv2 architecture into YOLOv8. Evaluation results show that the proposed algorithm achieves a mAP@0.5 of 87.4% and a recall rate of 81.1%, with a model size of only 4.6 MB and a frame rate of 56 FPS on the CPU.
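For readers unfamiliar with the metric, the "0.5" in mAP@0.5 is an intersection-over-union threshold: a detection counts as a true positive only if its box overlaps a ground-truth box with IoU >= 0.5. A minimal IoU computation:

```python
# Boxes given as (x1, y1, x2, y2) corners.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333..., below the 0.5 threshold
```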
arXiv Detail & Related papers (2025-03-10T14:29:12Z)
- YOLOv12: A Breakdown of the Key Architectural Features
YOLOv12 is a significant advancement in single-stage, real-time object detection. It incorporates an optimised backbone (R-ELAN), 7x7 separable convolutions, and FlashAttention-driven area-based attention. It offers scalable solutions for both latency-sensitive and high-accuracy applications.
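As a rough illustration of one named building block, the sketch below implements a generic 7x7 depthwise-separable convolution in PyTorch; it is not the official YOLOv12 code, and the channel count is a placeholder.

```python
import torch
import torch.nn as nn

class SeparableConv7x7(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Depthwise 7x7: one spatial filter per channel (groups=channels).
        self.depthwise = nn.Conv2d(channels, channels, kernel_size=7,
                                   padding=3, groups=channels, bias=False)
        # Pointwise 1x1 mixes information across channels.
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 64, 80, 80)
print(SeparableConv7x7(64)(x).shape)  # torch.Size([1, 64, 80, 80])
```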
arXiv Detail & Related papers (2025-02-20T17:08:43Z)
- Fall Detection for Industrial Setups Using YOLOv8 Variants
The YOLOv8m model, consisting of 25.9 million parameters and 79.1 GFLOPs, demonstrated a respectable balance between computational efficiency and detection performance.
Although the YOLOv8l and YOLOv8x models presented higher precision and recall, their higher computational demands and model size make them less suitable for resource-constrained environments.
arXiv Detail & Related papers (2024-08-08T17:24:54Z)
- Sub-token ViT Embedding via Stochastic Resonance Transformers
Vision Transformer (ViT) architectures represent images as collections of high-dimensional vectorized tokens, each corresponding to a rectangular non-overlapping patch.
We propose a training-free method inspired by "stochastic resonance".
The resulting "Stochastic Resonance Transformer" (SRT) retains the rich semantic information of the original representation, but grounds it on a finer-scale spatial domain, partly mitigating the coarse effect of spatial tokenization.
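A hedged sketch of the underlying idea, under our own simplifying assumptions (a `vit` callable returning a spatial feature map, 16-pixel patches): embed several sub-token translations of the image, undo each shift at pixel resolution, and average.

```python
import torch
import torch.nn.functional as F

def sr_features(vit, image: torch.Tensor, shifts=((0, 0), (0, 4), (4, 0), (4, 4))):
    h, w = image.shape[-2:]
    feats = []
    for dy, dx in shifts:
        shifted = torch.roll(image, shifts=(dy, dx), dims=(-2, -1))
        f = vit(shifted)  # assumed to return a (1, C, H/16, W/16) token map
        f = F.interpolate(f, size=(h, w), mode="bilinear", align_corners=False)
        feats.append(torch.roll(f, shifts=(-dy, -dx), dims=(-2, -1)))  # undo shift
    return torch.stack(feats).mean(dim=0)  # (1, C, H, W) sub-token feature map
```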
arXiv Detail & Related papers (2023-10-06T01:53:27Z)
- Sample-Efficient Optimisation with Probabilistic Transformer Surrogates
This paper investigates the feasibility of employing state-of-the-art probabilistic transformers in Bayesian optimisation.
We observe two drawbacks stemming from their training procedure and loss definition, hindering their direct deployment as proxies in black-box optimisation.
We introduce two components: 1) a BO-tailored training prior supporting non-uniformly distributed points, and 2) a novel approximate posterior regulariser trading-off accuracy and input sensitivity to filter favourable stationary points for improved predictive performance.
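For context, the sketch below shows the generic Bayesian-optimisation step such a surrogate plugs into: the surrogate supplies a predictive mean and standard deviation, and expected improvement selects the next query. The paper's BO-tailored prior and posterior regulariser are not reproduced here, and the `surrogate` interface is an assumption.

```python
import numpy as np
from scipy.stats import norm

# Expected improvement for minimisation, given predictive mean/std.
def expected_improvement(mu, sigma, best):
    z = (best - mu) / np.maximum(sigma, 1e-9)
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bo_step(surrogate, candidates, best_so_far):
    mu, sigma = surrogate(candidates)  # assumed predictive interface
    return candidates[np.argmax(expected_improvement(mu, sigma, best_so_far))]
```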
arXiv Detail & Related papers (2022-05-27T11:13:17Z)
- From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks
This paper investigates the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network.
We show that while the ResNet-18 model trained on DWT spectrograms achieves a high recognition accuracy, attacking this model is relatively more costly for the adversary.
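As an illustration of what a DWT-based time-frequency image can look like (our construction with assumed parameters, not the paper's exact pipeline), one can stack per-level detail-coefficient magnitudes over a shared time axis:

```python
import numpy as np
import pywt

def dwt_spectrogram(signal: np.ndarray, wavelet: str = "db4", levels: int = 6):
    coeffs = pywt.wavedec(signal, wavelet, level=levels)  # [cA_n, cD_n, ..., cD_1]
    rows = []
    for c in coeffs[1:]:  # detail coefficients, coarse to fine
        # Upsample each level to the signal length by repetition.
        rows.append(np.repeat(np.abs(c), len(signal) // len(c) + 1)[: len(signal)])
    return np.log1p(np.stack(rows))  # (levels, time) log-magnitude image
```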
arXiv Detail & Related papers (2022-04-14T15:14:08Z)
- ZARTS: On Zero-order Optimization for Neural Architecture Search
Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency.
This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, that searches without the gradient approximation on which DARTS relies.
In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue.
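The sketch below shows the generic zero-order gradient estimate this family of methods builds on (not the ZARTS algorithm itself): finite differences along random directions stand in for the analytic gradient of the validation loss with respect to architecture parameters.

```python
import numpy as np

def zero_order_grad(loss_fn, alpha: np.ndarray, sigma: float = 0.01, samples: int = 8):
    grad = np.zeros_like(alpha)
    for _ in range(samples):
        u = np.random.randn(*alpha.shape)
        # Antithetic two-point estimate of the directional derivative.
        grad += u * (loss_fn(alpha + sigma * u) - loss_fn(alpha - sigma * u)) / (2 * sigma)
    return grad / samples

alpha = np.zeros(4)
alpha -= 0.1 * zero_order_grad(lambda a: np.sum((a - 1.0) ** 2), alpha)  # toy update
```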
arXiv Detail & Related papers (2021-10-10T09:35:15Z)
- How to Train Your Flare Prediction Model: Revisiting Robust Sampling of Rare Events
We present a case study of solar flare forecasting using metadata feature time series, treating it as a problem marked by pronounced class imbalance and temporal coherence.
We showcase the general concept of temporal coherence, which arises from the demand for continuity in time series forecasting, and show that a lack of proper understanding of this effect may spuriously inflate models' measured performance.
We revisit the main remedies for these challenges and present several experiments to illustrate the exact impact that each of these remedies may have on performance.
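A minimal sketch of the standard remedy, under our own assumptions: split along time into contiguous blocks, so temporally adjacent, near-duplicate samples cannot straddle the train/test boundary and inflate scores.

```python
import numpy as np

def time_blocked_split(timestamps: np.ndarray, test_fraction: float = 0.2):
    order = np.argsort(timestamps)            # sort samples chronologically
    cut = int(len(order) * (1.0 - test_fraction))
    return order[:cut], order[cut:]           # train indices, later test indices
```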
arXiv Detail & Related papers (2021-03-12T21:37:08Z)
- From Sound Representation to Model Robustness
We investigate the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network.
Averaged over various experiments on three environmental sound datasets, we found the ResNet-18 model outperforms other deep learning architectures.
arXiv Detail & Related papers (2020-07-27T17:30:49Z)
- Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline.
We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures.
Our best end-to-end model, based on RNN-Transducer together with improved beam search, is only 3.8% absolute WER worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
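For reference, the CTC objective underlying the CTC-Attention systems can be exercised with stock PyTorch as below; the shapes and vocabulary size are placeholders.

```python
import torch
import torch.nn as nn

ctc = nn.CTCLoss(blank=0)
log_probs = torch.randn(50, 4, 30).log_softmax(dim=-1)  # (time, batch, vocab)
targets = torch.randint(1, 30, (4, 12))                 # label ids, no blanks
input_lengths = torch.full((4,), 50, dtype=torch.long)
target_lengths = torch.full((4,), 12, dtype=torch.long)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
```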
arXiv Detail & Related papers (2020-04-22T19:08:33Z)
- GraphTCN: Spatio-Temporal Interaction Modeling for Human Trajectory Prediction
We propose GraphTCN, a novel CNN-based spatial-temporal graph framework, to support more efficient and accurate trajectory predictions.
In contrast to conventional models, both the spatial and temporal modeling of our model are computed within each local time window.
Our model achieves better performance in terms of both efficiency and accuracy as compared with state-of-the-art models on various trajectory prediction benchmark datasets.
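As a hedged toy illustration of local temporal modeling over a trajectory window (our example, not the GraphTCN architecture): a 1D convolution over a short window of past (x, y) positions predicts the next displacement per agent.

```python
import torch
import torch.nn as nn

class LocalTemporalConv(nn.Module):
    def __init__(self, hidden: int = 32, window: int = 8):
        super().__init__()
        self.conv = nn.Conv1d(2, hidden, kernel_size=3, padding=1)
        self.head = nn.Linear(hidden * window, 2)

    def forward(self, traj: torch.Tensor) -> torch.Tensor:
        # traj: (agents, window, 2) past positions -> (agents, 2) next offset
        h = torch.relu(self.conv(traj.transpose(1, 2)))
        return self.head(h.flatten(1))
```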
arXiv Detail & Related papers (2020-03-16T12:56:12Z)
- Highly Efficient Salient Object Detection with 100K Parameters
We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stages multi-scale features.
We build an extremely lightweight model, namely CSNet, which achieves performance comparable to large models on popular salient object detection benchmarks with only about 0.2% of their parameters (100k).
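The sketch below shows the plain octave-convolution idea that gOctConv generalizes, in our own simplified form: channels split into a full-resolution and a half-resolution branch that exchange information at every layer, saving compute on the low-frequency half.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctConv(nn.Module):
    def __init__(self, ch: int, alpha: float = 0.5):
        super().__init__()
        lo = int(ch * alpha); hi = ch - lo
        self.hh = nn.Conv2d(hi, hi, 3, padding=1)  # high -> high
        self.hl = nn.Conv2d(hi, lo, 3, padding=1)  # high -> low
        self.lh = nn.Conv2d(lo, hi, 3, padding=1)  # low  -> high
        self.ll = nn.Conv2d(lo, lo, 3, padding=1)  # low  -> low

    def forward(self, x_hi, x_lo):
        y_hi = self.hh(x_hi) + F.interpolate(self.lh(x_lo), scale_factor=2, mode="nearest")
        y_lo = self.ll(x_lo) + self.hl(F.avg_pool2d(x_hi, 2))
        return y_hi, y_lo

y_hi, y_lo = OctConv(64)(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 32, 32))
```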
arXiv Detail & Related papers (2020-03-12T07:00:46Z)