Self-Supervised Event Representations: Towards Accurate, Real-Time Perception on SoC FPGAs
- URL: http://arxiv.org/abs/2505.07556v1
- Date: Mon, 12 May 2025 13:32:08 GMT
- Title: Self-Supervised Event Representations: Towards Accurate, Real-Time Perception on SoC FPGAs
- Authors: Kamil Jeziorek, Tomasz Kryjak
- Abstract summary: Event cameras offer significant advantages over traditional frame-based sensors. The effective processing of their sparse, asynchronous event streams remains challenging. This paper introduces a novel Self-Supervised Event Representation (SSER) method.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Event cameras offer significant advantages over traditional frame-based sensors. These include microsecond temporal resolution, robustness under varying lighting conditions and low power consumption. Nevertheless, the effective processing of their sparse, asynchronous event streams remains challenging. Existing approaches to this problem can be categorised into two distinct groups. The first group involves the direct processing of event data with neural models, such as Spiking Neural Networks or Graph Convolutional Neural Networks. However, this approach is often accompanied by a compromise in terms of qualitative performance. The second group involves the conversion of events into dense representations with handcrafted aggregation functions, which can boost accuracy at the cost of temporal fidelity. This paper introduces a novel Self-Supervised Event Representation (SSER) method leveraging Gated Recurrent Unit (GRU) networks to achieve precise per-pixel encoding of event timestamps and polarities without temporal discretisation. The recurrent layers are trained in a self-supervised manner to maximise the fidelity of event-time encoding. The inference is performed with event representations generated asynchronously, thus ensuring compatibility with high-throughput sensors. The experimental validation demonstrates that SSER outperforms aggregation-based baselines, achieving mAP improvements of 2.4% and 0.6% on the Gen1 and 1 Mpx object detection datasets, respectively. Furthermore, the paper presents the first hardware implementation of recurrent representation for event data on a System-on-Chip FPGA, achieving sub-microsecond latency and power consumption between 1 and 2 W, suitable for real-time, power-efficient applications. Code is available at https://github.com/vision-agh/RecRepEvent.
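The authors' released code is linked above. As a rough, assumption-level sketch of the general idea of per-pixel recurrent encoding (not the authors' implementation), a GRU cell can be applied per event, keeping one hidden state per pixel and feeding it the event's normalised timestamp and polarity; module names, feature choices and dimensions below are illustrative only.

```python
# Illustrative sketch (not the authors' released code): per-pixel GRU encoding of
# event timestamps and polarities into a dense feature map for a downstream detector.
import torch
import torch.nn as nn

class PerPixelGRUEncoder(nn.Module):
    """Keeps one GRU hidden state per pixel and updates it event by event."""

    def __init__(self, hidden_dim: int = 8):
        super().__init__()
        self.hidden_dim = hidden_dim
        # Per-event input: [normalised timestamp, polarity in {-1.0, +1.0}]
        self.cell = nn.GRUCell(input_size=2, hidden_size=hidden_dim)

    def forward(self, events: torch.Tensor, height: int, width: int) -> torch.Tensor:
        """events: (N, 4) float tensor of (x, y, t, p), sorted by timestamp t."""
        state = torch.zeros(height * width, self.hidden_dim, device=events.device)
        t0, t1 = events[0, 2], events[-1, 2]
        for x, y, t, p in events:
            idx = int(y) * width + int(x)                       # flat index of the event's pixel
            t_norm = (t - t0) / (t1 - t0 + 1e-9)                # timestamp normalised to [0, 1]
            inp = torch.stack([t_norm, p]).unsqueeze(0)         # (1, 2) event features
            state[idx] = self.cell(inp, state[idx:idx + 1])[0]  # recurrent update of that pixel only
        # Dense (hidden_dim, H, W) map usable by a standard detection head
        return state.view(height, width, self.hidden_dim).permute(2, 0, 1)
```

The explicit per-event loop is written for readability; since each event touches only one pixel's hidden state, the update is naturally asynchronous, which is what makes a streaming hardware realisation attractive. The self-supervised training objective mentioned in the abstract (maximising the fidelity of event-time encoding) is not shown in this sketch.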
Related papers
- Event Signal Filtering via Probability Flux Estimation [58.31652473933809]
Events offer a novel paradigm for capturing scene dynamics via asynchronous sensing, but their inherent randomness often leads to degraded signal quality. Event signal filtering is thus essential for enhancing fidelity by reducing this internal randomness and ensuring consistent outputs across diverse acquisition conditions. This paper introduces a generative, online filtering framework called Event Density Flow Filter (EDFilter). Experiments validate EDFilter's performance across tasks like event filtering, super-resolution, and direct event-based blob tracking.
arXiv Detail & Related papers (2025-04-10T07:03:08Z) - FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [63.87313550399871]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability. We propose Self-supervised Transfer (PST) and a Frequency-Decoupled Fusion module (FreDF). PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models. FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches.
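As a hedged illustration of the frequency-decoupling idea only (not the FreDF module itself), a feature map can be split into a low-frequency structural component and a high-frequency residual:

```python
# Illustrative sketch of frequency decoupling: split a feature map into a
# low-frequency structural part and a high-frequency detail/edge residual.
import torch
import torch.nn.functional as F

def frequency_decouple(feat: torch.Tensor, kernel_size: int = 5):
    """feat: (B, C, H, W). Returns (low, high) such that low + high == feat."""
    pad = kernel_size // 2
    low = F.avg_pool2d(feat, kernel_size, stride=1, padding=pad)  # low-pass: local average
    high = feat - low                                             # high-pass: residual edges/detail
    return low, high
```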
arXiv Detail & Related papers (2025-03-25T15:04:53Z) - Event-Stream Super Resolution using Sigma-Delta Neural Network [0.10923877073891444]
Event cameras present unique challenges due to their low resolution and the sparse, asynchronous nature of the data they collect.
Current event super-resolution algorithms are not fully optimized for the distinct data structure produced by event cameras.
This research proposes a method that integrates binary spikes with Sigma-Delta Neural Networks (SDNNs).
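In general terms, a sigma-delta unit transmits only quantised changes of its activation between time steps; the sketch below illustrates that behaviour and is an assumption for exposition, not the paper's SDNN implementation.

```python
# Illustrative sigma-delta unit: only quantised changes in activation are transmitted.
import torch

class SigmaDeltaUnit:
    def __init__(self, threshold: float = 0.1):
        self.threshold = threshold
        self.recon = None                       # receiver-side reconstruction of the activation

    def __call__(self, activation: torch.Tensor) -> torch.Tensor:
        if self.recon is None:
            self.recon = torch.zeros_like(activation)
        delta = activation - self.recon                              # change since last message
        msg = torch.round(delta / self.threshold) * self.threshold   # sparse, quantised message
        self.recon = self.recon + msg                                # receiver integrates messages
        return msg
```

For slowly varying inputs most entries of the transmitted message are zero, which is the sparsity such networks exploit.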
arXiv Detail & Related papers (2024-08-13T15:25:18Z) - Embedded Graph Convolutional Networks for Real-Time Event Data Processing on SoC FPGAs [0.815557531820863]
We introduce a custom EFGCN (Event-based FPGA-accelerated Graph Convolutional Network) designed with a series of hardware-aware optimisations tailored for PointNetConv. The proposed techniques result in up to a 100-fold reduction in model size compared to Asynchronous Event-based GNN (AEGNN). Our approach achieves state-of-the-art performance across multiple event-based classification benchmarks while remaining highly scalable, customisable and resource-efficient.
arXiv Detail & Related papers (2024-06-11T14:47:36Z) - MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking [50.26836546224782]
Event-based eye tracking has shown great promise thanks to its high temporal resolution and low redundancy.
The diversity and abruptness of eye movement patterns, including blinking, fixating, saccades, and smooth pursuit, pose significant challenges for eye localization.
This paper proposes a bidirectional long-term sequence modeling and time-varying state selection mechanism to fully utilize contextual temporal information.
arXiv Detail & Related papers (2024-04-18T11:09:25Z) - Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture the brightness change of every pixel in an asynchronous manner.
Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as 3D tensor representation.
Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
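A minimal sketch of such polarity-separated x-y-t binning (illustrative assumptions only, not the paper's exact pipeline) is:

```python
# Illustrative sketch: bin an event stream into an x-y-t grid per polarity,
# producing a (2, T, H, W) tensor of event-count "pillars".
import torch

def events_to_pillars(events: torch.Tensor, height: int, width: int, t_bins: int) -> torch.Tensor:
    """events: (N, 4) float tensor of (x, y, t, p) with p in {0, 1} (negative/positive)."""
    x, y, t, p = events[:, 0].long(), events[:, 1].long(), events[:, 2], events[:, 3].long()
    # Map timestamps to temporal bin indices in [0, t_bins - 1]
    t_idx = ((t - t.min()) / (t.max() - t.min() + 1e-9) * t_bins).long().clamp_(max=t_bins - 1)
    grid = torch.zeros(2, t_bins, height, width)
    flat = ((p * t_bins + t_idx) * height + y) * width + x     # flat index into the 4D grid
    grid.view(-1).index_add_(0, flat, torch.ones_like(t))      # accumulate event counts per cell
    return grid
```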
arXiv Detail & Related papers (2023-03-17T12:12:41Z) - HALSIE: Hybrid Approach to Learning Segmentation by Simultaneously Exploiting Image and Event Modalities [6.543272301133159]
Event cameras detect changes in per-pixel intensity to generate asynchronous event streams.
They offer great potential for accurate semantic map retrieval in real-time autonomous systems.
Existing implementations for event segmentation suffer from sub-optimal performance.
We propose HALSIE, a hybrid end-to-end learning framework that reduces inference cost by up to 20× compared to the state of the art.
arXiv Detail & Related papers (2022-11-19T17:09:50Z) - AEGNN: Asynchronous Event-based Graph Neural Networks [54.528926463775946]
Event-based Graph Neural Networks generalize standard GNNs to process events as "evolving" spatio-temporal graphs.
AEGNNs are easily trained on synchronous inputs and can be converted to efficient, "asynchronous" networks at test time.
arXiv Detail & Related papers (2022-03-31T16:21:12Z) - Event-LSTM: An Unsupervised and Asynchronous Learning-based Representation for Event-based Data [8.931153235278831]
Event cameras are activity-driven bio-inspired vision sensors.
We propose Event-LSTM, an unsupervised Auto-Encoder architecture made up of LSTM layers.
We also push state-of-the-art event de-noising forward by introducing memory into the de-noising process.
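An LSTM auto-encoder of this kind can be sketched as follows; layer sizes, input features and names are assumptions for illustration, not the Event-LSTM release:

```python
# Illustrative LSTM auto-encoder: an encoder LSTM compresses a variable-length event
# sequence into a fixed-size code, and a decoder LSTM reconstructs the sequence from it.
import torch
import torch.nn as nn

class LSTMAutoEncoder(nn.Module):
    def __init__(self, in_dim: int = 2, code_dim: int = 16):
        super().__init__()
        self.encoder = nn.LSTM(in_dim, code_dim, batch_first=True)
        self.decoder = nn.LSTM(code_dim, code_dim, batch_first=True)
        self.head = nn.Linear(code_dim, in_dim)

    def forward(self, seq: torch.Tensor):
        """seq: (B, L, in_dim) event features, e.g. (timestamp, polarity) per cell."""
        _, (h, _) = self.encoder(seq)                      # h: (1, B, code_dim) summary state
        code = h[-1]                                       # (B, code_dim) learned representation
        dec_in = code.unsqueeze(1).repeat(1, seq.size(1), 1)  # repeat the code at every step
        out, _ = self.decoder(dec_in)
        recon = self.head(out)                             # (B, L, in_dim) reconstruction
        return code, recon

# Training on the reconstruction loss makes the code an unsupervised representation:
# loss = nn.functional.mse_loss(recon, seq)
```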
arXiv Detail & Related papers (2021-05-10T09:18:52Z) - Event-based Asynchronous Sparse Convolutional Networks [54.094244806123235]
Event cameras are bio-inspired sensors that respond to per-pixel brightness changes in the form of asynchronous and sparse "events".
We present a general framework for converting models trained on synchronous image-like event representations into asynchronous models with identical output.
We show both theoretically and experimentally that this drastically reduces the computational complexity and latency of high-capacity, synchronous neural networks.
arXiv Detail & Related papers (2020-03-20T08:39:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.