Rethinking Efficient and Effective Point-based Networks for Event Camera Classification and Regression: EventMamba
- URL: http://arxiv.org/abs/2405.06116v4
- Date: Fri, 28 Mar 2025 14:25:05 GMT
- Title: Rethinking Efficient and Effective Point-based Networks for Event Camera Classification and Regression: EventMamba
- Authors: Hongwei Ren, Yue Zhou, Jiadong Zhu, Haotian Fu, Yulong Huang, Xiaopeng Lin, Yuetong Fang, Fei Ma, Hao Yu, Bojun Cheng
- Abstract summary: Event cameras draw inspiration from biological systems, boasting low latency and high dynamic range while consuming minimal power. Most current approaches to processing the Event Cloud convert it into frame-based representations. We propose EventMamba, an efficient and effective framework based on Point Cloud representation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event cameras draw inspiration from biological systems, boasting low latency and high dynamic range while consuming minimal power. Most current approaches to processing the Event Cloud convert it into frame-based representations, a conversion that neglects the sparsity of events, loses fine-grained temporal information, and increases the computational burden. In contrast, Point Cloud is a popular representation for processing 3-dimensional data and serves as an alternative way to exploit local and global spatial features. Nevertheless, previous point-based methods perform unsatisfactorily compared to frame-based methods on spatio-temporal event streams. To bridge this gap, we propose EventMamba, an efficient and effective framework based on Point Cloud representation, by rethinking the distinction between Event Cloud and Point Cloud and emphasizing vital temporal information. The Event Cloud is then fed into a hierarchical structure with staged modules that process both implicit and explicit temporal features. Specifically, we redesign the global extractor to enhance explicit temporal extraction over a long sequence of events using temporal aggregation and the State Space Model (SSM) based Mamba. In our experiments, the model consumes minimal computational resources while still achieving state-of-the-art (SOTA) point-based performance on six action recognition datasets of different scales. It even outperforms all frame-based methods on both Camera Pose Relocalization (CPR) and eye-tracking regression tasks. Our code is available at: https://github.com/rhwxmx/EventMamba.
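
To make two ideas from the abstract concrete (treating the Event Cloud as a time-ordered point set, and scanning that sequence with a Mamba-style SSM), here is a minimal pure-Python sketch. It is not EventMamba's implementation; the linked repository is the authority for that. Assumptions: each event is an (x, y, t, p) tuple; coordinates are min-max normalized to [0, 1]; the scan is a single-channel toy selective SSM with a diagonal negative A, an input-dependent step size via softplus, input-dependent B and C, and the simplified discretization A_bar = exp(delta * A), B_bar ~ delta * B used in Mamba's reference scan. Weights are hand-picked rather than learned, and every function name is hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Event:
    x: int    # pixel column
    y: int    # pixel row
    t: float  # timestamp (assumed microseconds)
    p: int    # polarity, +1 or -1 (unused by this toy scan)

def to_time_ordered_cloud(events, width, height):
    """Hypothetical helper: map events to (x, y, t) points in [0, 1], sorted by time.

    Sorting by timestamp keeps the temporal order an SSM scan relies on;
    polarity is dropped here for brevity (an assumption, not the paper's choice).
    """
    evs = sorted(events, key=lambda e: e.t)
    t0, t1 = evs[0].t, evs[-1].t
    span = (t1 - t0) or 1.0  # avoid division by zero when all timestamps match
    return [(e.x / (width - 1), e.y / (height - 1), (e.t - t0) / span) for e in evs]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softplus(v):
    # Numerically safe softplus: for large v, log(1 + e^v) is approximately v.
    return v if v > 20.0 else math.log1p(math.exp(v))

def selective_scan(points, a, w_in, w_delta, b_delta, w_b, w_c):
    """Toy single-channel selective SSM scan over a time-ordered point sequence.

    a        : N negative floats, the diagonal of A (negative keeps the state stable)
    w_in     : 3 weights projecting each point to the scalar input u_k
    w_delta, b_delta : weights and bias for the input-dependent step size delta_k
    w_b, w_c : N rows of 3 weights giving the input-dependent B_k and C_k
    """
    h = [0.0] * len(a)
    ys = []
    for pt in points:
        u = dot(w_in, pt)
        delta = softplus(dot(w_delta, pt) + b_delta)
        b = [dot(row, pt) for row in w_b]
        c = [dot(row, pt) for row in w_c]
        # h_k = exp(delta * A) * h_{k-1} + delta * B_k * u_k  (diagonal A, elementwise)
        h = [math.exp(delta * a_i) * h_i + delta * b_i * u
             for a_i, h_i, b_i in zip(a, h, b)]
        ys.append(dot(c, h))  # y_k = C_k . h_k
    return ys

# Usage: three events arriving out of order on a 346 x 260 sensor (assumed size).
events = [Event(10, 20, 300.0, 1), Event(5, 7, 100.0, -1), Event(30, 1, 200.0, 1)]
cloud = to_time_ordered_cloud(events, width=346, height=260)
ys = selective_scan(cloud, a=[-1.0, -0.5], w_in=[0.2, 0.1, 1.0],
                    w_delta=[0.0, 0.0, 1.0], b_delta=0.0,
                    w_b=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                    w_c=[[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]])
print([round(p[2], 2) for p in cloud])  # [0.0, 0.5, 1.0]
```

Because delta, B and C depend on each incoming point, the state can decide per event how much history to keep, which is the property that distinguishes a selective SSM from a fixed linear recurrence.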
Related papers
- EventMamba: Enhancing Spatio-Temporal Locality with State Space Models for Event-Based Video Reconstruction
EventMamba is a specialized model designed for event-based video reconstruction tasks.
We show that EventMamba markedly improves speed while delivering superior visual quality compared to Transformer-based methods.
Date: 2025-03-25T14:46:45Z
- Frequency-aware Event Cloud Network
We propose a frequency-aware network named FECNet that leverages Event Cloud representations.
FECNet fully utilizes the 2S-1T-1P Event Cloud by introducing an innovative event-based Group and Sampling module.
We conducted extensive experiments on event-based object classification, action recognition, and human pose estimation tasks.
Date: 2024-12-30T08:53:57Z
- Event-based Motion Deblurring via Multi-Temporal Granularity Fusion
Event cameras, bio-inspired sensors that offer continuous visual information, could enhance deblurring performance.
Existing event-based image deblurring methods usually utilize voxel-based event representations.
We introduce point cloud-based event representation into the image deblurring task and propose a Multi-Temporal Granularity Network (MTGNet).
It combines the spatially dense but temporally coarse-grained voxel-based event representation with the temporally fine-grained but spatially sparse point cloud-based event representation.
Date: 2024-12-16T15:20:54Z
- Event-Stream Super Resolution using Sigma-Delta Neural Network
Event cameras present unique challenges due to their low resolution and the sparse, asynchronous nature of the data they collect.
Current event super-resolution algorithms are not fully optimized for the distinct data structure produced by event cameras.
This research proposes a method that integrates binary spikes with Sigma-Delta Neural Networks (SDNNs).
Date: 2024-08-13T15:25:18Z
- Fast Window-Based Event Denoising with Spatiotemporal Correlation Enhancement
We propose window-based event denoising, which simultaneously deals with a stack of events.
In the spatial domain, we use maximum a posteriori (MAP) estimation to discriminate real-world events from noise.
Our algorithm can remove event noise effectively and efficiently and improve the performance of downstream tasks.
Date: 2024-02-14T15:56:42Z
- Representation Learning on Event Stream via an Elastic Net-incorporated Tensor Network
We present a novel representation method which can capture global correlations of all events in the event stream simultaneously.
Our method achieves effective results in applications such as noise filtering, compared with state-of-the-art methods.
Date: 2024-01-16T02:51:47Z
- Point Cloud Pre-training with Diffusion Models
We propose a novel pre-training method called Point cloud Diffusion pre-training (PointDif).
PointDif achieves substantial improvement across various real-world datasets for diverse downstream tasks such as classification, segmentation and detection.
Date: 2023-11-25T08:10:05Z
- Implicit Event-RGBD Neural SLAM
Implicit neural SLAM has achieved remarkable progress recently.
Existing methods face significant challenges in non-ideal scenarios.
We propose EN-SLAM, the first event-RGBD implicit neural SLAM framework.
Date: 2023-11-18T08:48:58Z
- Graph-based Asynchronous Event Processing for Rapid Object Recognition
Event cameras capture an asynchronous event stream in which each event encodes the pixel location, trigger time, and polarity of the brightness change.
We introduce a novel graph-based framework for event cameras, namely SlideGCN.
Our approach can efficiently process data event-by-event, unlocking the low-latency nature of event data while still maintaining the graph's structure internally.
Date: 2023-08-28T08:59:57Z
- Generalizing Event-Based Motion Deblurring in Real-World Scenarios
Event-based motion deblurring has shown promising results by exploiting low-latency events.
We propose a scale-aware network that allows flexible input spatial scales and enables learning from different temporal scales of motion blur.
A two-stage self-supervised learning scheme is then developed to fit real-world data distribution.
Date: 2023-08-11T04:27:29Z
- Hierarchical Neural Memory Network for Low Latency Event Processing
This paper proposes a low latency neural network architecture for event-based dense prediction tasks.
We achieve this by constructing temporal hierarchy using stacked latent memories that operate at different rates.
We conduct extensive evaluations on three event-based dense prediction tasks, where the proposed approach outperforms the existing methods on accuracy and latency.
Date: 2023-05-29T02:29:16Z
- HALSIE: Hybrid Approach to Learning Segmentation by Simultaneously Exploiting Image and Event Modalities
Event cameras detect changes in per-pixel intensity to generate asynchronous event streams.
They offer great potential for accurate semantic map retrieval in real-time autonomous systems.
Existing implementations for event segmentation suffer from sub-par performance.
We propose HALSIE, a hybrid end-to-end learning framework that reduces inference cost by up to $20\times$ compared to prior art.
Date: 2022-11-19T17:09:50Z
- CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning
In this work, we incorporate transformers into a hierarchical framework for shape classification and part and scene segmentation.
We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration.
The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods.
Date: 2022-07-31T21:39:15Z
- Asynchronous Optimisation for Event-based Visual Odometry
Event cameras open up new possibilities for robotic perception due to their low latency and high dynamic range.
We focus on event-based visual odometry (VO).
We propose an asynchronous structure-from-motion optimisation back-end.
Date: 2022-03-02T11:28:47Z
- OctAttention: Octree-based Large-scale Contexts Model for Point Cloud Compression
OctAttention employs the octree structure, a memory-efficient representation for point clouds.
Our approach saves 95% coding time compared to the voxel-based baseline.
Compared to the previous state-of-the-art works, our approach obtains a 10%-35% BD-Rate gain on the LiDAR benchmark.
Date: 2022-02-12T10:06:12Z
- Learning Semantic Segmentation of Large-Scale Point Clouds with Random Sampling
We introduce RandLA-Net, an efficient and lightweight neural architecture to infer per-point semantics for large-scale point clouds.
The key to our approach is to use random point sampling instead of more complex point selection approaches.
Our RandLA-Net can process 1 million points in a single pass up to 200x faster than existing approaches.
Date: 2021-07-06T05:08:34Z
- AET-EFN: A Versatile Design for Static and Dynamic Event-Based Vision
Event data are noisy, sparse, and nonuniform in the spatial-temporal domain with an extremely high temporal resolution.
Existing methods encode events into point-cloud-based or voxel-based representations, but suffer from noise and/or information loss.
This work proposes the Aligned Event Frame (AET) as a novel event data representation, and a neat framework called Event Frame Net (EFN).
The proposed AET and EFN are evaluated on various datasets and shown to surpass existing state-of-the-art methods by large margins.
Date: 2021-03-22T08:09:03Z
- Unsupervised Feature Learning for Event Data: Direct vs Inverse Problem Formulation
Event-based cameras record an asynchronous stream of per-pixel brightness changes.
In this paper, we focus on single-layer architectures for representation learning from event data.
We show improvements of up to 9% in recognition accuracy compared to state-of-the-art methods.
Date: 2020-09-23T10:40:03Z
- Event-based Asynchronous Sparse Convolutional Networks
Event cameras are bio-inspired sensors that respond to per-pixel brightness changes in the form of asynchronous and sparse "events".
We present a general framework for converting models trained on synchronous image-like event representations into asynchronous models with identical output.
We show both theoretically and experimentally that this drastically reduces the computational complexity and latency of high-capacity, synchronous neural networks.
Date: 2020-03-20T08:39:49Z
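
Several entries in this list revolve around how point-based networks pick and group points: FECNet's event-based Group and Sampling module, and RandLA-Net's choice of random point sampling over more complex point selection. The sketch below contrasts farthest point sampling (FPS), a common but costlier selection scheme in point-based networks, with uniform random sampling, and adds brute-force k-nearest-neighbor grouping. It assumes the Event Cloud is a list of (x, y, t) tuples; the function names are hypothetical and the code is not taken from any of the listed papers.

```python
import math
import random

def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: repeatedly pick the point farthest from all points chosen so far.

    Costs O(N * k) distance evaluations for N points.
    """
    chosen = [start]
    # dist[i] = distance from point i to its nearest already-chosen point
    dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: dist[i])
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen

def random_sampling(points, k, seed=0):
    """Uniform random selection of k distinct indices: no distance computations."""
    return random.Random(seed).sample(range(len(points)), k)

def knn_group(points, centers, k):
    """For each center index, return the indices of its k nearest points.

    A brute-force sort per center; practical implementations typically use
    a spatial index or batched tensor operations instead.
    """
    groups = []
    for c in centers:
        order = sorted(range(len(points)), key=lambda i: math.dist(points[i], points[c]))
        groups.append(order[:k])
    return groups

# Usage on a tiny (x, y, t) cloud with one outlier far along x.
pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1), (2.0, 0.0, 0.2),
       (3.0, 0.0, 0.3), (10.0, 0.0, 0.4)]
print(farthest_point_sampling(pts, 3))   # [0, 4, 3]
print(knn_group(pts, [0, 4], 2))         # [[0, 1], [4, 3]]
print(random_sampling(pts, 3))           # 3 distinct indices, seed-dependent
```

FPS reaches the outlier at x = 10 on its second pick, whereas random sampling may miss it; in exchange, random sampling skips all distance computations, which is the coverage-versus-cost trade-off behind preferring it for very large clouds.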