Uncertainty-aware Bridge based Mobile-Former Network for Event-based Pattern Recognition
- URL: http://arxiv.org/abs/2401.11123v2
- Date: Thu, 12 Sep 2024 07:01:55 GMT
- Title: Uncertainty-aware Bridge based Mobile-Former Network for Event-based Pattern Recognition
- Authors: Haoxiang Yang, Chengguo Yuan, Yabin Zhu, Lan Chen, Xiao Wang, Futian Wang,
- Abstract summary: Event cameras offer high dynamic range, freedom from motion blur, and low energy consumption.
We propose a lightweight uncertainty-aware information propagation based Mobile-Former network for efficient pattern recognition.
- Score: 6.876867023375713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The mainstream human activity recognition (HAR) algorithms are developed based on RGB cameras, which are easily influenced by low-quality images (e.g., low illumination, motion blur). Meanwhile, the privacy concerns raised by ultra-high-definition (HD) RGB cameras have drawn increasing attention. Inspired by the success of event cameras, which offer high dynamic range, no motion blur, and low energy consumption, we propose to recognize human actions based on the event stream. We propose a lightweight, uncertainty-aware, information-propagation-based Mobile-Former network for efficient pattern recognition, which effectively combines a MobileNet and a Transformer network. Specifically, we first embed the event images into feature representations using a stem network, then feed them into uncertainty-aware Mobile-Former blocks for local and global feature learning and fusion. Finally, the features from the MobileNet and Transformer branches are concatenated for pattern recognition. Extensive experiments on multiple event-based recognition datasets fully validated the effectiveness of our model. The source code of this work will be released at https://github.com/Event-AHU/Uncertainty_aware_MobileFormer.
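As a rough illustration of the pipeline described in the abstract (stem embedding, Mobile-Former-style blocks bridging a local and a global branch, then concatenation of both branches), the following NumPy sketch uses hypothetical shapes and stand-in operations; the actual layers, dimensions, uncertainty-aware bridge, and hyperparameters are defined by the paper and its released code, not here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mobile_former_block(feat_map, tokens):
    """One hypothetical Mobile-Former-style block: a local (MobileNet-like)
    update on the pixel feature map, a global (Transformer-like) update on a
    small set of tokens, and a bridge with cross-attention in both directions."""
    H, W, C = feat_map.shape
    pixels = feat_map.reshape(-1, C)                      # (H*W, C)
    # Mobile -> Former bridge: tokens attend to pixel features
    attn = softmax(tokens @ pixels.T / np.sqrt(C))        # (M, H*W)
    tokens = tokens + attn @ pixels
    # local branch: stand-in for a depthwise-conv update
    pixels = pixels + 0.1 * np.tanh(pixels)
    # Former -> Mobile bridge: pixels attend back to the updated tokens
    attn_back = softmax(pixels @ tokens.T / np.sqrt(C))   # (H*W, M)
    pixels = pixels + attn_back @ tokens
    return pixels.reshape(H, W, C), tokens

def fuse_for_recognition(feat_map, tokens):
    """Concatenate pooled features from both branches for classification."""
    local_feat = feat_map.mean(axis=(0, 1))   # global-average-pooled map
    global_feat = tokens.mean(axis=0)         # averaged global tokens
    return np.concatenate([local_feat, global_feat])

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 16))   # stem output for one event image
tok = rng.standard_normal((4, 16))       # learnable global tokens
feat, tok = mobile_former_block(feat, tok)
fused = fuse_for_recognition(feat, tok)  # shape (2*C,), fed to a classifier
```

The key design point the sketch tries to capture is that only a handful of global tokens pass through attention, which is what keeps Mobile-Former-style models lightweight compared to full self-attention over all pixels.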
Related papers
- SpikMamba: When SNN meets Mamba in Event-based Human Action Recognition [13.426390494116776]
Human action recognition (HAR) plays a key role in various applications such as video analysis, surveillance, autonomous driving, robotics, and healthcare.
Most HAR algorithms are developed from RGB images, which capture detailed visual information.
Event cameras offer a promising solution by capturing scene brightness changes sparsely at the pixel level, without capturing full images.
arXiv Detail & Related papers (2024-10-22T07:00:43Z)
- DemosaicFormer: Coarse-to-Fine Demosaicing Network for HybridEVS Camera [70.28702677370879]
Hybrid Event-Based Vision Sensor (HybridEVS) is a novel sensor integrating traditional frame-based and event-based sensors.
Despite its potential, the lack of an image signal processing (ISP) pipeline specifically designed for HybridEVS poses a significant challenge.
We propose a coarse-to-fine framework named DemosaicFormer which comprises coarse demosaicing and pixel correction.
arXiv Detail & Related papers (2024-06-12T07:20:46Z)
- EventTransAct: A video transformer-based framework for Event-camera based action recognition [52.537021302246664]
Event cameras offer new opportunities compared with standard action recognition in RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
In order to better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
arXiv Detail & Related papers (2023-08-25T23:51:07Z)
- SSTFormer: Bridging Spiking Neural Network and Memory Support Transformer for Frame-Event based Recognition [42.118434116034194]
We propose to recognize patterns by fusing RGB frames and event streams simultaneously.
Due to the scarcity of RGB-Event based classification datasets, we also propose a large-scale PokerEvent dataset.
arXiv Detail & Related papers (2023-08-08T16:15:35Z)
- Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture brightness changes of every pixel in an asynchronous manner.
Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as 3D tensor representation.
Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
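The pillar representation described above can be sketched as a simple discretization of the event stream into x-y-t count grids, one per polarity. The function below is an illustrative assumption of such a discretization, not the paper's exact settings: the event tuple layout, bin counts, and lack of normalization are all placeholders.

```python
import numpy as np

def events_to_pillars(events, H, W, T_bins, t_start, t_end):
    """Discretize an event stream into x-y-t grids, one per polarity,
    giving a (2, T_bins, H, W) count tensor of event "pillars"."""
    grid = np.zeros((2, T_bins, H, W), dtype=np.float32)
    for x, y, t, p in events:
        # clamp the latest timestamps into the final temporal bin
        tb = min(int((t - t_start) / (t_end - t_start) * T_bins), T_bins - 1)
        grid[int(p > 0), tb, y, x] += 1.0  # index 1 = positive polarity
    return grid

# tiny example: three events as (x, y, timestamp, polarity)
events = [(0, 0, 0.0, +1), (1, 1, 0.5, -1), (2, 2, 0.99, +1)]
pillars = events_to_pillars(events, H=4, W=4, T_bins=2, t_start=0.0, t_end=1.0)
```

Each spatial location then holds a short temporal column ("pillar") of event counts, which is the kind of dense 3D tensor that convolutional or convLSTM layers can consume directly.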
arXiv Detail & Related papers (2023-03-17T12:12:41Z)
- HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors [40.949347728083474]
The mainstream human activity recognition (HAR) algorithms are developed based on RGB cameras, which suffer from sensitivity to illumination and fast motion, privacy concerns, and large energy consumption.
Meanwhile, biologically inspired event cameras have attracted great interest due to their unique features, such as high dynamic range, dense temporal but sparse spatial resolution, low latency, and low power.
As the event camera is a newly emerging sensor, there is not yet a realistic large-scale dataset for HAR.
We propose a large-scale benchmark dataset, termed HARDVS, which contains 300 categories and more than 100K event sequences.
arXiv Detail & Related papers (2022-11-17T16:48:50Z)
- Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
- MEFNet: Multi-scale Event Fusion Network for Motion Deblurring [62.60878284671317]
Traditional frame-based cameras inevitably suffer from motion blur due to long exposure times.
As a kind of bio-inspired camera, the event camera records the intensity changes in an asynchronous way with high temporal resolution.
In this paper, we rethink the event-based image deblurring problem and unfold it into an end-to-end two-stage image restoration network.
arXiv Detail & Related papers (2021-11-30T23:18:35Z)
- Bridging the Gap between Events and Frames through Unsupervised Domain Adaptation [57.22705137545853]
We propose a task transfer method that allows models to be trained directly with labeled images and unlabeled event data.
We leverage the generative event model to split event features into content and motion features.
Our approach unlocks the vast amount of existing image datasets for the training of event-based neural networks.
arXiv Detail & Related papers (2021-09-06T17:31:37Z)
- AWNet: Attentive Wavelet Network for Image ISP [14.58067200317891]
We introduce a novel network, dubbed AWNet, that utilizes the attention mechanism and the wavelet transform to tackle the learnable image ISP problem.
Our proposed method enables us to restore favorable image details from RAW information and achieve a larger receptive field.
Experimental results demonstrate the advantages of our design in both qualitative and quantitative measurements.
arXiv Detail & Related papers (2020-08-20T23:28:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.