HoloEv-Net: Efficient Event-based Action Recognition via Holographic Spatial Embedding and Global Spectral Gating
- URL: http://arxiv.org/abs/2602.04182v1
- Date: Wed, 04 Feb 2026 03:42:42 GMT
- Title: HoloEv-Net: Efficient Event-based Action Recognition via Holographic Spatial Embedding and Global Spectral Gating
- Authors: Weidong Hao,
- Abstract summary: Event-based Action Recognition (EAR) has attracted significant attention due to high temporal resolution and high dynamic range of event cameras.<n>Existing methods typically suffer from (i) the computational redundancy of dense voxel representations, (ii) structural redundancy inherent in multi-branch architectures, and (iii) the under-utilization of spectral information in capturing global motion patterns.
- Score: 0.571097144710995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event-based Action Recognition (EAR) has attracted significant attention due to the high temporal resolution and high dynamic range of event cameras. However, existing methods typically suffer from (i) the computational redundancy of dense voxel representations, (ii) structural redundancy inherent in multi-branch architectures, and (iii) the under-utilization of spectral information in capturing global motion patterns. To address these challenges, we propose an efficient EAR framework named HoloEv-Net. First, to simultaneously tackle representation and structural redundancies, we introduce a Compact Holographic Spatiotemporal Representation (CHSR). Departing from computationally expensive voxel grids, CHSR implicitly embeds horizontal spatial cues into the Time-Height (T-H) view, effectively preserving 3D spatiotemporal contexts within a 2D representation. Second, to exploit the neglected spectral cues, we design a Global Spectral Gating (GSG) module. By leveraging the Fast Fourier Transform (FFT) for global token mixing in the frequency domain, GSG enhances the representation capability with negligible parameter overhead. Extensive experiments demonstrate the scalability and effectiveness of our framework. Specifically, HoloEv-Net-Base achieves state-of-the-art performance on THU-EACT-50-CHL, HARDVS and DailyDVS-200, outperforming existing methods by 10.29%, 1.71% and 6.25%, respectively. Furthermore, our lightweight variant, HoloEv-Net-Small, delivers highly competitive accuracy while offering extreme efficiency, reducing parameters by 5.4 times, FLOPs by 300times, and latency by 2.4times compared to heavy baselines, demonstrating its potential for edge deployment.
Related papers
- Spatio-Temporal Token Pruning for Efficient High-Resolution GUI Agents [10.559617160878227]
GUIPruner is a training-free framework tailored for high-resolution GUI navigation.<n>It synergizes Temporal-temporal Resolution (TAR) and Stratified Structure-aware Pruning (SSP)<n>It consistently achieves state-of-the-art performance, effectively preventing the collapse observed in large-scale models under high-resolution compression.
arXiv Detail & Related papers (2026-02-26T17:12:40Z) - ReGLA: Efficient Receptive-Field Modeling with Gated Linear Attention Network [14.912003445763688]
textbfReGLA integrates efficient convolutions for local feature extraction with ReLU-based gated linear attention for global modeling.<n>ReGLA outperforms similarly scaled iFormer models in downstream tasks, achieving gains of textbf3.1% AP on object detection and textbf3.6% mIoU on ADE20K semantic segmentation.
arXiv Detail & Related papers (2026-02-05T03:43:29Z) - EFSI-DETR: Efficient Frequency-Semantic Integration for Real-Time Small Object Detection in UAV Imagery [10.339425380819513]
EFSI-DETR is a novel detection framework that integrates efficient semantic feature enhancement with dynamic frequency-spatial guidance.<n>Experiments on VisDrone and CODrone benchmarks demonstrate that our EFSI-DETR achieves the state-of-the-art performance with real-time efficiency.
arXiv Detail & Related papers (2026-01-26T15:41:37Z) - SMV-EAR: Bring Spatiotemporal Multi-View Representation Learning into Efficient Event-Based Action Recognition [4.322175390073132]
Event action recognition (EAR) offers privacy-protecting and efficiency advantages where temporal motion dynamics is of great importance.<n>This paper reexamines the key SMVRL design stages for EAR and proposes a principled multi-view representation through translation-invariant dense conversion of sparse events.<n>We show that Top-1 accuracy gains over existing SMVRL EOR method with surprising 30.1% reduced parameters and 30.2% lower computations, establishing our framework as a novel and powerful EAR paradigm.
arXiv Detail & Related papers (2026-01-24T09:24:42Z) - Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention [54.15345846343084]
We propose Ultra3D, an efficient 3D generation framework that significantly accelerates sparse voxel modeling without compromising quality.<n>Part Attention is a geometry-aware localized attention mechanism that restricts attention computation within semantically consistent part regions.<n>Experiments demonstrate that Ultra3D supports high-resolution 3D generation at 1024 resolution and achieves state-of-the-art performance in both visual fidelity and user preference.
arXiv Detail & Related papers (2025-07-23T17:57:16Z) - LoLA-SpecViT: Local Attention SwiGLU Vision Transformer with LoRA for Hyperspectral Imaging [6.360399841791849]
We propose textbfLoLA-SpecViT(Low-rank adaptation Local Attention Spectral Vision Transformer), a lightweight spectral vision transformer.<n>Our model combines a 3D convolutional spectral front-end with local window-based self-attention, enhancing both spectral feature extraction and spatial consistency.<n>Our framework provides a scalable and generalizable solution for real-world HSI applications in agriculture, environmental monitoring, and remote sensing analytics.
arXiv Detail & Related papers (2025-06-21T16:46:00Z) - FADPNet: Frequency-Aware Dual-Path Network for Face Super-Resolution [70.61549422952193]
Face super-resolution (FSR) under limited computational costs remains an open problem.<n>Existing approaches typically treat all facial pixels equally, resulting in suboptimal allocation of computational resources.<n>We propose FADPNet, a Frequency-Aware Dual-Path Network that decomposes facial features into low- and high-frequency components.
arXiv Detail & Related papers (2025-06-17T02:33:42Z) - Latent Wavelet Diffusion For Ultra-High-Resolution Image Synthesis [56.311477476580926]
We present Latent Wavelet Diffusion (LWD), a lightweight training framework that significantly improves detail and texture fidelity in ultra-high-resolution (2K-4K) image synthesis.<n>LWD introduces a novel, frequency-aware masking strategy derived from wavelet energy maps, which dynamically focuses the training process on detail-rich regions of the latent space.
arXiv Detail & Related papers (2025-05-31T07:28:32Z) - VRS-UIE: Value-Driven Reordering Scanning for Underwater Image Enhancement [104.78586859995333]
State Space Models (SSMs) have emerged as a promising backbone for vision tasks due to their linear complexity and global receptive field.<n>The predominance of large-portion, homogeneous but useless oceanic backgrounds can dilute the feature representation responses of sparse yet valuable targets.<n>We propose a novel Value-Driven Reordering Scanning framework for Underwater Image Enhancement (UIE)<n>Our framework sets a new state-of-the-art, delivering superior enhancement performance (surpassing WMamba by 0.89 dB on average) by effectively suppressing water bias and preserving structural and color fidelity.
arXiv Detail & Related papers (2025-05-02T12:21:44Z) - Hyperspectral Image Super-Resolution via Dual-domain Network Based on
Hybrid Convolution [6.3814314790000415]
This paper proposes a novel HSI super-resolution algorithm, termed dual-domain network based on hybrid convolution (SRDNet)
To capture inter-spectral self-similarity, a self-attention learning mechanism (HSL) is devised in the spatial domain.
To further improve the perceptual quality of HSI, a frequency loss(HFL) is introduced to optimize the model in the frequency domain.
arXiv Detail & Related papers (2023-04-10T13:51:28Z) - Hyperspectral Image Super-resolution via Deep Progressive Zero-centric
Residual Learning [62.52242684874278]
Cross-modality distribution of spatial and spectral information makes the problem challenging.
We propose a novel textitlightweight deep neural network-based framework, namely PZRes-Net.
Our framework learns a high resolution and textitzero-centric residual image, which contains high-frequency spatial details of the scene.
arXiv Detail & Related papers (2020-06-18T06:32:11Z) - Spatial-Spectral Residual Network for Hyperspectral Image
Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet)
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and temporal separable 3D convolution to extract spatial and spectral information, which not only reduces unaffordable memory usage and high computational cost, but also makes the network easier to train.
arXiv Detail & Related papers (2020-01-14T03:34:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.