Related papers: OmniEvent: Unified Event Representation Learning

OmniEvent: Unified Event Representation Learning

URL: http://arxiv.org/abs/2508.01842v1
Date: Sun, 03 Aug 2025 16:56:36 GMT
Title: OmniEvent: Unified Event Representation Learning
Authors: Weiqi Yan, Chenlu Lin, Youbiao Wang, Zhipeng Cai, Xiuhong Lin, Yangyang Shi, Weiquan Liu, Yu Zang,
Abstract summary: Event networks heavily rely on task-specific designs due to unstructured data distribution and spatial-temporal (S-T) inhomogeneity.<n>We propose OmniEvent, the first unified event representation learning framework that achieves SOTA performance across diverse tasks.
Score: 20.211879134897618
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Event cameras have gained increasing popularity in computer vision due to their ultra-high dynamic range and temporal resolution. However, event networks heavily rely on task-specific designs due to the unstructured data distribution and spatial-temporal (S-T) inhomogeneity, making it hard to reuse existing architectures for new tasks. We propose OmniEvent, the first unified event representation learning framework that achieves SOTA performance across diverse tasks, fully removing the need of task-specific designs. Unlike previous methods that treat event data as 3D point clouds with manually tuned S-T scaling weights, OmniEvent proposes a decouple-enhance-fuse paradigm, where the local feature aggregation and enhancement is done independently on the spatial and temporal domains to avoid inhomogeneity issues. Space-filling curves are applied to enable large receptive fields while improving memory and compute efficiency. The features from individual domains are then fused by attention to learn S-T interactions. The output of OmniEvent is a grid-shaped tensor, which enables standard vision models to process event data without architecture change. With a unified framework and similar hyper-parameters, OmniEvent out-performs (tasks-specific) SOTA by up to 68.2% across 3 representative tasks and 10 datasets (Fig.1). Code will be ready in https://github.com/Wickyan/OmniEvent .

Related papers

Path-adaptive Spatio-Temporal State Space Model for Event-based Recognition with Arbitrary Duration [9.547947845734992]
Event cameras are bio-inspired sensors that capture the intensity changes asynchronously and output event streams. We present a novel framework, dubbed PAST-Act, exhibiting superior capacity in recognizing events with arbitrary duration. We also build a minute-level event-based recognition dataset, named ArDVS100, with arbitrary duration for the benefit of the community.
arXiv Detail & Related papers (2024-09-25T14:08:37Z)
Retain, Blend, and Exchange: A Quality-aware Spatial-Stereo Fusion Approach for Event Stream Recognition [57.74076383449153]
We propose a novel dual-stream framework for event stream-based pattern recognition via differentiated fusion, termed EFV++. It models two common event representations simultaneously, i.e., event images and event voxels. We achieve new state-of-the-art performance on the Bullying10k dataset, i.e., $90.51%$, which exceeds the second place by $+2.21%$.
arXiv Detail & Related papers (2024-06-27T02:32:46Z)
Sports-Traj: A Unified Trajectory Generation Model for Multi-Agent Movement in Sports [53.637837706712794]
We propose a Unified Trajectory Generation model, UniTraj, that processes arbitrary trajectories as masked inputs.<n>Specifically, we introduce a Ghost Spatial Masking (GSM) module, embedded within a Transformer encoder, for spatial feature extraction.<n>We benchmark three practical sports datasets, Basketball-U, Football-U, and Soccer-U, for evaluation.
arXiv Detail & Related papers (2024-05-27T22:15:23Z)
Scalable Event-by-event Processing of Neuromorphic Sensory Signals With Deep State-Space Models [2.551844666707809]
Event-based sensors are well suited for real-time processing. Current methods either collapse events into frames or cannot scale up when processing the event data directly event-by-event.
arXiv Detail & Related papers (2024-04-29T08:50:27Z)
Improving Event Definition Following For Zero-Shot Event Detection [66.27883872707523]
Existing approaches on zero-shot event detection usually train models on datasets annotated with known event types. We aim to improve zero-shot event detection by training models to better follow event definitions.
arXiv Detail & Related papers (2024-03-05T01:46:50Z)
GET: Group Event Transformer for Event-Based Vision [82.312736707534]
Event cameras are a type of novel neuromorphic sen-sor that has been gaining increasing attention. We propose a novel Group-based vision Transformer backbone for Event-based vision, called Group Event Transformer (GET) GET de-couples temporal-polarity information from spatial infor-mation throughout the feature extraction process.
arXiv Detail & Related papers (2023-10-04T08:02:33Z)
OmniEvent: A Comprehensive, Fair, and Easy-to-Use Toolkit for Event Understanding [53.23073872040206]
Event understanding aims at understanding the content and relationship of events within texts. To facilitate related research and application, we present an event understanding toolkit OmniEvent.
arXiv Detail & Related papers (2023-09-25T16:15:09Z)
EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding [7.797154022794006]
EventBind is a novel framework that unleashes the potential of vision-language models (VLMs) for event-based recognition. We first introduce a novel event encoder that subtly models the temporal information from events. We then design a text encoder that generates content prompts and utilizes hybrid text prompts to enhance EventBind's generalization ability.
arXiv Detail & Related papers (2023-08-06T15:05:42Z)
Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture brightness change of every pixel in an asynchronous manner. Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as 3D tensor representation. Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
arXiv Detail & Related papers (2023-03-17T12:12:41Z)
Event Voxel Set Transformer for Spatiotemporal Representation Learning on Event Streams [19.957857885844838]
Event cameras are neuromorphic vision sensors that record a scene as sparse and asynchronous event streams. We propose an attentionaware model named Event Voxel Set Transformer (EVSTr) for efficient representation learning on event streams. Experiments show that EVSTr achieves state-of-the-art performance while maintaining low model complexity.
arXiv Detail & Related papers (2023-03-07T12:48:02Z)
Continuous-time convolutions model of event sequences [46.3471121117337]
Event sequences are non-uniform and sparse, making traditional models unsuitable. We propose COTIC, a method based on an efficient convolution neural network designed to handle the non-uniform occurrence of events over time. COTIC outperforms existing models in predicting the next event time and type, achieving an average rank of 1.5 compared to 3.714 for the nearest competitor.
arXiv Detail & Related papers (2023-02-13T10:34:51Z)
Masked Event Modeling: Self-Supervised Pretraining for Event Cameras [41.263606382601886]
Masked Event Modeling (MEM) is a self-supervised framework for events. MEM pretrains a neural network on unlabeled events, which can originate from any event camera recording. Our method reaches state-of-the-art classification accuracy across three datasets.
arXiv Detail & Related papers (2022-12-20T15:49:56Z)
Event Transformer [43.193463048148374]
Event camera's low power consumption and ability to capture microsecond brightness make it attractive for various computer vision tasks. Existing event representation methods typically convert events into frames, voxel grids, or spikes for deep neural networks (DNNs) This work introduces a novel token-based event representation, where each event is considered a fundamental processing unit termed an event-token.
arXiv Detail & Related papers (2022-04-11T15:05:06Z)
Bridging the Gap between Events and Frames through Unsupervised Domain Adaptation [57.22705137545853]
We propose a task transfer method that allows models to be trained directly with labeled images and unlabeled event data. We leverage the generative event model to split event features into content and motion features. Our approach unlocks the vast amount of existing image datasets for the training of event-based neural networks.
arXiv Detail & Related papers (2021-09-06T17:31:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.