EvRepSL: Event-Stream Representation via Self-Supervised Learning for Event-Based Vision
- URL: http://arxiv.org/abs/2412.07080v1
- Date: Tue, 10 Dec 2024 00:42:54 GMT
- Title: EvRepSL: Event-Stream Representation via Self-Supervised Learning for Event-Based Vision
- Authors: Qiang Qu, Xiaoming Chen, Yuk Ying Chung, Yiran Shen
- Abstract summary: Event-stream representation is the first step for many computer vision tasks using event cameras.
We introduce a data-driven approach aiming at enhancing the quality of event-stream representations.
- Score: 12.542303392870329
- License:
- Abstract: Event-stream representation is the first step for many computer vision tasks using event cameras. It converts the asynchronous event-streams into a formatted structure so that conventional machine learning models can be applied easily. However, most state-of-the-art event-stream representations are manually designed, and the quality of these representations cannot be guaranteed due to the noisy nature of event-streams. In this paper, we introduce a data-driven approach aiming at enhancing the quality of event-stream representations. Our approach commences with the introduction of a new event-stream representation based on spatial-temporal statistics, denoted as EvRep. Subsequently, we theoretically derive the intrinsic relationship between asynchronous event-streams and synchronous video frames. Building upon this theoretical relationship, we train a representation generator, RepGen, in a self-supervised manner, accepting EvRep as input. Finally, the event-streams are converted to high-quality representations, termed EvRepSL, by passing through the learned RepGen (without the need for fine-tuning or retraining). Our methodology is rigorously validated through extensive evaluations on a variety of mainstream event-based classification and optical flow datasets (captured with various types of event cameras). The experimental results highlight not only our approach's superior performance over existing event-stream representations but also its versatility, being agnostic to different event cameras and tasks.
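The abstract does not spell out how EvRep is computed. Purely as a hypothetical sketch of what a per-pixel spatial-temporal statistics grid built from a raw event stream could look like (the channel choices below, polarity counts plus timestamp mean and spread, are illustrative assumptions rather than the paper's definition), consider:

```python
import numpy as np

def spatio_temporal_stats(events, height, width):
    """Aggregate an asynchronous event stream into dense per-pixel statistics.

    events: float array of shape (N, 4) with columns x, y, t, polarity (+1/-1).
    Returns an array of shape (4, height, width):
      channel 0: count of positive events
      channel 1: count of negative events
      channel 2: mean event timestamp per pixel (0 where no events fired)
      channel 3: timestamp standard deviation per pixel
    """
    rep = np.zeros((4, height, width), dtype=np.float64)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2]
    p = events[:, 3]

    # Per-pixel polarity counts.
    np.add.at(rep[0], (y[p > 0], x[p > 0]), 1.0)
    np.add.at(rep[1], (y[p < 0], x[p < 0]), 1.0)

    # Per-pixel timestamp mean and standard deviation.
    t_sum = np.zeros((height, width))
    t_sq = np.zeros((height, width))
    np.add.at(t_sum, (y, x), t)
    np.add.at(t_sq, (y, x), t * t)
    cnt = rep[0] + rep[1]
    fired = cnt > 0
    rep[2][fired] = t_sum[fired] / cnt[fired]
    var = np.zeros((height, width))
    var[fired] = t_sq[fired] / cnt[fired] - rep[2][fired] ** 2
    rep[3] = np.sqrt(np.clip(var, 0.0, None))  # clip guards against float error
    return rep
```

A grid of this kind would then serve as the input that a learned generator such as RepGen refines into the final representation.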
Related papers
- Grounding Partially-Defined Events in Multimodal Data [61.0063273919745]
We introduce a multimodal formulation for partially-defined events and cast the extraction of these events as a three-stage span retrieval task.
We propose a benchmark for this task, MultiVENT-G, that consists of 14.5 hours of densely annotated current event videos and 1,168 text documents, containing 22.8K labeled event-centric entities.
Results illustrate the challenges that abstract event understanding poses and demonstrate promise in event-centric video-language systems.
arXiv Detail & Related papers (2024-10-07T17:59:48Z)
- Retain, Blend, and Exchange: A Quality-aware Spatial-Stereo Fusion Approach for Event Stream Recognition [57.74076383449153]
We propose a novel dual-stream framework for event stream-based pattern recognition via differentiated fusion, termed EFV++.
It models two common event representations simultaneously, i.e., event images and event voxels.
We achieve new state-of-the-art performance on the Bullying10k dataset, i.e., 90.51%, exceeding the second-best result by +2.21%.
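The event images and event voxels fused by EFV++ are only named in the summary above; as a generic illustration of what these two representations usually mean in the event-vision literature (not EFV++'s specific construction), they can be built from a raw stream roughly as follows:

```python
import numpy as np

def event_image(events, height, width):
    """Accumulate signed polarities into a single 2D frame."""
    img = np.zeros((height, width), dtype=np.float32)
    x, y, p = events[:, 0].astype(int), events[:, 1].astype(int), events[:, 3]
    np.add.at(img, (y, x), p)
    return img

def event_voxel_grid(events, height, width, num_bins=5):
    """Distribute events over temporal bins, giving a (num_bins, H, W) grid."""
    vox = np.zeros((num_bins, height, width), dtype=np.float32)
    x, y = events[:, 0].astype(int), events[:, 1].astype(int)
    t, p = events[:, 2], events[:, 3]
    # Normalize timestamps to [0, num_bins) and drop each event into its bin.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)
    b = np.minimum((t_norm * num_bins).astype(int), num_bins - 1)
    np.add.at(vox, (b, y, x), p)
    return vox
```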
arXiv Detail & Related papers (2024-06-27T02:32:46Z)
- E2HQV: High-Quality Video Generation from Event Camera via Theory-Inspired Model-Aided Deep Learning [53.63364311738552]
Bio-inspired event cameras or dynamic vision sensors are capable of capturing per-pixel brightness changes (called event-streams) with high temporal resolution and high dynamic range.
This calls for events-to-video (E2V) solutions that take event-streams as input and generate high-quality video frames for intuitive visualization.
We propose E2HQV, a novel E2V paradigm designed to produce high-quality video frames from events.
arXiv Detail & Related papers (2024-01-16T05:10:50Z)
- GET: Group Event Transformer for Event-Based Vision [82.312736707534]
Event cameras are a novel type of neuromorphic sensor that has been gaining increasing attention.
We propose a novel Group-based vision Transformer backbone for Event-based vision, called Group Event Transformer (GET).
GET decouples temporal-polarity information from spatial information throughout the feature extraction process.
arXiv Detail & Related papers (2023-10-04T08:02:33Z)
- EvDNeRF: Reconstructing Event Data with Dynamic Neural Radiance Fields [80.94515892378053]
EvDNeRF is a pipeline for generating event data and training an event-based dynamic NeRF.
NeRFs offer geometry-based learnable rendering, but prior work with events has only considered reconstruction of static scenes.
We show that by training on varied batch sizes of events, we can improve test-time predictions of events at fine time resolutions.
arXiv Detail & Related papers (2023-10-03T21:08:41Z)
- Learning Bottleneck Transformer for Event Image-Voxel Feature Fusion based Classification [6.550582412924754]
This paper proposes a novel dual-stream framework for event representation, extraction, and fusion.
Experiments demonstrate that our proposed framework achieves state-of-the-art performance on two widely used event-based classification datasets.
arXiv Detail & Related papers (2023-08-23T06:07:56Z)
- Event Voxel Set Transformer for Spatiotemporal Representation Learning on Event Streams [19.957857885844838]
Event cameras are neuromorphic vision sensors that record a scene as sparse and asynchronous event streams.
We propose an attention-aware model named Event Voxel Set Transformer (EVSTr) for efficient representation learning on event streams.
Experiments show that EVSTr achieves state-of-the-art performance while maintaining low model complexity.
arXiv Detail & Related papers (2023-03-07T12:48:02Z)
- Event Transformer [43.193463048148374]
The event camera's low power consumption and ability to capture brightness changes at microsecond resolution make it attractive for various computer vision tasks.
Existing event representation methods typically convert events into frames, voxel grids, or spikes for deep neural networks (DNNs).
This work introduces a novel token-based event representation, where each event is considered a fundamental processing unit termed an event-token.
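The summary above does not say how an event-token is formed. Purely as a hedged sketch, one simple way to turn raw events into transformer tokens is to normalize each (x, y, t, p) tuple and project it into an embedding space; the fixed random projection below stands in for a learned embedding layer and is an assumption, not the paper's design:

```python
import numpy as np

def tokenize_events(events, height, width, embed_dim=64, rng=None):
    """Map each raw event (x, y, t, p) to an embedding vector ("event-token").

    events: (N, 4) float array. Returns an (N, embed_dim) token matrix.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    x, y, t, p = events[:, 0], events[:, 1], events[:, 2], events[:, 3]
    # Normalize coordinates and time to [0, 1]; keep polarity as +/-1.
    feats = np.stack([x / width, y / height,
                      (t - t.min()) / max(t.max() - t.min(), 1e-9), p], axis=1)
    # Fixed random linear projection as a placeholder for a learned embedding.
    proj = rng.standard_normal((4, embed_dim)).astype(feats.dtype)
    return feats @ proj  # each row is one event-token
```

A downstream transformer would then attend over these tokens directly instead of over a dense frame.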
arXiv Detail & Related papers (2022-04-11T15:05:06Z)
- Bina-Rep Event Frames: a Simple and Effective Representation for Event-based Cameras [1.6114012813668934]
"Bina-Rep" is a simple representation method that converts asynchronous streams of events from event cameras to a sequence of sparse and expressive event frames.
Our method obtains sparser and more expressive event frames thanks to the retained information about event order in the original stream.
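One plausible reading of such order-preserving frames (a sketch of the general idea under that assumption, not the authors' implementation) is to pack N consecutive binary event frames into the bit positions of a single integer-coded frame:

```python
import numpy as np

def bina_rep_frame(events, height, width, n_bits=8):
    """Pack n_bits consecutive binary event frames into one integer-coded frame.

    A pixel active in time slice k (0 = oldest) contributes 2**(n_bits - 1 - k),
    so the pixel value also encodes when within the window it fired.
    """
    frame = np.zeros((height, width), dtype=np.uint32)
    x, y, t = events[:, 0].astype(int), events[:, 1].astype(int), events[:, 2]
    # Assign each event to one of n_bits equal-length time slices.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)
    k = np.minimum((t_norm * n_bits).astype(int), n_bits - 1)
    for s in range(n_bits):
        occ = np.zeros((height, width), dtype=bool)  # binary frame for slice s
        occ[y[k == s], x[k == s]] = True
        frame[occ] += np.uint32(2 ** (n_bits - 1 - s))
    return frame
```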
arXiv Detail & Related papers (2022-02-28T10:23:09Z)
- Superevents: Towards Native Semantic Segmentation for Event-based Cameras [13.099264910430986]
Most successful computer vision models transform low-level features, such as Gabor filter responses, into richer representations of intermediate or mid-level complexity for downstream visual tasks.
We present a novel method that employs lifetime augmentation for obtaining an event stream representation that is fed to a fully convolutional network to extract superevents.
arXiv Detail & Related papers (2021-05-13T05:49:41Z)
- Team RUC_AIM3 Technical Report at Activitynet 2020 Task 2: Exploring Sequential Events Detection for Dense Video Captioning [63.91369308085091]
We propose a novel and simple model for event sequence generation and explore temporal relationships of the event sequence in the video.
The proposed model omits inefficient two-stage proposal generation and directly generates event boundaries conditioned on bi-directional temporal dependency in one pass.
The overall system achieves state-of-the-art performance on the dense-captioning-events-in-video task with a METEOR score of 9.894 on the challenge test set.
arXiv Detail & Related papers (2020-06-14T13:21:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.