How Asynchronous Events Encode Video
- URL: http://arxiv.org/abs/2206.04341v1
- Date: Thu, 9 Jun 2022 08:36:21 GMT
- Title: How Asynchronous Events Encode Video
- Authors: Karen Adam, Adam Scholefield, Martin Vetterli
- Abstract summary: Event-based cameras have sensors that emit events when their inputs change, thus encoding information in the timing of events.
This creates new challenges in establishing reconstruction guarantees and algorithms, but also provides advantages over frame-based video.
We consider the case of time encoding bandlimited video and demonstrate a dependence between spatial sensor density and overall spatial and temporal resolution.
- Score: 18.666472443354092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As event-based sensing gains in popularity, theoretical understanding is
needed to harness this technology's potential. Instead of recording video by
capturing frames, event-based cameras have sensors that emit events when their
inputs change, thus encoding information in the timing of events. This creates
new challenges in establishing reconstruction guarantees and algorithms, but
also provides advantages over frame-based video. We use time encoding machines
to model event-based sensors: TEMs also encode their inputs by emitting events
characterized by their timing, and reconstruction from time encodings is well
understood. We consider the case of time encoding bandlimited video and
demonstrate a dependence between spatial sensor density and overall spatial and
temporal resolution. Such a dependence does not occur in frame-based video,
where temporal resolution depends solely on the frame rate of the video and
spatial resolution depends solely on the pixel grid. However, this dependence
arises naturally in event-based video and allows oversampling in space to
provide better time resolution. As such, event-based vision encourages using
more sensors that emit fewer events over time.
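To make the encoding model concrete, the following is a minimal sketch of a single integrate-and-fire time encoding machine (TEM) applied to a bandlimited 1-D signal; the bias, threshold, and test signal are illustrative assumptions rather than parameters from the paper.

```python
import numpy as np

def time_encode(signal, t, bias=1.2, kappa=1.0, delta=0.05):
    """Integrate-and-fire time encoding machine (TEM) sketch.

    The machine integrates (bias + input) and emits an event each time the
    integral reaches the threshold delta, so all information is carried by
    the event times. bias, kappa, and delta are illustrative values.
    """
    dt = t[1] - t[0]
    integral, events = 0.0, []
    for ti, xi in zip(t, signal):
        integral += (bias + xi) * dt / kappa
        if integral >= delta:      # threshold crossing -> emit an event
            events.append(ti)      # only the timing is recorded
            integral -= delta      # reset by subtracting the threshold
    return np.array(events)

# Bandlimited (sum-of-sinusoids) test input over one second.
t = np.linspace(0.0, 1.0, 10_000)
x = 0.5 * np.sin(2 * np.pi * 3 * t) + 0.3 * np.sin(2 * np.pi * 7 * t)
events = time_encode(x, t)
print(f"{len(events)} events, mean inter-event time {np.diff(events).mean():.4f} s")
```

In the video setting studied in the paper, many such sensors observe spatially filtered versions of the scene, and the reconstruction guarantees tie the spatial density of sensors to the achievable spatial and temporal resolution.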
Related papers
- Rethinking Video with a Universal Event-Based Representation [0.0]
I introduce Address, Decimation, ΔER (ADΔER), a novel intermediate video representation and system framework.
I demonstrate that ADΔER achieves state-of-the-art application speed and compression performance for scenes with high temporal redundancy.
I discuss the implications for event-based video on large-scale video surveillance and resource-constrained sensing.
arXiv Detail & Related papers (2024-08-12T16:00:17Z)
- HR-INR: Continuous Space-Time Video Super-Resolution via Event Camera [22.208120663778043]
Continuous space-time super-resolution (C-STVSR) aims to simultaneously enhance resolution and frame rate at an arbitrary scale.
We propose a novel C-STVSR framework, called HR-INR, which captures both holistic dependencies and regional motions based on implicit neural representation (INR).
We then propose a novel INR-based decoder with temporal embeddings to capture long-term dependencies with a larger temporal perception field.
arXiv Detail & Related papers (2024-05-22T06:51:32Z)
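As a rough illustration of the INR idea behind the HR-INR entry above, the sketch below queries a coordinate MLP at continuous (x, y, t) with a sinusoidal temporal embedding; the module names, layer sizes, and embedding are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

class CoordinateDecoder(nn.Module):
    """Toy implicit-neural-representation decoder: maps a continuous
    space-time coordinate plus a temporal embedding to an RGB value.
    Layer sizes and the sinusoidal embedding are illustrative choices."""

    def __init__(self, embed_dim=16, hidden=128):
        super().__init__()
        self.embed_dim = embed_dim
        self.mlp = nn.Sequential(
            nn.Linear(2 + embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def temporal_embedding(self, t):
        # Sinusoidal embedding of the continuous time coordinate.
        freqs = 2.0 ** torch.arange(self.embed_dim // 2,
                                    dtype=torch.float32, device=t.device)
        angles = t[:, None] * freqs[None, :]
        return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

    def forward(self, xy, t):
        # xy: (N, 2) spatial coordinates in [0, 1]; t: (N,) continuous times.
        return self.mlp(torch.cat([xy, self.temporal_embedding(t)], dim=-1))

decoder = CoordinateDecoder()
xy = torch.rand(4, 2)        # query arbitrary spatial locations...
t = torch.rand(4)            # ...at arbitrary times, e.g. between input frames
print(decoder(xy, t).shape)  # torch.Size([4, 3])
```

Because the decoder accepts any continuous coordinate, resolution and frame rate are chosen at query time rather than fixed by the architecture.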
- VidToMe: Video Token Merging for Zero-Shot Video Editing [100.79999871424931]
We propose a novel approach to enhance temporal consistency in generated videos by merging self-attention tokens across frames.
Our method improves temporal coherence and reduces memory consumption in self-attention computations.
arXiv Detail & Related papers (2023-12-17T09:05:56Z)
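The core mechanism named in the VidToMe entry above is token merging across frames. A simplified sketch follows, in which the current frame's tokens that are most similar to reference-frame tokens (by cosine similarity) are averaged into them before self-attention; the matching rule and shapes are assumptions, not the paper's exact algorithm.

```python
import torch

def merge_similar_tokens(ref_tokens, cur_tokens, merge_ratio=0.5):
    """Merge the current frame's tokens into their most similar reference
    tokens (cosine similarity), averaging matched pairs. Simplified sketch:
    the real method uses a more careful bipartite matching and un-merging."""
    ref = torch.nn.functional.normalize(ref_tokens, dim=-1)
    cur = torch.nn.functional.normalize(cur_tokens, dim=-1)
    sim = cur @ ref.T                      # (N_cur, N_ref) cosine similarities
    best_sim, best_ref = sim.max(dim=-1)   # best reference match per token
    n_merge = int(merge_ratio * cur_tokens.shape[0])
    merge_idx = best_sim.topk(n_merge).indices      # most redundant tokens
    merged = ref_tokens.clone()
    merged[best_ref[merge_idx]] = 0.5 * (ref_tokens[best_ref[merge_idx]]
                                         + cur_tokens[merge_idx])
    keep_mask = torch.ones(cur_tokens.shape[0], dtype=torch.bool)
    keep_mask[merge_idx] = False
    # Unmerged current-frame tokens are kept alongside the merged set.
    return torch.cat([merged, cur_tokens[keep_mask]], dim=0)

ref, cur = torch.randn(64, 320), torch.randn(64, 320)
print(merge_similar_tokens(ref, cur).shape)  # fewer tokens than 2 * 64
```

Merging reduces the number of tokens entering self-attention, which is where the reported memory savings and cross-frame coherence come from.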
- V2CE: Video to Continuous Events Simulator [1.1009908861287052]
We present a novel method for video-to-events stream conversion from multiple perspectives, considering the specific characteristics of Dynamic Vision Sensor (DVS).
A series of carefully designed timestamp losses helps enhance the quality of generated event voxels significantly.
We also propose a novel local dynamic-aware inference strategy to accurately recover event timestamps from event voxels in a continuous fashion.
arXiv Detail & Related papers (2023-09-16T06:06:53Z)
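The generic principle behind video-to-events conversion is the DVS trigger rule: a pixel emits an event whenever its log intensity changes by more than a contrast threshold. The sketch below applies that rule to a pair of grayscale frames with linearly interpolated timestamps; it is not V2CE's learned pipeline, which operates on event voxels with dedicated timestamp losses.

```python
import numpy as np

def frames_to_events(prev_frame, next_frame, t0, t1, threshold=0.2):
    """Emit DVS-style events between two grayscale frames.

    A pixel fires when its log-intensity change exceeds the contrast
    threshold; timestamps are linearly interpolated between the frame
    times. Threshold and interpolation are illustrative simplifications.
    """
    eps = 1e-3
    diff = np.log(next_frame + eps) - np.log(prev_frame + eps)
    ys, xs = np.nonzero(np.abs(diff) >= threshold)
    events = []
    for y, x in zip(ys, xs):
        n_events = int(np.abs(diff[y, x]) // threshold)   # one per crossing
        polarity = 1 if diff[y, x] > 0 else -1
        for k in range(1, n_events + 1):
            ts = t0 + (t1 - t0) * k / (n_events + 1)       # interpolated time
            events.append((ts, x, y, polarity))
    return sorted(events)

prev = np.random.rand(32, 32)
nxt = np.clip(prev + 0.3 * np.random.randn(32, 32), 0.0, 1.0)
print(len(frames_to_events(prev, nxt, t0=0.0, t1=1 / 30)))
```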
- VideoComposer: Compositional Video Synthesis with Motion Controllability [52.4714732331632]
VideoComposer allows users to flexibly compose a video with textual conditions, spatial conditions, and, more importantly, temporal conditions.
We introduce the motion vector from compressed videos as an explicit control signal to provide guidance regarding temporal dynamics.
In addition, we develop a Spatio-Temporal Condition encoder (STC-encoder) that serves as a unified interface to effectively incorporate the spatial and temporal relations of sequential inputs.
arXiv Detail & Related papers (2023-06-03T06:29:02Z)
- Continuous Space-Time Video Super-Resolution Utilizing Long-Range Temporal Information [48.20843501171717]
We propose a continuous ST-VSR (CSTVSR) method that can convert the given video to any frame rate and spatial resolution.
We show that the proposed algorithm has good flexibility and achieves better performance on various datasets.
arXiv Detail & Related papers (2023-02-26T08:02:39Z)
- Exploring Long- and Short-Range Temporal Information for Learned Video Compression [54.91301930491466]
We focus on exploiting the unique characteristics of video content and exploring temporal information to enhance compression performance.
For long-range temporal information exploitation, we propose a temporal prior that is updated continuously within the group of pictures (GOP) during inference.
In that case, the temporal prior contains valuable temporal information from all decoded images within the current GOP.
In detail, we design a hierarchical structure to achieve multi-scale compensation.
arXiv Detail & Related papers (2022-08-07T15:57:18Z)
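To illustrate the idea of a temporal prior that updates continuously within the GOP, the sketch below keeps a recurrent feature that is refreshed with each decoded frame; the gated convolutional update is a generic stand-in, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TemporalPrior(nn.Module):
    """Running prior over decoded-frame features within a GOP.
    A gated convolutional update is used here as a generic stand-in."""

    def __init__(self, channels=64):
        super().__init__()
        self.gate = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.cand = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, prior, decoded_feat):
        x = torch.cat([prior, decoded_feat], dim=1)
        z = torch.sigmoid(self.gate(x))          # how much to refresh
        h = torch.tanh(self.cand(x))             # candidate prior
        return (1 - z) * prior + z * h           # updated temporal prior

prior_net = TemporalPrior()
prior = torch.zeros(1, 64, 16, 16)               # reset at each GOP boundary
for decoded in torch.randn(5, 1, 64, 16, 16):    # features of decoded frames
    prior = prior_net(prior, decoded)            # prior now summarizes the GOP
print(prior.shape)
```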
- VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution [75.79379734567604]
We show that Video Implicit Neural Representation (VideoINR) can be decoded to videos of arbitrary spatial resolution and frame rate.
We show that VideoINR achieves competitive performance with state-of-the-art STVSR methods on common up-sampling scales.
arXiv Detail & Related papers (2022-06-09T17:45:49Z)
- Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction [51.072733683919246]
We introduce Recurrent Asynchronous Multimodal (RAM) networks to handle asynchronous and irregular data from multiple sensors.
Inspired by traditional RNNs, RAM networks maintain a hidden state that is updated asynchronously and can be queried at any time to generate a prediction.
We show an improvement over state-of-the-art methods of up to 30% in terms of mean absolute depth error.
arXiv Detail & Related papers (2021-02-18T13:24:35Z)
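The asynchronous-update idea in the RAM entry above can be sketched as a hidden state that is advanced whenever a measurement arrives, decayed according to the elapsed time, and read out at an arbitrary query time; the exponential decay, dimensions, and readout head below are illustrative assumptions, not the RAM network architecture.

```python
import torch
import torch.nn as nn

class AsyncState(nn.Module):
    """Hidden state updated at irregular measurement times and queryable at
    any time; the exponential time decay is an illustrative choice."""

    def __init__(self, feat_dim=32, hidden_dim=64, tau=0.1):
        super().__init__()
        self.cell = nn.GRUCell(feat_dim, hidden_dim)
        self.readout = nn.Linear(hidden_dim, 1)    # e.g. a toy depth head
        self.tau = tau
        self.h = torch.zeros(1, hidden_dim)
        self.last_t = 0.0

    def decay(self, t):
        # Shrink the state according to how much time has passed.
        return self.h * torch.exp(torch.tensor(-(t - self.last_t) / self.tau))

    def update(self, feat, t):                     # called per asynchronous input
        self.h = self.cell(feat, self.decay(t))
        self.last_t = t

    def query(self, t):                            # prediction at any time t
        return self.readout(self.decay(t))

model = AsyncState()
for t, feat in [(0.01, torch.randn(1, 32)), (0.013, torch.randn(1, 32))]:
    model.update(feat, t)                          # events and frames arrive irregularly
print(model.query(0.02))
```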
- End-to-End Learning for Video Frame Compression with Self-Attention [25.23586503813838]
We propose an end-to-end learned system for compressing video frames.
Our system learns deep embeddings of frames and encodes their difference in latent space.
In our experiments, we show that the proposed system achieves high compression rates and high objective visual quality.
arXiv Detail & Related papers (2020-04-20T12:11:08Z)
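A bare-bones sketch of the latent-difference idea in the entry above: each frame is embedded, only the quantized residual between consecutive latents is (notionally) transmitted, and frames are decoded from the running latent. The architecture and rounding below are simplified assumptions; the paper's self-attention and entropy-coding stages are omitted.

```python
import torch
import torch.nn as nn

class LatentDiffCodec(nn.Module):
    """Embed frames, transmit only the quantized difference between
    consecutive latents, and decode from the running latent. A toy sketch."""

    def __init__(self, ch=3, latent=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(ch, latent, 4, stride=2, padding=1),
                                 nn.ReLU(),
                                 nn.Conv2d(latent, latent, 3, padding=1))
        self.dec = nn.Sequential(nn.Conv2d(latent, latent, 3, padding=1),
                                 nn.ReLU(),
                                 nn.ConvTranspose2d(latent, ch, 4, stride=2, padding=1))

    def forward(self, frames):
        prev_latent, recons = None, []
        for frame in frames:                         # frames: (T, 1, C, H, W)
            z = self.enc(frame)
            if prev_latent is None:
                sent = torch.round(z)                # first frame: full latent
            else:
                sent = torch.round(z - prev_latent)  # later frames: residual only
            prev_latent = (prev_latent if prev_latent is not None else 0) + sent
            recons.append(self.dec(prev_latent))
        return torch.stack(recons)

codec = LatentDiffCodec()
video = torch.rand(4, 1, 3, 32, 32)                  # 4 toy frames
print(codec(video).shape)                            # torch.Size([4, 1, 3, 32, 32])
```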
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.