Neuromorphic Synergy for Video Binarization
- URL: http://arxiv.org/abs/2402.12644v1
- Date: Tue, 20 Feb 2024 01:43:51 GMT
- Title: Neuromorphic Synergy for Video Binarization
- Authors: Shijie Lin, Xiang Zhang, Lei Yang, Lei Yu, Bin Zhou, Xiaowei Luo,
Wenping Wang, and Jia Pan
- Abstract summary: Bimodal objects serve as a visual form to embed information that can be easily recognized by vision systems.
Neuromorphic cameras offer new capabilities for alleviating motion blur, but it is non-trivial to first deblur and then binarize the images in real time.
We propose an event-based binary reconstruction method that leverages the prior knowledge of the bimodal target's properties to perform inference independently in both event space and image space.
We also develop an efficient integration method to propagate this binary image to high frame rate binary video.
- Score: 54.195375576583864
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Bimodal objects, such as the checkerboard pattern used in camera calibration,
markers for object tracking, and text on road signs, to name a few, are
prevalent in our daily lives and serve as a visual form to embed information
that can be easily recognized by vision systems. While binarization from
intensity images is crucial for extracting the information embedded in bimodal
objects, few previous works consider binarizing images blurred by the relative
motion between the vision sensor and the environment. Such blur degrades
binarization quality and thus the downstream applications whenever the vision
system is in motion. Neuromorphic cameras have recently offered new
capabilities for alleviating motion blur, but it is non-trivial to first deblur
and then binarize the images in real time. In this work, we propose an
event-based binary
reconstruction method that leverages the prior knowledge of the bimodal
target's properties to perform inference independently in both event space and
image space and merge the results from both domains to generate a sharp binary
image. We also develop an efficient integration method to propagate this binary
image to high frame rate binary video. Finally, we develop a novel method to
naturally fuse events and images for unsupervised threshold identification. The
proposed method is evaluated on publicly available datasets and on our own
collected sequences; the results show that it outperforms SOTA methods in
generating high frame rate binary video in real time on CPU-only devices.
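To make the pipeline in the abstract concrete, here is a minimal, hypothetical NumPy sketch of two of its ingredients, not the authors' implementation: unsupervised threshold identification by fusing events with an image, and propagation of a sharp binary image through the event stream into high frame rate binary video. The event layout (a time-sorted structured array with fields 't', 'x', 'y', 'p'), the polarity convention (a positive event switches a pixel to the bright class), and the use of Otsu's method are all assumptions made for illustration.
```python
import numpy as np

def otsu_threshold(values, nbins=256):
    """Classic Otsu threshold over a 1-D sample of intensities."""
    hist, edges = np.histogram(values, bins=nbins)
    hist = hist.astype(np.float64)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist)            # cumulative weight of the dark class
    w1 = w0[-1] - w0                # remaining weight (bright class)
    m0 = np.cumsum(hist * centers)  # cumulative first moment
    with np.errstate(divide="ignore", invalid="ignore"):
        mu0 = m0 / w0
        mu1 = (m0[-1] - m0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # between-class variance
    return centers[np.nanargmax(between)]

def estimate_threshold(image, events):
    """Event/image fusion for unsupervised thresholding (assumption: events
    on a bimodal target fire at mode boundaries, so intensities sampled at
    event pixels form a clearly bimodal set that Otsu separates well)."""
    vals = image[events["y"], events["x"]].astype(np.float64)
    return otsu_threshold(vals)

def propagate_binary_video(binary0, events, frame_times):
    """Propagate a sharp binary image through the event stream.

    binary0     : (H, W) uint8 binary image, valid at frame_times[0]
    events      : time-sorted structured array with fields 't', 'x', 'y', 'p'
    frame_times : ascending output timestamps

    Assumed convention: on a bimodal target, each event marks a transition
    between the two modes, so a positive event sets the pixel to the bright
    class (1) and a negative event to the dark class (0).
    """
    frames = np.empty((len(frame_times),) + binary0.shape, dtype=np.uint8)
    state = binary0.copy()
    i = 0
    for k, t_out in enumerate(frame_times):
        # Apply every event up to this output timestamp, then emit a frame.
        while i < len(events) and events["t"][i] <= t_out:
            state[events["y"][i], events["x"][i]] = 1 if events["p"][i] > 0 else 0
            i += 1
        frames[k] = state
    return frames
```
Under these assumptions the per-event cost is a single array write, which is in line with the abstract's claim of real-time, CPU-only generation of high frame rate binary video.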
Related papers
- SITAR: Semi-supervised Image Transformer for Action Recognition [20.609596080624662]
This paper addresses video action recognition in a semi-supervised setting by leveraging only a handful of labeled videos.
We capitalize on the vast pool of unlabeled samples and employ contrastive learning on the encoded super images.
Our method demonstrates superior performance compared to existing state-of-the-art approaches for semi-supervised action recognition.
arXiv Detail & Related papers (2024-09-04T17:49:54Z)
- CMTA: Cross-Modal Temporal Alignment for Event-guided Video Deblurring [44.30048301161034]
Video deblurring aims to enhance the quality of restored results in motion-blurred videos by gathering information from adjacent video frames.
We propose two modules: 1) intra-frame feature enhancement, which operates within the exposure time of a single blurred frame, and 2) inter-frame temporal feature alignment, which gathers valuable long-range temporal information for target frames.
We demonstrate that our proposed methods outperform state-of-the-art frame-based and event-based motion deblurring methods through extensive experiments conducted on both synthetic and real-world deblurring datasets.
arXiv Detail & Related papers (2024-08-27T10:09:17Z) - E2HQV: High-Quality Video Generation from Event Camera via
Theory-Inspired Model-Aided Deep Learning [53.63364311738552]
Bio-inspired event cameras, or dynamic vision sensors, capture per-pixel brightness changes (called event streams) with high temporal resolution and high dynamic range.
This calls for events-to-video (E2V) solutions, which take event streams as input and generate high-quality video frames for intuitive visualization.
We propose E2HQV, a novel E2V paradigm designed to produce high-quality video frames from events.
arXiv Detail & Related papers (2024-01-16T05:10:50Z) - Combined CNN Transformer Encoder for Enhanced Fine-grained Human Action
Recognition [11.116921653535226]
We investigate two frameworks that combine CNN vision backbone and Transformer to enhance fine-grained action recognition.
Our experimental results show that both our Transformer encoder frameworks effectively learn latent temporal semantics and cross-modality association.
We achieve new state-of-the-art performance on the FineGym benchmark dataset for both proposed architectures.
arXiv Detail & Related papers (2022-08-03T08:01:55Z) - Recent Trends in 2D Object Detection and Applications in Video Event
Recognition [0.76146285961466]
We discuss the pioneering works in object detection, followed by the recent breakthroughs that employ deep learning.
We highlight recent datasets for 2D object detection both in images and videos, and present a comparative performance summary of various state-of-the-art object detection techniques.
arXiv Detail & Related papers (2022-02-07T14:15:11Z) - Dual-view Snapshot Compressive Imaging via Optical Flow Aided Recurrent
Neural Network [14.796204921975733]
Dual-view snapshot compressive imaging (SCI) aims to capture videos from two fields of view (FoVs) in a single snapshot.
It is challenging for existing model-based decoding algorithms to reconstruct each individual scene.
We propose an optical flow-aided recurrent neural network for dual video SCI systems, which provides high-quality decoding in seconds.
arXiv Detail & Related papers (2021-09-11T14:24:44Z) - Temporal Distinct Representation Learning for Action Recognition [139.93983070642412]
Two-dimensional convolutional neural networks (2D CNNs) are commonly used to characterize videos.
Different frames of a video share the same 2D CNN kernels, which may result in repeated and redundant information utilization.
We propose a sequential channel filtering mechanism to excite the discriminative channels of features from different frames step by step, and thus avoid repeated information extraction.
Our method is evaluated on the benchmark temporal reasoning datasets Something-Something V1 and V2, where it improves over the best competitor by 2.4% and 1.3%, respectively.
arXiv Detail & Related papers (2020-07-15T11:30:40Z) - Two-shot Spatially-varying BRDF and Shape Estimation [89.29020624201708]
We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF.
We create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials.
Experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images.
arXiv Detail & Related papers (2020-04-01T12:56:13Z) - Self-Supervised Linear Motion Deblurring [112.75317069916579]
Deep convolutional neural networks are state-of-the-art for image deblurring.
We present a differentiable reblur model for self-supervised motion deblurring.
Our experiments demonstrate that self-supervised single-image deblurring is indeed feasible.
arXiv Detail & Related papers (2020-02-10T20:15:21Z) - Towards Coding for Human and Machine Vision: A Scalable Image Coding
Approach [104.02201472370801]
We propose a novel image coding framework that leverages both compressive and generative models.
By introducing advanced generative models, we train a flexible network to reconstruct images from compact feature representations and the reference pixels.
Experimental results demonstrate the superiority of our framework in both human visual quality and facial landmark detection.
arXiv Detail & Related papers (2020-01-09T10:37:17Z)