ClearSight: Human Vision-Inspired Solutions for Event-Based Motion Deblurring
- URL: http://arxiv.org/abs/2501.15808v1
- Date: Mon, 27 Jan 2025 06:28:45 GMT
- Title: ClearSight: Human Vision-Inspired Solutions for Event-Based Motion Deblurring
- Authors: Xiaopeng Lin, Yulong Huang, Hongwei Ren, Zunchang Liu, Yue Zhou, Haotian Fu, Bojun Cheng,
- Abstract summary: This study introduces a bioinspired dual-drive hybrid network (BDHNet)
Inspired by the visual attention mechanism in the human visual system, this study introduces a bioinspired dual-drive hybrid network (BDHNet)
- Score: 5.827705323847176
- License:
- Abstract: Motion deblurring addresses the challenge of image blur caused by camera or scene movement. Event cameras provide motion information that is encoded in the asynchronous event streams. To efficiently leverage the temporal information of event streams, we employ Spiking Neural Networks (SNNs) for motion feature extraction and Artificial Neural Networks (ANNs) for color information processing. Due to the non-uniform distribution and inherent redundancy of event data, existing cross-modal feature fusion methods exhibit certain limitations. Inspired by the visual attention mechanism in the human visual system, this study introduces a bioinspired dual-drive hybrid network (BDHNet). Specifically, the Neuron Configurator Module (NCM) is designed to dynamically adjusts neuron configurations based on cross-modal features, thereby focusing the spikes in blurry regions and adapting to varying blurry scenarios dynamically. Additionally, the Region of Blurry Attention Module (RBAM) is introduced to generate a blurry mask in an unsupervised manner, effectively extracting motion clues from the event features and guiding more accurate cross-modal feature fusion. Extensive subjective and objective evaluations demonstrate that our method outperforms current state-of-the-art methods on both synthetic and real-world datasets.
Related papers
- Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling [70.34875558830241]
We present a way for learning a-temporal (4D) embedding, based on semantic semantic gears to allow for stratified modeling of dynamic regions of rendering the scene.
At the same time, almost for free, our tracking approach enables free-viewpoint of interest - a functionality not yet achieved by existing NeRF-based methods.
arXiv Detail & Related papers (2024-06-06T03:37:39Z) - Non-Uniform Exposure Imaging via Neuromorphic Shutter Control [9.519068512865463]
We propose a novel Neuromorphic Shutter Control (NSC) system to avoid motion blurs and alleviate instant noises.
To stabilize the inconsistent Signal-to-Noise Ratio (SNR), we propose an event-based image denoising network within a self-supervised learning paradigm.
arXiv Detail & Related papers (2024-04-22T08:28:41Z) - Neuromorphic Vision-based Motion Segmentation with Graph Transformer Neural Network [4.386534439007928]
We propose a novel event-based motion segmentation algorithm using a Graph Transformer Neural Network, dubbed GTNN.
Our proposed algorithm processes event streams as 3D graphs by a series nonlinear transformations to unveil local and global correlations between events.
We show that GTNN outperforms state-of-the-art methods in the presence of dynamic background variations, motion patterns, and multiple dynamic objects with varying sizes and velocities.
arXiv Detail & Related papers (2024-04-16T22:44:29Z) - TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models [75.20168902300166]
We propose TrackDiffusion, a novel video generation framework affording fine-grained trajectory-conditioned motion control.
A pivotal component of TrackDiffusion is the instance enhancer, which explicitly ensures inter-frame consistency of multiple objects.
generated video sequences by our TrackDiffusion can be used as training data for visual perception models.
arXiv Detail & Related papers (2023-12-01T15:24:38Z) - Amortized Network Intervention to Steer the Excitatory Point Processes [8.15558505134853]
Excitatory point processes (i.e., event flows) occurring over dynamic graphs provide a fine-grained model to capture how discrete events may spread over time and space.
How to effectively steer the event flows by modifying the dynamic graph structures presents an interesting problem, motivated by curbing the spread of infectious diseases.
We design an Amortized Network Interventions framework, allowing for the pooling of optimal policies from history and other contexts.
arXiv Detail & Related papers (2023-10-06T11:17:28Z) - Alignment-free HDR Deghosting with Semantics Consistent Transformer [76.91669741684173]
High dynamic range imaging aims to retrieve information from multiple low-dynamic range inputs to generate realistic output.
Existing methods often focus on the spatial misalignment across input frames caused by the foreground and/or camera motion.
We propose a novel alignment-free network with a Semantics Consistent Transformer (SCTNet) with both spatial and channel attention modules.
arXiv Detail & Related papers (2023-05-29T15:03:23Z) - Learning What and Where -- Unsupervised Disentangling Location and
Identity Tracking [0.44040106718326594]
We introduce an unsupervisedd LOCation and Identity tracking system (Loci)
Inspired by the dorsal-ventral pathways in the brain, Loci tackles the what-and-where binding problem by means of a self-supervised segregation mechanism.
Loci may set the stage for deeper, explanation-oriented video processing.
arXiv Detail & Related papers (2022-05-26T13:30:14Z) - Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based
Action Recognition [49.163326827954656]
We propose a novel multi-granular-temporal graph network for skeleton-based action classification.
We develop a dual-head graph network consisting of two inter-leaved branches, which enables us to extract at least two-temporal resolutions.
We conduct extensive experiments on three large-scale datasets.
arXiv Detail & Related papers (2021-08-10T09:25:07Z) - End-to-end Multi-modal Video Temporal Grounding [105.36814858748285]
We propose a multi-modal framework to extract complementary information from videos.
We adopt RGB images for appearance, optical flow for motion, and depth maps for image structure.
We conduct experiments on the Charades-STA and ActivityNet Captions datasets, and show that the proposed method performs favorably against state-of-the-art approaches.
arXiv Detail & Related papers (2021-07-12T17:58:10Z) - Event-based Motion Segmentation with Spatio-Temporal Graph Cuts [51.17064599766138]
We have developed a method to identify independently objects acquired with an event-based camera.
The method performs on par or better than the state of the art without having to predetermine the number of expected moving objects.
arXiv Detail & Related papers (2020-12-16T04:06:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.