Revealing Latent Information: A Physics-inspired Self-supervised Pre-training Framework for Noisy and Sparse Events
- URL: http://arxiv.org/abs/2508.05507v1
- Date: Thu, 07 Aug 2025 15:38:36 GMT
- Title: Revealing Latent Information: A Physics-inspired Self-supervised Pre-training Framework for Noisy and Sparse Events
- Authors: Lin Zhu, Ruonan Liu, Xiao Wang, Lizhi Wang, Hua Huang
- Abstract summary: Event cameras record data with high temporal resolution and wide dynamic range. Event data is inherently sparse and noisy, mainly reflecting brightness changes. We propose a self-supervised pre-training framework to fully reveal latent information in event data.
- Score: 25.348660233701708
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The event camera, a novel neuromorphic vision sensor, records data with high temporal resolution and wide dynamic range, offering new possibilities for accurate visual representation in challenging scenarios. However, event data is inherently sparse and noisy, mainly reflecting brightness changes, which complicates effective feature extraction. To address this, we propose a self-supervised pre-training framework to fully reveal latent information in event data, including edge information and texture cues. Our framework consists of three stages. Difference-guided Masked Modeling, inspired by the physical sampling process of events, reconstructs temporal intensity-difference maps to extract enhanced information from raw event data. Backbone-fixed Feature Transition contrasts event and image features without updating the backbone, preserving the representations learned from masked modeling and stabilizing their effect on contrastive learning. Focus-aimed Contrastive Learning updates the entire model to improve semantic discrimination by focusing on high-value regions. Extensive experiments show that our framework is robust and consistently outperforms state-of-the-art methods on various downstream tasks, including object recognition, semantic segmentation, and optical flow estimation. The code and dataset are available at https://github.com/BIT-Vision/EventPretrain.
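As a rough illustration of how the three stages fit together, here is a minimal PyTorch sketch. All helper names (random_mask, the encoder/decoder arguments, the focus weights) are hypothetical stand-ins, not the authors' API; the official implementation is in the repository linked above.

```python
# Minimal sketch of the three pre-training stages, assuming generic
# encoder/decoder/projection modules. Names here are illustrative only.
import torch
import torch.nn.functional as F

def random_mask(x, ratio=0.75):
    """Zero out a random fraction of spatial positions of a (B, C, H, W) tensor."""
    b, _, h, w = x.shape
    keep = torch.rand(b, 1, h, w, device=x.device) > ratio
    return x * keep, ~keep  # masked input, boolean mask of the hidden positions

def masked_modeling_loss(encoder, decoder, events, diff_map, ratio=0.75):
    """Stage 1 (Difference-guided Masked Modeling): reconstruct the temporal
    intensity-difference map from masked event input. Assumes events is
    (B, C, H, W), diff_map is (B, 1, H, W), and decoder outputs (B, 1, H, W)."""
    masked, hidden = random_mask(events, ratio)
    pred = decoder(encoder(masked))
    return F.mse_loss(pred[hidden], diff_map[hidden])

def event_image_contrastive_loss(event_feat, image_feat, weight=None, temp=0.07):
    """Stages 2-3: InfoNCE between pooled event and image features, (B, D) each.
    Stage 2 keeps the backbone frozen (compute event features under
    torch.no_grad() and train only a projection head); stage 3 trains the whole
    model and uses `weight` to emphasize high-value samples."""
    z_e = F.normalize(event_feat, dim=-1)
    z_i = F.normalize(image_feat, dim=-1)
    logits = z_e @ z_i.t() / temp
    targets = torch.arange(z_e.size(0), device=z_e.device)
    loss = F.cross_entropy(logits, targets, reduction="none")
    return loss.mean() if weight is None else (weight * loss).mean()
```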
Related papers
- Event-Based Crossing Dataset (EBCD) [0.9961452710097684]
Event-based vision revolutionizes traditional image sensing by capturing intensity variations rather than static frames. The Event-Based Crossing dataset is tailored for pedestrian and vehicle detection in dynamic outdoor environments. This dataset facilitates an extensive assessment of object detection performance under varying conditions of sparsity and noise suppression.
arXiv Detail & Related papers (2025-03-21T19:20:58Z) - Event-based Motion Deblurring via Multi-Temporal Granularity Fusion [5.58706910566768]
The event camera, a bio-inspired sensor offering continuous visual information, could enhance deblurring performance. Existing event-based image deblurring methods usually utilize voxel-based event representations. We introduce a point cloud-based event representation into the image deblurring task and propose a Multi-Temporal Granularity Network (MTGNet). It combines the spatially dense but temporally coarse-grained voxel-based event representation with the temporally fine-grained but spatially sparse point cloud-based event representation.
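For context, a hedged NumPy sketch of the two representations this summary contrasts: a temporally binned voxel grid (spatially dense, temporally coarse) and a point cloud that keeps per-event timestamps. Bin count and shapes are illustrative choices, not MTGNet's actual configuration.

```python
import numpy as np

def events_to_voxel(events, h, w, bins=5):
    """events: (N, 4) array of (x, y, t, p); returns a (bins, h, w) grid."""
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2]
    p = events[:, 3]
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)   # map t into [0, 1]
    b = np.clip((t_norm * bins).astype(int), 0, bins - 1)   # temporal bin index
    grid = np.zeros((bins, h, w), dtype=np.float32)
    np.add.at(grid, (b, y, x), np.where(p > 0, 1.0, -1.0))  # signed event counts
    return grid

def events_to_point_cloud(events):
    """Keep full temporal resolution: each event stays a 4D point (x, y, t, p)."""
    pc = events.astype(np.float32)
    pc[:, 2] -= pc[:, 2].min()  # timestamps relative to the window start
    return pc
```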
arXiv Detail & Related papers (2024-12-16T15:20:54Z) - Data Augmentation via Latent Diffusion for Saliency Prediction [67.88936624546076]
Saliency prediction models are constrained by the limited diversity and quantity of labeled data.
We propose a novel data augmentation method for deep saliency prediction that edits natural images while preserving the complexity and variability of real-world scenes.
arXiv Detail & Related papers (2024-09-11T14:36:24Z) - Evaluating Image-Based Face and Eye Tracking with Event Cameras [9.677797822200965]
Event cameras, also known as neuromorphic sensors, capture changes in local light intensity at the pixel level, producing asynchronously generated data termed "events".
This data format mitigates common issues observed in conventional cameras, like under-sampling when capturing fast-moving objects.
We evaluate the viability of integrating conventional algorithms with event-based data transformed into a frame format.
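A minimal sketch of the kind of event-to-frame conversion described here: accumulate events over a fixed time window into a 2D image that frame-based trackers can consume. The window length and normalization are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def accumulate_frame(events, h, w, t_start, window_us=10_000):
    """events: (N, 4) array of (x, y, t, p), t in microseconds; returns (h, w)."""
    sel = (events[:, 2] >= t_start) & (events[:, 2] < t_start + window_us)
    x = events[sel, 0].astype(int)
    y = events[sel, 1].astype(int)
    pol = np.where(events[sel, 3] > 0, 1.0, -1.0)
    frame = np.zeros((h, w), dtype=np.float32)
    np.add.at(frame, (y, x), pol)  # signed event count per pixel
    # Rescale to [0, 1] so downstream frame-based models see a standard image.
    frame = (frame - frame.min()) / max(frame.max() - frame.min(), 1e-9)
    return frame
```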
arXiv Detail & Related papers (2024-08-19T20:27:08Z) - A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance.
We propose a simple yet effective data augmentation approach by leveraging advancements in generative models.
Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
arXiv Detail & Related papers (2024-08-01T07:40:00Z) - EventZoom: A Progressive Approach to Event-Based Data Augmentation for Enhanced Neuromorphic Vision [9.447299017563841]
Dynamic Vision Sensors (DVS) capture event data with high temporal resolution and low power consumption. Event data augmentation serves as an essential method for overcoming the limitations of scale and diversity in event datasets.
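To illustrate what event-stream augmentation involves at the lowest level, here is a sketch of two generic transforms on (x, y, t, p) tuples. This is not EventZoom's progressive scheme, only a hedged example of the operation class it builds on.

```python
import numpy as np

def hflip_events(events, w):
    """Mirror events horizontally on a sensor of width w."""
    out = events.copy()
    out[:, 0] = (w - 1) - out[:, 0]
    return out

def jitter_time(events, scale_range=(0.9, 1.1), rng=None):
    """Randomly stretch or compress the time axis around the window start."""
    rng = np.random.default_rng() if rng is None else rng
    out = events.copy()
    t0 = out[:, 2].min()
    out[:, 2] = t0 + (out[:, 2] - t0) * rng.uniform(*scale_range)
    return out
```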
arXiv Detail & Related papers (2024-05-29T08:39:31Z) - An Event-Oriented Diffusion-Refinement Method for Sparse Events Completion [36.64856578682197]
Event cameras or dynamic vision sensors (DVS) record asynchronous responses to brightness changes instead of conventional intensity frames.
We propose an inventive event sequence completion approach conforming to the unique characteristics of event data in both the processing stage and the output form.
Specifically, we treat event streams as 3D event clouds in the temporal domain, develop a diffusion-based generative model to generate dense clouds in a coarse-to-fine manner, and recover exact timestamps to successfully maintain the temporal resolution of the raw data.
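A sketch of the "events as a 3D cloud" view this summary describes: normalize (x, y, t) into a unit cube for a generative model, then invert the mapping so generated points regain real timestamps. The diffusion model itself is omitted; only the coordinate round-trip is shown, with assumed scaling.

```python
import numpy as np

def to_unit_cube(events, h, w):
    """events: (N, 3) array of (x, y, t); returns points in [0, 1]^3 plus t range."""
    t0, t1 = events[:, 2].min(), events[:, 2].max()
    cloud = np.stack([events[:, 0] / (w - 1),
                      events[:, 1] / (h - 1),
                      (events[:, 2] - t0) / max(t1 - t0, 1e-9)], axis=1)
    return cloud, (t0, t1)

def from_unit_cube(cloud, h, w, t_range):
    """Invert the normalization: pixel coordinates are rounded, timestamps exact."""
    t0, t1 = t_range
    x = np.rint(cloud[:, 0] * (w - 1))
    y = np.rint(cloud[:, 1] * (h - 1))
    t = t0 + cloud[:, 2] * (t1 - t0)
    return np.stack([x, y, t], axis=1)
```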
arXiv Detail & Related papers (2024-01-06T08:09:54Z) - Generalizing Event-Based Motion Deblurring in Real-World Scenarios [62.995994797897424]
Event-based motion deblurring has shown promising results by exploiting low-latency events.
We propose a scale-aware network that allows flexible input spatial scales and enables learning from different temporal scales of motion blur.
A two-stage self-supervised learning scheme is then developed to fit the real-world data distribution.
arXiv Detail & Related papers (2023-08-11T04:27:29Z) - Information-Theoretic Odometry Learning [83.36195426897768]
We propose a unified information-theoretic framework for learning-motivated methods aimed at odometry estimation.
The proposed framework provides an elegant tool for performance evaluation and understanding in information-theoretic language.
arXiv Detail & Related papers (2022-03-11T02:37:35Z) - Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z) - Learning Monocular Dense Depth from Events [53.078665310545745]
Event cameras report brightness changes in the form of a stream of asynchronous events instead of intensity frames.
Recent learning-based approaches have been applied to event-based data for tasks such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
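To make the recurrent idea concrete, here is a generic ConvGRU-style cell that carries a hidden state across successive event tensors, so that sparse per-window evidence accumulates over time. This is an assumption-laden stand-in, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    def __init__(self, in_ch, hid_ch):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, 3, padding=1)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, 3, padding=1)

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_new = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))  # candidate state
        return (1 - z) * h + z * h_new  # gated blend of old and candidate state

# Usage: run over a sequence of event voxel grids, reading out depth at the end.
cell = ConvGRUCell(in_ch=5, hid_ch=16)
head = nn.Conv2d(16, 1, 1)                   # per-pixel (e.g. log-)depth readout
h = torch.zeros(1, 16, 64, 64)
for voxel in torch.randn(10, 1, 5, 64, 64):  # ten consecutive event windows
    h = cell(voxel, h)
depth = head(h)
```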
arXiv Detail & Related papers (2020-10-16T12:36:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.