Data-efficient Event Camera Pre-training via Disentangled Masked
Modeling
- URL: http://arxiv.org/abs/2403.00416v1
- Date: Fri, 1 Mar 2024 10:02:25 GMT
- Title: Data-efficient Event Camera Pre-training via Disentangled Masked
Modeling
- Authors: Zhenpeng Huang, Chao Li, Hao Chen, Yongjian Deng, Yifeng Geng, Limin
Wang
- Abstract summary: We present a new data-efficient voxel-based self-supervised learning method for event cameras.
Our method overcomes the limitations of previous methods, which either sacrifice temporal information or directly employ paired image data.
It exhibits excellent generalization performance and demonstrates significant improvements across various tasks with fewer parameters and lower computational costs.
- Score: 20.987277885575963
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a new data-efficient voxel-based self-supervised
learning method for event cameras. Our pre-training overcomes the limitations
of previous methods, which either sacrifice temporal information by converting
event sequences into 2D images for utilizing pre-trained image models or
directly employ paired image data for knowledge distillation to enhance the
learning of event streams. In order to make our pre-training data-efficient, we
first design a semantic-uniform masking method to address the learning
imbalance caused by the varying reconstruction difficulties of different
regions in non-uniform data when using random masking. In addition, we ease the
traditional hybrid masked modeling process by explicitly decomposing it into
two branches, namely local spatio-temporal reconstruction and global semantic
reconstruction to encourage the encoder to capture local correlations and
global semantics, respectively. This decomposition allows our self-supervised
learning method to converge faster with minimal pre-training data. Compared to
previous approaches, our self-supervised learning method does not rely on
paired RGB images, yet enables simultaneous exploration of spatial and temporal
cues in multiple scales. It exhibits excellent generalization performance and
demonstrates significant improvements across various tasks with fewer
parameters and lower computational costs.
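The abstract describes two ideas: masking a uniform fraction of tokens within each region (rather than purely at random) so that easy and hard regions contribute equally to the learning signal, and splitting the masked-modeling objective into a local spatio-temporal reconstruction branch and a global semantic branch. The authors' actual architecture is not reproduced here; the following is a minimal NumPy sketch of those two ideas only, in which all function names, shapes, and the toy encoder/decoders are assumptions, not the paper's code.

```python
import numpy as np

def uniform_region_mask(num_tokens, num_regions, mask_ratio, rng):
    """Mask the same fraction of tokens inside every region, so no single
    region dominates reconstruction (a rough analogue of the paper's
    semantic-uniform masking; the region partition is assumed)."""
    assert num_tokens % num_regions == 0
    per_region = num_tokens // num_regions
    k = int(per_region * mask_ratio)
    mask = np.zeros(num_tokens, dtype=bool)
    for r in range(num_regions):
        # Pick k token indices uniformly within region r.
        idx = rng.choice(per_region, size=k, replace=False) + r * per_region
        mask[idx] = True
    return mask

def disentangled_losses(tokens, mask, encode, local_decode, global_decode):
    """Two-branch objective: a local branch reconstructs the masked voxel
    tokens; a global branch predicts a sequence-level summary target."""
    latent = encode(tokens[~mask])          # encode visible tokens only
    # Local spatio-temporal reconstruction, scored on masked tokens.
    local_loss = np.mean((local_decode(latent, mask) - tokens[mask]) ** 2)
    # Global semantic reconstruction against a pooled target (a stand-in
    # for whatever semantic target the paper uses).
    global_target = tokens.mean(axis=0)
    global_loss = np.mean((global_decode(latent) - global_target) ** 2)
    return local_loss + global_loss

rng = np.random.default_rng(0)
mask = uniform_region_mask(num_tokens=16, num_regions=4, mask_ratio=0.5, rng=rng)
print(mask.sum())  # 8: exactly 2 masked tokens in each of the 4 regions

# Toy stand-ins for the encoder and the two decoder branches.
tokens = rng.normal(size=(16, 8))
encode = lambda v: v.mean(axis=0)
local_decode = lambda z, m: np.tile(z, (m.sum(), 1))
global_decode = lambda z: z
loss = disentangled_losses(tokens, mask, encode, local_decode, global_decode)
```

In the method itself the encoder would be a network over event voxels and the two branches would have their own decoders; this sketch only illustrates how the hybrid objective decomposes into separately computed local and global terms.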
Related papers
- A Two-Stage Progressive Pre-training using Multi-Modal Contrastive Masked Autoencoders [5.069884983892437]
We propose a new progressive pre-training method for image understanding tasks which leverages RGB-D datasets.
In the first stage, we pre-train the model using contrastive learning to learn cross-modal representations.
In the second stage, we further pre-train the model using masked autoencoding and denoising/noise prediction.
Our approach is scalable, robust, and suitable for pre-training on RGB-D datasets.
arXiv Detail & Related papers (2024-08-05T05:33:59Z) - ReNoise: Real Image Inversion Through Iterative Noising [62.96073631599749]
We introduce an inversion method with a high quality-to-operation ratio, enhancing reconstruction accuracy without increasing the number of operations.
We evaluate the performance of our ReNoise technique using various sampling algorithms and models, including recent accelerated diffusion models.
arXiv Detail & Related papers (2024-03-21T17:52:08Z) - Accelerating Multiframe Blind Deconvolution via Deep Learning [0.0]
Ground-based solar image restoration is a computationally expensive procedure.
We propose a new method to accelerate the restoration based on algorithm unrolling.
We show that both methods significantly reduce the restoration time compared to the standard optimization procedure.
arXiv Detail & Related papers (2023-06-21T07:53:00Z) - Towards Adaptable and Interactive Image Captioning with Data
Augmentation and Episodic Memory [8.584932159968002]
We present an IML pipeline for image captioning which allows us to incrementally adapt a pre-trained model to a new data distribution based on user input.
We find that, while data augmentation worsens results, episodic memory is an effective strategy to retain knowledge from previously seen clusters, even when only relatively small amounts of data are available.
arXiv Detail & Related papers (2023-06-06T08:38:10Z) - Adaptive Cross Batch Normalization for Metric Learning [75.91093210956116]
Metric learning is a fundamental problem in computer vision.
We show that it is equally important to ensure that the accumulated embeddings are up to date.
In particular, it is necessary to circumvent the representational drift between the accumulated embeddings and the feature embeddings at the current training iteration.
arXiv Detail & Related papers (2023-03-30T03:22:52Z) - Single image calibration using knowledge distillation approaches [1.7205106391379026]
We build upon a CNN architecture to automatically estimate camera parameters.
We adapt four common incremental learning strategies to preserve knowledge when updating the network for new data distributions.
Experimental results were significant and indicated which method performed best for camera calibration estimation.
arXiv Detail & Related papers (2022-12-05T15:59:35Z) - Learning Discriminative Shrinkage Deep Networks for Image Deconvolution [122.79108159874426]
We propose an effective non-blind deconvolution approach by learning discriminative shrinkage functions to implicitly model these terms.
Experimental results show that the proposed method performs favorably against the state-of-the-art ones in terms of efficiency and accuracy.
arXiv Detail & Related papers (2021-11-27T12:12:57Z) - Sparse Signal Models for Data Augmentation in Deep Learning ATR [0.8999056386710496]
We propose a data augmentation approach to incorporate domain knowledge and improve the generalization power of a data-intensive learning algorithm.
We exploit the sparsity of the scattering centers in the spatial domain and the smoothly-varying structure of the scattering coefficients in the azimuthal domain to solve the ill-posed problem of over-parametrized model fitting.
arXiv Detail & Related papers (2020-12-16T21:46:33Z) - Shared Prior Learning of Energy-Based Models for Image Reconstruction [69.72364451042922]
We propose a novel learning-based framework for image reconstruction particularly designed for training without ground truth data.
In the absence of ground truth data, we change the loss functional to a patch-based Wasserstein functional.
In shared prior learning, both aforementioned optimal control problems are optimized simultaneously with shared learned parameters of the regularizer.
arXiv Detail & Related papers (2020-11-12T17:56:05Z) - Unsupervised Feature Learning for Event Data: Direct vs Inverse Problem
Formulation [53.850686395708905]
Event-based cameras record an asynchronous stream of per-pixel brightness changes.
In this paper, we focus on single-layer architectures for representation learning from event data.
We show improvements of up to 9% in the recognition accuracy compared to the state-of-the-art methods.
arXiv Detail & Related papers (2020-09-23T10:40:03Z) - FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning [64.32306537419498]
We propose a novel learned feature-based refinement and augmentation method that produces a varied set of complex transformations.
These transformations also use information from both within-class and across-class representations that we extract through clustering.
We demonstrate that our method is comparable to current state of art for smaller datasets while being able to scale up to larger datasets.
arXiv Detail & Related papers (2020-07-16T17:55:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.