Spatio-Temporal Recurrent Networks for Event-Based Optical Flow Estimation
- URL: http://arxiv.org/abs/2109.04871v1
- Date: Fri, 10 Sep 2021 13:37:37 GMT
- Title: Spatio-Temporal Recurrent Networks for Event-Based Optical Flow Estimation
- Authors: Ziluo Ding, Rui Zhao, Jiyuan Zhang, Tianxiao Gao, Ruiqin Xiong, Zhaofei Yu, Tiejun Huang
- Abstract summary: We introduce a novel recurrent encoding-decoding neural network architecture for event-based optical flow estimation.
The network is end-to-end trained with self-supervised learning on the Multi-Vehicle Stereo Event Camera dataset.
We show that it outperforms existing state-of-the-art methods by a large margin.
- Score: 47.984368369734995
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Event cameras offer a promising alternative for visual perception,
especially in high-speed and high-dynamic-range scenes. Recently, many deep
learning methods have shown great success in providing model-free solutions to
event-based problems such as optical flow estimation. However, existing deep
learning methods do not adequately address the importance of temporal
information in their architecture design and cannot effectively extract
spatio-temporal features. Another line of research, based on Spiking Neural
Networks, suffers from training issues with deeper architectures. To address
these points, a novel input representation is proposed that captures the
events' temporal distribution for signal enhancement. Moreover, we introduce a
spatio-temporal recurrent encoding-decoding neural network architecture for
event-based optical flow estimation, which utilizes Convolutional Gated
Recurrent Units to extract feature maps from a series of event images (a
minimal cell of this kind is sketched below). Our architecture also allows
traditional frame-based core modules, such as the correlation layer and the
iterative residual refinement scheme, to be incorporated. The network is
trained end-to-end with self-supervised learning on the Multi-Vehicle Stereo
Event Camera dataset and outperforms all existing state-of-the-art methods by
a large margin.
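The Convolutional Gated Recurrent Unit lends itself to a compact illustration. Below is a minimal sketch of a ConvGRU cell applied over a sequence of event images; it is not the authors' released code, and the channel counts, kernel size, and the 4-channel event representation are assumptions for illustration only:

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Convolutional GRU cell: the gates are computed with 2D convolutions,
    so the hidden state keeps its spatial layout and accumulates
    spatio-temporal features across a series of event images."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.zr = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=k // 2)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=k // 2)

    def forward(self, x, h=None):
        if h is None:
            h = x.new_zeros(x.size(0), self.hid_ch, *x.shape[-2:])
        # Update (z) and reset (r) gates from the concatenated input and state.
        z, r = torch.sigmoid(self.zr(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        # Candidate state uses the reset-gated hidden state.
        h_new = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_new

# Toy run over 5 event images of shape (4, 64, 64) with batch size 2.
cell = ConvGRUCell(in_ch=4, hid_ch=32)
events = torch.randn(2, 5, 4, 64, 64)
h = None
for t in range(events.size(1)):
    h = cell(events[:, t], h)
print(h.shape)  # torch.Size([2, 32, 64, 64])
```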
Related papers
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses the Continuous Wavelet Transform (CWT) to represent information in 2D tensor form (see the sketch after this entry).
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
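The "TC" stream's wavelet step can be illustrated generically. A minimal sketch, assuming PyWavelets and a Morlet mother wavelet (both illustrative choices, not details from the abstract):

```python
import numpy as np
import pywt

# Synthetic 1D behavioral signal: 4 s sampled at 128 Hz.
fs = 128.0
t = np.arange(0, 4, 1 / fs)
signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

# Continuous Wavelet Transform: maps the 1D signal into a 2D
# (scale x time) tensor that a 2D convolutional stream can consume.
scales = np.arange(1, 65)
coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1 / fs)
print(coeffs.shape)  # (64, 512): one row per scale, one column per sample
```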
- Faster ISNet for Background Bias Mitigation on Deep Neural Networks [0.4915744683251149]
Bias or spurious correlations in image backgrounds can impact neural networks, causing shortcut learning and hampering generalization to real-world data.
We propose reformulated architectures whose training time is independent of the number of classes.
We challenge the proposed architectures using synthetic background bias, and COVID-19 detection in chest X-rays, an application that commonly presents background bias.
arXiv Detail & Related papers (2024-01-16T14:49:26Z)
- Neuromorphic Optical Flow and Real-time Implementation with Event Cameras [47.11134388304464]
We build on the latest developments in event-based vision and spiking neural networks.
We propose a new network architecture that improves the state-of-the-art self-supervised optical flow accuracy.
We demonstrate high-speed optical flow prediction with almost two orders of magnitude lower complexity (a toy spiking-neuron update is sketched after this entry).
arXiv Detail & Related papers (2023-04-14T14:03:35Z)
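As background for the spiking route this entry mentions, the leaky integrate-and-fire (LIF) neuron is the usual building block. A toy sketch (not this paper's model) that also hints at the training difficulty the main abstract notes, since the spike is non-differentiable:

```python
import torch

def lif_step(v, x, tau=0.9, v_th=1.0):
    """One leaky integrate-and-fire step: decay the membrane potential v,
    integrate the input current x, emit a spike where the threshold is
    crossed, and hard-reset the neurons that fired. The Heaviside spike is
    non-differentiable, which is why deep SNNs are hard to train without
    surrogate gradients."""
    v = tau * v + x
    spikes = (v >= v_th).float()
    v = v * (1.0 - spikes)
    return v, spikes

# Toy usage over a short event-driven input sequence of 8 neurons.
v = torch.zeros(8)
for t in range(5):
    v, s = lif_step(v, torch.rand(8))
```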
- Motion-aware Memory Network for Fast Video Salient Object Detection [15.967509480432266]
We design a space-time memory (STM)-based network that extracts useful temporal information about the current frame from adjacent frames, serving as the temporal branch of VSOD (the memory read is sketched after this entry).
In the encoding stage, we generate high-level temporal features by using high-level features from the current frame and its adjacent frames.
In the decoding stage, we propose an effective fusion strategy for spatial and temporal branches.
The proposed model does not require optical flow or other preprocessing, and can reach a speed of nearly 100 FPS during inference.
arXiv Detail & Related papers (2022-08-01T15:56:19Z)
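Space-time memory reads are usually implemented as dot-product attention between the current frame's keys and memory keys from adjacent frames. A minimal sketch under that assumption (shapes and channel counts are illustrative, not taken from the paper):

```python
import torch
import torch.nn.functional as F

def stm_read(q_key, m_key, m_val):
    """Space-time memory read: match the current frame's keys against
    memory keys from adjacent frames and return an attention-weighted
    sum of memory values over all memory pixels.

    q_key: (B, Ck, H, W)  m_key: (B, Ck, T, H, W)  m_val: (B, Cv, T, H, W)
    """
    B, Ck, H, W = q_key.shape
    q = q_key.flatten(2)                                  # (B, Ck, HW)
    k = m_key.flatten(2)                                  # (B, Ck, THW)
    v = m_val.flatten(2)                                  # (B, Cv, THW)
    attn = F.softmax(k.transpose(1, 2) @ q / Ck ** 0.5, dim=1)  # (B, THW, HW)
    read = v @ attn                                       # (B, Cv, HW)
    return read.view(B, -1, H, W)

# Toy usage with a memory of 2 adjacent frames.
out = stm_read(torch.randn(1, 16, 32, 32),
               torch.randn(1, 16, 2, 32, 32),
               torch.randn(1, 32, 2, 32, 32))
print(out.shape)  # torch.Size([1, 32, 32, 32])
```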
- Unsupervised Monocular Depth Learning with Integrated Intrinsics and Spatio-Temporal Constraints [61.46323213702369]
This work presents an unsupervised learning framework that is able to predict at-scale depth maps and egomotion.
Our results demonstrate strong performance when compared to the current state-of-the-art on multiple sequences of the KITTI driving dataset.
arXiv Detail & Related papers (2020-11-02T22:26:58Z)
- Fast Video Salient Object Detection via Spatiotemporal Knowledge Distillation [20.196945571479002]
We present a lightweight network tailored for video salient object detection.
Specifically, we combine a saliency guidance embedding structure and spatial knowledge distillation to refine the spatial features.
In the temporal aspect, we propose a temporal knowledge distillation strategy that allows the network to learn robust temporal features (a generic distillation loss is sketched after this entry).
arXiv Detail & Related papers (2020-10-20T04:48:36Z)
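The summary does not spell out the distillation objectives, so as a hedged, generic sketch: a student trained with a task loss plus a feature-mimicking term against a frozen teacher:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_feat, teacher_feat, student_pred, target,
                      alpha=0.5):
    """Task loss on the student's saliency prediction plus an L2
    feature-mimicking term against a detached (frozen) teacher."""
    task = F.binary_cross_entropy_with_logits(student_pred, target)
    mimic = F.mse_loss(student_feat, teacher_feat.detach())
    return task + alpha * mimic

# Toy shapes: mid-level feature maps and per-pixel saliency logits.
loss = distillation_loss(torch.randn(1, 64, 32, 32),
                         torch.randn(1, 64, 32, 32),
                         torch.randn(1, 1, 64, 64),
                         torch.rand(1, 1, 64, 64))
```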
- Back to Event Basics: Self-Supervised Learning of Image Reconstruction for Event Cameras via Photometric Constancy [0.0]
Event cameras are novel vision sensors that sample, in an asynchronous fashion, brightness increments with low latency and high temporal resolution.
We propose a novel, lightweight neural network for optical flow estimation that achieves high-speed inference with only a minor drop in performance.
Results across multiple datasets show that the proposed self-supervised approach performs in line with the state-of-the-art (a sketch of a photometric-constancy loss follows this entry).
arXiv Detail & Related papers (2020-09-17T13:30:05Z)
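Self-supervised flow learning of this kind typically minimizes a photometric constancy loss: warp one frame toward the other with the predicted flow and penalize the remaining intensity difference. A minimal sketch under that assumption (the bilinear warp and Charbonnier penalty are common choices, not details taken from the abstract):

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    """Bilinearly warp img (B, C, H, W) with flow (B, 2, H, W) in pixels."""
    B, _, H, W = img.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(img.device)  # (2, H, W)
    coords = grid.unsqueeze(0) + flow                   # sampling coordinates
    # Normalize to [-1, 1] as required by grid_sample.
    cx = 2.0 * coords[:, 0] / (W - 1) - 1.0
    cy = 2.0 * coords[:, 1] / (H - 1) - 1.0
    return F.grid_sample(img, torch.stack((cx, cy), dim=-1),
                         align_corners=True)

def photometric_loss(img0, img1, flow, eps=1e-3, alpha=0.45):
    """Charbonnier penalty on the difference between img0 and img1
    warped back by the predicted forward flow."""
    diff = img0 - warp(img1, flow)
    return ((diff ** 2 + eps ** 2) ** alpha).mean()

# Toy usage: zero flow on identical frames gives a near-zero loss.
frame = torch.rand(1, 1, 64, 64)
print(photometric_loss(frame, frame, torch.zeros(1, 2, 64, 64)))
```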
- NAS-DIP: Learning Deep Image Prior with Neural Architecture Search [65.79109790446257]
Recent work has shown that the structure of deep convolutional neural networks can be used as a structured image prior.
We propose to search for neural architectures that capture stronger image priors.
We search for an improved network by leveraging an existing neural architecture search algorithm.
arXiv Detail & Related papers (2020-08-26T17:59:36Z)
- Cascaded Deep Video Deblurring Using Temporal Sharpness Prior [88.98348546566675]
The proposed algorithm mainly consists of optical flow estimation from intermediate latent frames and latent frame restoration steps.
It first develops a deep CNN model to estimate optical flow from intermediate latent frames and then restores the latent frames based on the estimated optical flow.
We show that exploiting the domain knowledge of video deblurring makes the deep CNN model more compact and efficient.
arXiv Detail & Related papers (2020-04-06T09:13:49Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for the image restoration task.
We present a novel architecture with the collective goals of maintaining spatially precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales while preserving high-resolution spatial details (a fusion sketch follows this entry).
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
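The abstract does not give the exact fusion operator, so as a hedged sketch of the general idea (resize all scales to the finest resolution and mix them, keeping spatial detail while injecting multi-scale context; channel counts are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFuse(nn.Module):
    """Fuse features from several scales: upsample everything to the
    highest resolution and merge with a 1x1 convolution, so fine spatial
    detail is preserved while coarse context is mixed in."""
    def __init__(self, channels, n_scales):
        super().__init__()
        self.mix = nn.Conv2d(channels * n_scales, channels, kernel_size=1)

    def forward(self, feats):               # feats: list, coarse-to-fine
        h, w = feats[-1].shape[-2:]         # finest-resolution map
        up = [F.interpolate(f, size=(h, w), mode="bilinear",
                            align_corners=False) for f in feats]
        return self.mix(torch.cat(up, dim=1))

# Toy usage with three scales of 32-channel features.
fuse = MultiScaleFuse(channels=32, n_scales=3)
feats = [torch.randn(1, 32, 16, 16),
         torch.randn(1, 32, 32, 32),
         torch.randn(1, 32, 64, 64)]
print(fuse(feats).shape)  # torch.Size([1, 32, 64, 64])
```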