Fast Neural Scene Flow
- URL: http://arxiv.org/abs/2304.09121v3
- Date: Tue, 29 Aug 2023 12:32:01 GMT
- Title: Fast Neural Scene Flow
- Authors: Xueqian Li, Jianqiao Zheng, Francesco Ferroni, Jhony Kaesemodel Pontes, Simon Lucey
- Abstract summary: A coordinate neural network estimates scene flow at runtime, without any training.
In this paper, we demonstrate that scene flow is different -- with the dominant computational bottleneck stemming from the loss function itself.
Our fast neural scene flow (FNSF) approach reports for the first time real-time performance comparable to learning methods.
- Score: 36.29234109363439
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural Scene Flow Prior (NSFP) is of significant interest to the vision
community due to its inherent robustness to out-of-distribution (OOD) effects
and its ability to deal with dense lidar points. The approach utilizes a
coordinate neural network to estimate scene flow at runtime, without any
training. However, it is up to 100 times slower than current state-of-the-art
learning methods. In other applications, such as image, video, and radiance
function reconstruction, innovations in speeding up the runtime performance of
coordinate networks have centered upon architectural changes. In this paper, we
demonstrate that scene flow is different -- with the dominant computational
bottleneck stemming from the loss function itself (i.e., Chamfer distance).
Further, we rediscover the distance transform (DT) as an efficient,
correspondence-free loss function that dramatically speeds up the runtime
optimization. Our fast neural scene flow (FNSF) approach reports, for the first
time, real-time performance comparable to learning methods, without any training
or OOD bias, on two of the largest open autonomous driving (AV) lidar datasets,
Waymo Open and Argoverse.
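The abstract's central claim, that the Chamfer loss rather than the network is the runtime bottleneck, is easiest to see side by side in code. Below is a minimal sketch (not the authors' implementation) contrasting a Chamfer loss with a precomputed distance transform lookup; the grid resolution, padding, and helper names are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from scipy.spatial import cKDTree

def chamfer_loss(pred_pts, target_pts):
    """One common Chamfer variant. Per-iteration cost: two
    nearest-neighbor searches, which is the bottleneck on dense lidar."""
    d_fwd, _ = cKDTree(target_pts).query(pred_pts)   # pred -> target
    d_bwd, _ = cKDTree(pred_pts).query(target_pts)   # target -> pred
    return d_fwd.mean() + d_bwd.mean()

def build_dt_grid(target_pts, cell=0.1, pad=2.0):
    """Built once per scene: a voxel grid storing, for every cell, the
    distance to the nearest occupied (target-point) cell."""
    lo = target_pts.min(axis=0) - pad
    shape = np.ceil((target_pts.max(axis=0) + pad - lo) / cell).astype(int) + 1
    empty = np.ones(shape, dtype=bool)
    idx = np.floor((target_pts - lo) / cell).astype(int)
    empty[tuple(idx.T)] = False                      # mark occupied voxels
    return distance_transform_edt(empty, sampling=cell), lo

def dt_loss(pred_pts, dt_grid, lo, cell=0.1):
    """Correspondence-free loss. Per-iteration cost: one array lookup per
    point. (A differentiable version would interpolate the grid, e.g.
    trilinearly, rather than take the nearest cell.)"""
    idx = np.floor((pred_pts - lo) / cell).astype(int)
    idx = np.clip(idx, 0, np.array(dt_grid.shape) - 1)
    return dt_grid[tuple(idx.T)].mean()
```

The asymmetry is the point: Chamfer rebuilds nearest-neighbor queries at every optimization step, while the DT grid is amortized, built once per scene, after which each loss evaluation is a single lookup per point.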
Related papers
- LeRF: Learning Resampling Function for Adaptive and Efficient Image Interpolation [64.34935748707673]
Recent deep neural networks (DNNs) have made impressive progress in performance by introducing learned data priors.
We propose a novel method of Learning Resampling Function (termed LeRF), which takes advantage of both the structural priors learned by DNNs and the locally continuous assumption.
LeRF assigns spatially varying resampling functions to input image pixels and learns to predict the shapes of these resampling functions with a neural network.
arXiv Detail & Related papers (2024-07-13T16:09:45Z)
- StreamFlow: Streamlined Multi-Frame Optical Flow Estimation for Video Sequences [31.210626775505407]
Occlusions between consecutive frames have long posed a significant challenge in optical flow estimation.
We present a Streamlined In-batch Multi-frame (SIM) pipeline tailored to video input, attaining a similar level of time efficiency to two-frame networks.
StreamFlow excels on the challenging KITTI and Sintel datasets, with particular improvement in occluded areas.
arXiv Detail & Related papers (2023-11-28T07:53:51Z)
- Deep Multi-Threshold Spiking-UNet for Image Processing [51.88730892920031]
This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture.
To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy.
Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart.
arXiv Detail & Related papers (2023-07-20T16:00:19Z)
- LOGO-Former: Local-Global Spatio-Temporal Transformer for Dynamic Facial Expression Recognition [19.5702895176141]
Previous methods for dynamic facial expression recognition (DFER) in the wild are mainly based on Convolutional Neural Networks (CNNs), whose local operations ignore the long-range dependencies in videos.
Transformer-based methods achieve better performance for DFER but result in higher FLOPs and computational costs; LOGO-Former applies local-global spatio-temporal attention to capture those dependencies at lower cost.
Experiments on two in-the-wild dynamic facial expression datasets (i.e., DFEW and FERV39K) indicate that our method provides an effective way to make use of the spatial and temporal dependencies for DFER.
arXiv Detail & Related papers (2023-05-05T07:53:13Z)
- EM-driven unsupervised learning for efficient motion segmentation [3.5232234532568376]
This paper presents a CNN-based fully unsupervised method for motion segmentation from optical flow.
We leverage the Expectation-Maximization (EM) framework to design the loss function and the training procedure of our motion segmentation neural network.
Our method outperforms comparable unsupervised methods and is very efficient.
arXiv Detail & Related papers (2022-01-06T14:35:45Z)
- Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic [137.04558017227583]
Actor-critic (AC) algorithms, empowered by neural networks, have had significant empirical success in recent years.
We take a mean-field perspective on the evolution and convergence of feature-based neural AC.
We prove that neural AC finds the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2021-12-27T06:09:50Z)
- KORSAL: Key-point Detection based Online Real-Time Spatio-Temporal Action Localization [0.9507070656654633]
Real-time and online action localization in a video is a critical yet highly challenging problem.
Recent attempts achieve this by using computationally intensive 3D CNN architectures or highly redundant two-stream architectures with optical flow.
We propose utilizing fast and efficient key-point based bounding box prediction to spatially localize actions.
Our model achieves a frame rate of 41.8 FPS, which is a 10.7% improvement over contemporary real-time methods.
arXiv Detail & Related papers (2021-11-05T08:39:36Z)
- Neural Scene Flow Prior [30.878829330230797]
Before the deep learning revolution, many perception algorithms were based on runtime optimization in conjunction with a strong prior/regularization penalty.
This paper revisits scene flow with an approach that relies predominantly on runtime optimization and strong regularization.
A central innovation here is the inclusion of a neural scene flow prior, which uses the architecture of neural networks as a new type of implicit regularizer (a minimal sketch of this idea appears after the list below).
arXiv Detail & Related papers (2021-11-01T20:44:12Z)
- Optical-Flow-Reuse-Based Bidirectional Recurrent Network for Space-Time Video Super-Resolution [52.899234731501075]
Space-time video super-resolution (ST-VSR) simultaneously increases the spatial resolution and frame rate for a given video.
Existing methods typically suffer from difficulties in how to efficiently leverage information from a large range of neighboring frames.
We propose a coarse-to-fine bidirectional recurrent neural network instead of using ConvLSTM to leverage knowledge between adjacent frames.
arXiv Detail & Related papers (2021-10-13T15:21:30Z)
- End-to-end Learning for Inter-Vehicle Distance and Relative Velocity Estimation in ADAS with a Monocular Camera [81.66569124029313]
We propose a camera-based inter-vehicle distance and relative velocity estimation method based on end-to-end training of a deep neural network.
The key novelty of our method is the integration of multiple visual clues provided by any two time-consecutive monocular frames.
We also propose a vehicle-centric sampling mechanism to alleviate the effect of perspective distortion in the motion field.
arXiv Detail & Related papers (2020-06-07T08:18:31Z)
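To make the Neural Scene Flow Prior entry above concrete, here is a hedged sketch of the runtime-optimization idea it describes: a small coordinate MLP maps each point to a flow vector and is fit to a single scene at test time, with the network architecture itself serving as the implicit regularizer. Layer widths, iteration count, and the loss hook are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

class SceneFlowMLP(nn.Module):
    """A coordinate network: maps a 3D point to its 3D flow vector.
    The limited capacity and smoothness of the MLP act as the implicit
    regularizer; no learned weights are carried across scenes."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, pts):             # pts: (N, 3) float tensor
        return self.net(pts)

def optimize_flow(pts_t, loss_fn, iters=500, lr=1e-3):
    """Fit the MLP to one scene at runtime. loss_fn is any differentiable
    point loss comparing the warped frame-t points to frame t+1, e.g. a
    torch Chamfer or an interpolated DT lookup (hypothetical hooks)."""
    model = SceneFlowMLP()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        warped = pts_t + model(pts_t)   # warp frame t toward frame t+1
        loss_fn(warped).backward()
        opt.step()
    with torch.no_grad():
        return model(pts_t)             # estimated per-point scene flow
```

Pairing this loop with the DT loss sketched after the abstract, evaluated through a differentiable (e.g., trilinear) grid interpolation, is the essence of how FNSF removes the Chamfer bottleneck from runtime optimization.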