Forecasting Future Instance Segmentation with Learned Optical Flow and
Warping
- URL: http://arxiv.org/abs/2211.08049v2
- Date: Wed, 6 Sep 2023 14:13:54 GMT
- Title: Forecasting Future Instance Segmentation with Learned Optical Flow and
Warping
- Authors: Andrea Ciamarra, Federico Becattini, Lorenzo Seidenari, Alberto Del
Bimbo
- Abstract summary: In this paper we investigate the usage of optical flow for predicting future semantic segmentations.
Results on the Cityscapes dataset demonstrate the effectiveness of optical-flow methods.
- Score: 31.879514593973195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For an autonomous vehicle, it is essential to observe the ongoing dynamics of
a scene and consequently predict imminent future scenarios to ensure safety for
itself and others. This can be done using different sensors and modalities. In
this paper we investigate the usage of optical flow for predicting future
semantic segmentations. To do so we propose a model that forecasts flow fields
autoregressively. Such predictions are then used to guide the inference of a
learned warping function that moves instance segmentations onto future frames.
Results on the Cityscapes dataset demonstrate the effectiveness of optical-flow
methods.
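The core idea — using a forecast flow field to move segmentation masks onto a future frame — can be sketched as a minimal nearest-neighbour backward warp in plain Python. This is a hypothetical illustration of flow-based mask warping, not the paper's learned warping function (which is a trained network), and `warp_mask` and its flow convention are assumptions for this sketch:

```python
def warp_mask(mask, flow):
    """Backward-warp a binary mask with a per-pixel flow field.

    mask: H x W grid of 0/1 values (list of lists).
    flow: H x W grid of (dy, dx) displacements, where flow[y][x] is the
          motion that carries a source pixel to target pixel (y, x).
    For each target pixel we look back along the flow to find its source
    (nearest-neighbour sampling); out-of-bounds sources stay 0.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy, sx = y - round(dy), x - round(dx)  # source coordinates
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = mask[sy][sx]
    return out
```

For example, a uniform flow of one pixel to the right shifts an object mask one column rightward in the warped output. A learned warping function replaces this fixed sampling rule with a network that can also handle occlusions and non-rigid motion.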
Related papers
- Flow-guided Motion Prediction with Semantics and Dynamic Occupancy Grid Maps [5.9803668726235575]
Occupancy Grid Maps (OGMs) are commonly employed for scene prediction.
Recent studies have successfully combined OGMs with deep learning methods to predict the evolution of the scene.
We propose a novel multi-task framework that leverages dynamic OGMs and semantic information to predict both future vehicle semantic grids and the future flow of the scene.
arXiv Detail & Related papers (2024-07-22T14:42:34Z)
- AdaOcc: Adaptive Forward View Transformation and Flow Modeling for 3D Occupancy and Flow Prediction [56.72301849123049]
We present our solution for the Vision-Centric 3D Occupancy and Flow Prediction track in the nuScenes Open-Occ dataset challenge at CVPR 2024.
Our innovative approach involves a dual-stage framework that enhances 3D occupancy and flow predictions by incorporating adaptive forward view transformation and flow modeling.
Our method combines regression with classification to address scale variations in different scenes, and leverages predicted flow to warp current voxel features to future frames, guided by future frame ground truth.
arXiv Detail & Related papers (2024-07-01T16:32:15Z)
- Enhancing End-to-End Autonomous Driving with Latent World Model [78.22157677787239]
We propose a novel self-supervised method to enhance end-to-end driving without the need for costly labels.
Our framework, LAW, uses a LAtent World model to predict future latent features based on the predicted ego actions and the latent feature of the current frame.
As a result, our approach achieves state-of-the-art performance in both open-loop and closed-loop benchmarks without costly annotations.
arXiv Detail & Related papers (2024-06-12T17:59:21Z)
- OFMPNet: Deep End-to-End Model for Occupancy and Flow Prediction in Urban Environment [0.0]
We introduce an end-to-end neural network methodology designed to predict the future behaviors of all dynamic objects in the environment.
We propose a novel time-weighted motion flow loss, whose application has shown a substantial decrease in end-point error.
arXiv Detail & Related papers (2024-04-02T19:37:58Z)
- FLODCAST: Flow and Depth Forecasting via Multimodal Recurrent Architectures [31.879514593973195]
We propose a flow and depth forecasting model, trained to jointly forecast both modalities at once.
We train the proposed model to also perform predictions for several timesteps in the future.
We report benefits on the downstream task of segmentation forecasting, injecting our predictions in a flow-based mask-warping framework.
arXiv Detail & Related papers (2023-10-31T16:30:16Z)
- Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation [66.86987509942607]
We evaluate how such a pretraining paradigm should be carried out in imitation learning.
We consider a setting where the pretraining corpus consists of multitask demonstrations.
We argue that inverse dynamics modeling is well-suited to this setting.
arXiv Detail & Related papers (2023-05-26T14:40:46Z)
- Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion [88.45326906116165]
We present a new framework to formulate the trajectory prediction task as a reverse process of motion indeterminacy diffusion (MID).
We encode the history behavior information and the social interactions as a state embedding and devise a Transformer-based diffusion model to capture the temporal dependencies of trajectories.
Experiments on the human trajectory prediction benchmarks including the Stanford Drone and ETH/UCY datasets demonstrate the superiority of our method.
arXiv Detail & Related papers (2022-03-25T16:59:08Z)
- Panoptic Segmentation Forecasting [71.75275164959953]
Our goal is to forecast the near future given a set of recent observations.
We think this ability to forecast, i.e., to anticipate, is integral for the success of autonomous agents.
We develop a two-component model: one component learns the dynamics of the background stuff by anticipating odometry, while the other anticipates the dynamics of detected things.
arXiv Detail & Related papers (2021-04-08T17:59:16Z)
- LaPred: Lane-Aware Prediction of Multi-Modal Future Trajectories of Dynamic Agents [10.869902339190949]
We propose a novel prediction model, referred to as the lane-aware prediction (LaPred) network.
LaPred uses the instance-level lane entities extracted from a semantic map to predict the multi-modal future trajectories.
The experiments conducted on the public nuScenes and Argoverse datasets demonstrate that the proposed LaPred method significantly outperforms the existing prediction models.
arXiv Detail & Related papers (2021-04-01T04:33:36Z)
- Implicit Latent Variable Model for Scene-Consistent Motion Forecasting [78.74510891099395]
In this paper, we aim to learn scene-consistent motion forecasts of complex urban traffic directly from sensor data.
We model the scene as an interaction graph and employ powerful graph neural networks to learn a distributed latent representation of the scene.
arXiv Detail & Related papers (2020-07-23T14:31:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.