Vision-Guided Forecasting -- Visual Context for Multi-Horizon Time
Series Forecasting
- URL: http://arxiv.org/abs/2107.12674v1
- Date: Tue, 27 Jul 2021 08:52:40 GMT
- Title: Vision-Guided Forecasting -- Visual Context for Multi-Horizon Time
Series Forecasting
- Authors: Eitan Kosman, Dotan Di Castro
- Abstract summary: We tackle multi-horizon forecasting of vehicle states by fusing two modalities: front-facing video and historical traces of the vehicle's state.
We design and experiment with three end-to-end architectures that use 3D convolutions for visual feature extraction and 1D convolutions for feature extraction from speed and steering-angle traces.
We show that we are able to forecast a vehicle's state to various horizons, while outperforming the current state-of-the-art results on the related task of driving state estimation.
- Score: 0.6947442090579469
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous driving has gained huge traction in recent years due to its potential to change the way we commute. Much effort has been put into estimating the current state of a vehicle. Meanwhile, learning to forecast the state of a vehicle ahead of time introduces new capabilities, such as predicting dangerous situations. Moreover, forecasting brings new supervision opportunities by learning to predict a richer context, expressed by multiple horizons. Intuitively, a video stream originating from a front-facing camera is necessary because it encodes information about the upcoming road. In addition, historical traces of the vehicle's states give more context. In this paper, we tackle multi-horizon forecasting of vehicle states by fusing the two modalities. We design and experiment with three end-to-end architectures that exploit 3D convolutions for visual feature extraction and 1D convolutions for feature extraction from speed and steering-angle traces. To demonstrate the effectiveness of our method, we perform extensive experiments on two publicly available real-world datasets, Comma2k19 and the Udacity challenge. We show that we are able to forecast a vehicle's state to various horizons, while outperforming the current state-of-the-art results on the related task of driving state estimation. We examine the contribution of vision features and find that a model fed with vision features achieves an error that is 56.6% and 66.9% of the error of a model that does not use those features, on the Udacity and Comma2k19 datasets, respectively.
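To make the fusion concrete, below is a minimal, hypothetical sketch of the kind of two-branch model the abstract describes: a 3D-convolutional branch over a short front-facing video clip, a 1D-convolutional branch over past speed and steering-angle traces, and one regression head per forecasting horizon. The PyTorch framing, layer sizes, and names such as VisionGuidedForecaster are assumptions made for illustration, not the authors' released implementation.

```python
# Minimal sketch (assumption, not the paper's code): fuse a 3D-conv video branch
# with a 1D-conv branch over past speed/steering traces for multi-horizon forecasting.
import torch
import torch.nn as nn


class VisionGuidedForecaster(nn.Module):
    """Hypothetical two-branch fusion model (illustrative only)."""

    def __init__(self, horizons=(1, 2, 4, 8)):
        super().__init__()
        self.horizons = horizons
        # Visual branch: 3D convolutions over a clip shaped (B, 3, T, H, W).
        self.video_net = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),          # -> (B, 32)
        )
        # State branch: 1D convolutions over past speed and steering angle, (B, 2, L).
        self.trace_net = nn.Sequential(
            nn.Conv1d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),          # -> (B, 32)
        )
        # One regression head per horizon, each predicting (speed, steering angle).
        self.heads = nn.ModuleList(nn.Linear(64, 2) for _ in horizons)

    def forward(self, clip, trace):
        fused = torch.cat([self.video_net(clip), self.trace_net(trace)], dim=1)
        # (B, num_horizons, 2): one (speed, steering) forecast per horizon.
        return torch.stack([head(fused) for head in self.heads], dim=1)


# Dummy usage: a 4-sample batch of 8-frame 64x64 clips and 20-step state traces.
model = VisionGuidedForecaster()
forecasts = model(torch.randn(4, 3, 8, 64, 64), torch.randn(4, 2, 20))
print(forecasts.shape)  # torch.Size([4, 4, 2])
```

Separate heads per horizon make the multi-horizon supervision explicit; a single head emitting all horizons at once would be an equally plausible design in this sketch.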
Related papers
- Cohere3D: Exploiting Temporal Coherence for Unsupervised Representation
Learning of Vision-based Autonomous Driving [73.3702076688159]
We propose a novel contrastive learning algorithm, Cohere3D, to learn coherent instance representations in a long-term input sequence.
We evaluate our algorithm by finetuning the pretrained model on various downstream perception, prediction, and planning tasks.
arXiv Detail & Related papers (2024-02-23T19:43:01Z)
- Visual Point Cloud Forecasting enables Scalable Autonomous Driving [28.376086570498952]
Visual autonomous driving applications require features encompassing semantics, 3D geometry, and temporal information simultaneously.
We present ViDAR, a general model to pre-train downstream visual encoders.
Experiments show significant gains on downstream tasks, e.g., 3.1% NDS on 3D detection, a 10% error reduction in motion forecasting, and a 15% lower collision rate in planning.
arXiv Detail & Related papers (2023-12-29T15:44:13Z)
- DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving [76.29141888408265]
We propose a large-scale dataset containing diverse accident scenarios that frequently occur in real-world driving.
The proposed DeepAccident dataset includes 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset.
arXiv Detail & Related papers (2023-04-03T17:37:00Z)
- Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward, fully self-supervised framework curated for policy pretraining in visuomotor driving.
We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos.
In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input.
In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only.
arXiv Detail & Related papers (2023-01-03T08:52:49Z)
- ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning [132.20119288212376]
We propose a spatial-temporal feature learning scheme towards a set of more representative features for perception, prediction and planning tasks simultaneously.
To the best of our knowledge, we are the first to systematically investigate each part of an interpretable end-to-end vision-based autonomous driving system.
arXiv Detail & Related papers (2022-07-15T16:57:43Z)
- Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving [104.32241082170044]
We study a new task, safety-aware motion prediction with unseen vehicles for autonomous driving.
Unlike the existing trajectory prediction task for seen vehicles, we aim at predicting an occupancy map.
Our approach is the first that can predict the existence of unseen vehicles in most cases; a rough sketch of the occupancy-map formulation follows this entry.
arXiv Detail & Related papers (2021-09-03T13:33:33Z)
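To illustrate what predicting an occupancy map (rather than per-vehicle trajectories) can look like as a model output, here is a minimal, hypothetical sketch: a small convolutional network maps a rasterized bird's-eye-view history to per-cell occupancy logits and is trained with a binary cross-entropy loss. The architecture, channel counts, and the name OccupancyPredictor are assumptions for illustration, not the cited paper's method.

```python
# Generic occupancy-map formulation sketch (assumption, not the cited paper's model).
import torch
import torch.nn as nn


class OccupancyPredictor(nn.Module):
    """Hypothetical BEV occupancy head: history raster in, per-cell occupancy logits out."""

    def __init__(self, in_ch=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=1),  # one logit per BEV cell
        )

    def forward(self, bev_history):
        # bev_history: (B, in_ch, H, W) rasterized past observations / map channels.
        return self.net(bev_history)  # (B, 1, H, W) occupancy logits


# Training-step sketch: supervise with a future occupancy grid (1 = cell occupied).
model = OccupancyPredictor()
criterion = nn.BCEWithLogitsLoss()
bev_history = torch.randn(2, 4, 64, 64)
future_occupancy = torch.randint(0, 2, (2, 1, 64, 64)).float()
loss = criterion(model(bev_history), future_occupancy)
loss.backward()
```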
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- Vehicle Trajectory Prediction in Crowded Highway Scenarios Using Bird Eye View Representations and CNNs [0.0]
This paper describes a novel approach to vehicle trajectory prediction that employs graphical representations.
The problem is posed as an image-to-image regression problem, training the network to learn the underlying relations between the traffic participants.
The model has been tested in highway scenarios with more than 30 vehicles simultaneously in two opposite traffic flow streams.
arXiv Detail & Related papers (2020-08-26T11:15:49Z)
- Two-Stream Networks for Lane-Change Prediction of Surrounding Vehicles [8.828423067460644]
In highway scenarios, an alert human driver will typically anticipate early cut-in and cut-out maneuvers of surrounding vehicles using only visual cues.
To deal with lane-change recognition and prediction for surrounding vehicles, we pose the problem as an action recognition/prediction problem by stacking visual cues from video cameras.
Two video action recognition approaches are analyzed: two-stream convolutional networks and multiplier networks; a rough sketch of the two-stream idea follows this entry.
arXiv Detail & Related papers (2020-08-25T07:59:15Z)
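As a rough illustration of the two-stream formulation mentioned in the last entry, the sketch below feeds an appearance stream (stacked RGB frames) and a motion stream (e.g., stacked optical-flow fields) through separate convolutional branches and fuses them for a lane-change classification such as cut-in / cut-out / lane-keep. Channel counts, class labels, and the feature-level late fusion are assumptions for illustration, not the cited paper's exact configuration.

```python
# Hypothetical two-stream lane-change classifier (illustrative sketch only).
import torch
import torch.nn as nn


def conv_stream(in_ch):
    # Same backbone shape for both streams: a few 2D convolutions plus global pooling.
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (B, 64)
    )


class TwoStreamLaneChangeNet(nn.Module):
    """Hypothetical two-stream model: appearance + motion streams, late fusion."""

    def __init__(self, n_frames=5, n_classes=3):
        super().__init__()
        self.appearance = conv_stream(3 * n_frames)      # stacked RGB frames
        self.motion = conv_stream(2 * (n_frames - 1))    # stacked optical flow (dx, dy)
        self.classifier = nn.Linear(128, n_classes)      # e.g. cut-in / cut-out / lane-keep

    def forward(self, rgb_stack, flow_stack):
        fused = torch.cat([self.appearance(rgb_stack), self.motion(flow_stack)], dim=1)
        return self.classifier(fused)  # (B, n_classes) logits


# Dummy usage: 5 stacked RGB frames and 4 stacked flow fields at 64x64 resolution.
model = TwoStreamLaneChangeNet()
logits = model(torch.randn(2, 15, 64, 64), torch.randn(2, 8, 64, 64))
print(logits.shape)  # torch.Size([2, 3])
```

Classic two-stream action recognition fuses the two streams' class scores instead; feature-level fusion is used here only to keep the sketch compact.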
This list is automatically generated from the titles and abstracts of the papers in this site.