Towards Accurate Vehicle Behaviour Classification With Multi-Relational
Graph Convolutional Networks
- URL: http://arxiv.org/abs/2002.00786v3
- Date: Tue, 12 May 2020 17:49:11 GMT
- Authors: Sravan Mylavarapu, Mahtab Sandhu, Priyesh Vijayan, K Madhava Krishna,
Balaraman Ravindran, Anoop Namboodiri
- Abstract summary: We propose a pipeline for understanding vehicle behaviour from a monocular image sequence or video.
A temporal sequence of such encodings is fed to a recurrent network to label vehicle behaviours.
The proposed framework classifies a variety of vehicle behaviours with high fidelity on diverse datasets.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding on-road vehicle behaviour from a temporal sequence of sensor
data is gaining in popularity. In this paper, we propose a pipeline for
understanding vehicle behaviour from a monocular image sequence or video. A
monocular sequence along with scene semantics, optical flow and object labels
are used to get spatial information about the object (vehicle) of interest and
other objects (semantically contiguous set of locations) in the scene. This
spatial information is encoded by a Multi-Relational Graph Convolutional
Network (MR-GCN), and a temporal sequence of such encodings is fed to a
recurrent network to label vehicle behaviours. The proposed framework can
classify a variety of vehicle behaviours with high fidelity on datasets that
are diverse and include European, Chinese and Indian on-road scenes. The
framework also allows seamless transfer of models across datasets without
requiring re-annotation, retraining or even fine-tuning. We show comparative
performance gains over baseline spatio-temporal classifiers and detail a
variety of ablations to showcase the efficacy of the framework.
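The two-stage pipeline described in the abstract (per-frame spatial encoding with a multi-relational GCN, then temporal aggregation with a recurrent network over the sequence of encodings) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the dimensions, the plain tanh recurrent cell standing in for the paper's recurrent network, and the three behaviour classes are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mr_gcn_layer(X, adjs, weights):
    """One multi-relational GCN layer: aggregate neighbours separately
    per relation type with a relation-specific weight matrix, then sum."""
    out = np.zeros((X.shape[0], weights[0].shape[1]))
    for A, W in zip(adjs, weights):
        deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
        out += (A / deg) @ X @ W          # degree-normalised aggregation
    return np.maximum(out, 0.0)           # ReLU

def rnn_step(h, x, Wh, Wx):
    """Plain tanh recurrent cell (a stand-in for the paper's recurrent
    network over per-frame graph encodings)."""
    return np.tanh(h @ Wh + x @ Wx)

# Toy setup: 4 scene objects, 2 relation types, 8-d input features,
# 6-d graph encoding, 5-d recurrent state, T=3 frames.
n, d_in, d_gcn, d_h, T, n_rel = 4, 8, 6, 5, 3, 2
Wr = [rng.normal(size=(d_in, d_gcn)) * 0.1 for _ in range(n_rel)]
Wh = rng.normal(size=(d_h, d_h)) * 0.1
Wx = rng.normal(size=(d_gcn, d_h)) * 0.1
Wc = rng.normal(size=(d_h, 3)) * 0.1      # 3 hypothetical behaviour classes

h = np.zeros(d_h)
for _ in range(T):                         # one spatial graph per frame
    X = rng.normal(size=(n, d_in))
    adjs = [(rng.random((n, n)) < 0.5).astype(float) for _ in range(n_rel)]
    enc = mr_gcn_layer(X, adjs, Wr)        # spatial encoding of the frame
    h = rnn_step(h, enc.mean(axis=0), Wh, Wx)  # temporal aggregation

logits = h @ Wc
probs = np.exp(logits) / np.exp(logits).sum()  # behaviour distribution
print(probs.shape)
```

In the actual framework the per-frame graphs are built from scene semantics, optical flow and object labels; here they are random placeholders to keep the sketch self-contained.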
Related papers
- GITSR: Graph Interaction Transformer-based Scene Representation for Multi Vehicle Collaborative Decision-making
This study focuses on efficient scene representation and the modeling of spatial interaction behaviors of traffic states.
In this study, we propose GITSR, an effective framework for Graph Interaction Transformer-based Scene Representation.
arXiv Detail & Related papers (2024-11-03T15:27:26Z)
- Encoding Agent Trajectories as Representations with Sequence Transformers
We propose a model for representing high dimensional trajectories with neural-based network architecture.
Similar to language models, our Transformer Sequence for Agent temporal Representations (STARE) model can learn representations and structure in trajectory data.
We present experimental results on various synthetic and real trajectory datasets and show that our proposed model can learn meaningful encodings.
arXiv Detail & Related papers (2024-10-11T19:18:47Z)
- Neural Semantic Map-Learning for Autonomous Vehicles
We present a mapping system that fuses local submaps gathered from a fleet of vehicles at a central instance to produce a coherent map of the road environment.
Our method jointly aligns and merges the noisy and incomplete local submaps using a scene-specific Neural Signed Distance Field.
We leverage memory-efficient sparse feature-grids to scale to large areas and introduce a confidence score to model uncertainty in scene reconstruction.
arXiv Detail & Related papers (2024-10-10T10:10:03Z)
- Temporal Embeddings: Scalable Self-Supervised Temporal Representation Learning from Spatiotemporal Data for Multimodal Computer Vision
A novel approach is proposed to stratify landscapes based on mobility activity time series.
The pixel-wise embeddings are converted to image-like channels that can be used for task-based, multimodal modeling.
arXiv Detail & Related papers (2023-10-16T02:53:29Z)
- Traffic Scene Parsing through the TSP6K Dataset
We introduce a specialized traffic monitoring dataset, termed TSP6K, with high-quality pixel-level and instance-level annotations.
The dataset captures more crowded traffic scenes, with several times more traffic participants than existing driving-scene datasets.
We propose a detail refining decoder for scene parsing, which recovers the details of different semantic regions in traffic scenes.
arXiv Detail & Related papers (2023-03-06T02:05:14Z)
- Self Supervised Clustering of Traffic Scenes using Graph Representations
We present a data-driven method to cluster traffic scenes that is self-supervised, i.e. without manual labelling.
We leverage the semantic scene graph model to create a generic graph embedding of the traffic scene, which is then mapped to a low-dimensional embedding space using a Siamese network.
In the training process of our novel approach, we augment existing traffic scenes in the Cartesian space to generate positive similarity samples.
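The augmentation-based training signal described above can be illustrated with a toy Siamese setup: a shared encoder maps a scene and its Cartesian-space augmentation to nearby embeddings. Everything here (the one-layer encoder, the dimensions, the rotation augmentation and the margin) is a hypothetical sketch, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def encode(x, W):
    """Shared (Siamese) encoder: one linear layer plus tanh, mapping a
    flattened scene representation to a low-dimensional embedding."""
    return np.tanh(x @ W)

def rotate_scene(points, theta):
    """Cartesian-space augmentation: rotating participant positions
    yields a positive similarity sample of the same scene."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return points @ R.T

# Toy scene: 5 traffic participants with 2-D positions (hypothetical).
scene = rng.normal(size=(5, 2))
positive = rotate_scene(scene, np.pi / 6)   # augmented view, same scene
negative = rng.normal(size=(5, 2))          # an unrelated scene

W = rng.normal(size=(10, 3)) * 0.5          # shared encoder weights
z_a = encode(scene.reshape(-1), W)
z_p = encode(positive.reshape(-1), W)
z_n = encode(negative.reshape(-1), W)

# Contrastive margin loss: pull the positive pair together, push the
# negative pair at least `margin` apart in the embedding space.
margin = 1.0
d_pos = np.linalg.norm(z_a - z_p)
d_neg = np.linalg.norm(z_a - z_n)
loss = d_pos**2 + max(0.0, margin - d_neg)**2
print(round(float(loss), 4))
```

In the paper the encoder input is a generic semantic scene-graph embedding rather than raw positions; the flattened coordinates here just keep the sketch self-contained.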
arXiv Detail & Related papers (2022-11-24T22:52:55Z)
- Wide and Narrow: Video Prediction from Context and Motion
We propose a new framework to integrate these complementary attributes to predict complex pixel dynamics through deep networks.
We present global context propagation networks that aggregate the non-local neighboring representations to preserve the contextual information over the past frames.
We also devise local filter memory networks that generate adaptive filter kernels by storing the motion of moving objects in the memory.
arXiv Detail & Related papers (2021-10-22T04:35:58Z)
- Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
The proposed CTL framework utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z)
- Understanding Dynamic Scenes using Graph Convolution Networks
We present a novel framework to model on-road vehicle behaviors from a sequence of temporally ordered frames as grabbed by a moving camera.
We show a seamless transfer of learning to multiple datasets without resorting to fine-tuning.
Such behavior prediction methods find immediate relevance in a variety of navigation tasks.
arXiv Detail & Related papers (2020-05-09T13:05:06Z)
- VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation
This paper introduces VectorNet, a hierarchical graph neural network that exploits the spatial locality of individual road components represented by vectors.
By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps.
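As a rough illustration of the vectorised representation, a map polyline or agent trajectory can be turned into a set of segment vectors instead of being rasterised. The feature layout below (start point, end point, polyline id) is a simplification for illustration, not VectorNet's exact attribute scheme.

```python
import numpy as np

def vectorize_polyline(points, poly_id):
    """Turn an ordered polyline (e.g. a lane centreline or an agent
    trajectory) into segment vectors [x_start, y_start, x_end, y_end, id].
    Avoids rendering the map to pixels: each segment keeps its exact
    coordinates and its parent polyline's identifier."""
    pts = np.asarray(points, dtype=float)
    starts, ends = pts[:-1], pts[1:]                 # consecutive pairs
    ids = np.full((len(starts), 1), float(poly_id))  # parent polyline id
    return np.hstack([starts, ends, ids])

# A short lane centreline with three waypoints -> two segment vectors.
lane = [(0, 0), (1, 0), (2, 0.1)]
vecs = vectorize_polyline(lane, poly_id=7)
print(vecs.shape)  # (2, 5): two segments, five features each
```

A graph network can then aggregate these vectors per polyline and across polylines, rather than running a ConvNet over a rendered image.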
We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset.
arXiv Detail & Related papers (2020-05-08T19:07:03Z)
- Parsing-based View-aware Embedding Network for Vehicle Re-Identification
We propose a parsing-based view-aware embedding network (PVEN) to achieve the view-aware feature alignment and enhancement for vehicle ReID.
The experiments conducted on three datasets show that our model outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-04-10T13:06:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.