Towards Accurate Vehicle Behaviour Classification With Multi-Relational
Graph Convolutional Networks
- URL: http://arxiv.org/abs/2002.00786v3
- Date: Tue, 12 May 2020 17:49:11 GMT
- Title: Towards Accurate Vehicle Behaviour Classification With Multi-Relational
Graph Convolutional Networks
- Authors: Sravan Mylavarapu, Mahtab Sandhu, Priyesh Vijayan, K Madhava Krishna,
Balaraman Ravindran, Anoop Namboodiri
- Abstract summary: We propose a pipeline for understanding vehicle behaviour from a monocular image sequence or video.
Spatial information about the scene is encoded per frame by a Multi-Relational Graph Convolutional Network (MR-GCN), and the temporal sequence of these encodings is fed to a recurrent network to label vehicle behaviours.
The proposed framework classifies a variety of vehicle behaviours with high fidelity on diverse datasets.
- Score: 22.022759283770377
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding on-road vehicle behaviour from a temporal sequence of sensor
data is gaining in popularity. In this paper, we propose a pipeline for
understanding vehicle behaviour from a monocular image sequence or video. The
monocular sequence, along with scene semantics, optical flow and object labels,
is used to obtain spatial information about the object (vehicle) of interest and
other objects (semantically contiguous sets of locations) in the scene. This
spatial information is encoded by a Multi-Relational Graph Convolutional
Network (MR-GCN), and a temporal sequence of such encodings is fed to a
recurrent network to label vehicle behaviours. The proposed framework can
classify a variety of vehicle behaviours with high fidelity on diverse datasets
that include European, Chinese and Indian on-road scenes. The framework also
allows seamless transfer of models across datasets without re-annotation,
retraining or even fine-tuning. We show a comparative performance gain over
baseline spatio-temporal classifiers and detail a variety of ablations to
showcase the efficacy of the framework.
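The pipeline above (per-frame multi-relational graph convolution followed by a recurrent network over the encodings) can be sketched in NumPy. Everything in this sketch is an illustrative assumption rather than the authors' implementation: the feature sizes, the number of relations, the random adjacencies, the mean-pooling of node encodings, and the plain tanh recurrent cell (the paper's recurrent network is only described at the architecture level) are all stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sizes: nodes, input feats, hidden dim, relations, frames, classes.
N, F, H_DIM, R, T, C = 5, 8, 16, 3, 4, 4

# One weight matrix per relation type, as in a multi-relational GCN.
W_rel = [rng.standard_normal((F, H_DIM)) * 0.1 for _ in range(R)]

def mr_gcn_layer(H, adjs, weights):
    """Multi-relational graph convolution: sum the relation-specific
    propagations A_r @ H @ W_r over all relations, then apply a ReLU."""
    out = sum(A @ H @ W for A, W in zip(adjs, weights))
    return np.maximum(out, 0.0)

# Parameters for a minimal tanh recurrent cell and a linear classifier.
W_hh = rng.standard_normal((H_DIM, H_DIM)) * 0.1
W_xh = rng.standard_normal((H_DIM, H_DIM)) * 0.1
W_out = rng.standard_normal((H_DIM, C)) * 0.1

h = np.zeros(H_DIM)
for t in range(T):
    X_t = rng.standard_normal((N, F))                 # node features, frame t
    adjs = [(rng.random((N, N)) < 0.3).astype(float)  # one adjacency
            for _ in range(R)]                        # matrix per relation
    enc = mr_gcn_layer(X_t, adjs, W_rel)              # (N, H_DIM) encoding
    pooled = enc.mean(axis=0)                         # frame-level summary
    h = np.tanh(h @ W_hh + pooled @ W_xh)             # recurrent update

logits = h @ W_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()                                  # behaviour-class probs
print(probs.shape)
```

A trained version would learn `W_rel`, `W_hh`, `W_xh` and `W_out` by backpropagating a classification loss through the whole unrolled sequence; the sketch only shows the forward data flow from per-frame graphs to a behaviour label distribution.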
Related papers
- Traffic Reconstruction and Analysis of Natural Driving Behaviors at
Unsignalized Intersections [1.7273380623090846]
This research involved recording traffic at various unsignalized intersections in Memphis, TN, during different times of the day.
After manually labeling video data to capture specific variables, we reconstructed traffic scenarios in the SUMO simulation environment.
The output data from these simulations offered a comprehensive analysis, including time-space diagrams for vehicle movement, travel time frequency distributions, and speed-position plots to identify bottleneck points.
arXiv Detail & Related papers (2023-12-22T09:38:06Z)
- Graph Convolutional Networks for Complex Traffic Scenario Classification [0.7919810878571297]
A scenario-based testing approach can reduce the time required to obtain statistically significant evidence of the safety of Automated Driving Systems.
Most methods on scenario classification do not work for complex scenarios with diverse environments.
We propose a method for complex traffic scenario classification that is able to model the interaction of a vehicle with the environment.
arXiv Detail & Related papers (2023-10-26T20:51:24Z)
- Temporal Embeddings: Scalable Self-Supervised Temporal Representation Learning from Spatiotemporal Data for Multimodal Computer Vision [1.4127889233510498]
A novel approach is proposed to stratify the landscape based on mobility activity time series.
The pixel-wise embeddings are converted to image-like channels that can be used for task-based, multimodal modeling.
arXiv Detail & Related papers (2023-10-16T02:53:29Z)
- Traffic Scene Parsing through the TSP6K Dataset [109.69836680564616]
We introduce a specialized traffic monitoring dataset, termed TSP6K, with high-quality pixel-level and instance-level annotations.
The dataset captures more crowded traffic scenes with several times more traffic participants than the existing driving scenes.
We propose a detail refining decoder for scene parsing, which recovers the details of different semantic regions in traffic scenes.
arXiv Detail & Related papers (2023-03-06T02:05:14Z)
- Self Supervised Clustering of Traffic Scenes using Graph Representations [2.658812114255374]
We present a data-driven method to cluster traffic scenes that is self-supervised, i.e. without manual labelling.
We leverage the semantic scene graph model to create a generic graph embedding of the traffic scene, which is then mapped to a low-dimensional embedding space using a Siamese network.
In the training process of our novel approach, we augment existing traffic scenes in the Cartesian space to generate positive similarity samples.
arXiv Detail & Related papers (2022-11-24T22:52:55Z)
- Wide and Narrow: Video Prediction from Context and Motion [54.21624227408727]
We propose a new framework to integrate these complementary attributes to predict complex pixel dynamics through deep networks.
We present global context propagation networks that aggregate the non-local neighboring representations to preserve the contextual information over the past frames.
We also devise local filter memory networks that generate adaptive filter kernels by storing the motion of moving objects in the memory.
arXiv Detail & Related papers (2021-10-22T04:35:58Z)
- Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representations by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- Understanding Dynamic Scenes using Graph Convolution Networks [22.022759283770377]
We present a novel framework to model on-road vehicle behaviors from a sequence of temporally ordered frames as grabbed by a moving camera.
We show a seamless transfer of learning to multiple datasets without resorting to fine-tuning.
Such behavior prediction methods find immediate relevance in a variety of navigation tasks.
arXiv Detail & Related papers (2020-05-09T13:05:06Z)
- VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation [74.56282712099274]
This paper introduces VectorNet, a hierarchical graph neural network that exploits the spatial locality of individual road components represented by vectors.
By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps.
We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset.
arXiv Detail & Related papers (2020-05-08T19:07:03Z)
- Parsing-based View-aware Embedding Network for Vehicle Re-Identification [138.11983486734576]
We propose a parsing-based view-aware embedding network (PVEN) to achieve the view-aware feature alignment and enhancement for vehicle ReID.
The experiments conducted on three datasets show that our model outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-04-10T13:06:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.