Related papers: Pishgu: Universal Path Prediction Architecture through Graph Isomorphism and Attentive Convolution

URL: http://arxiv.org/abs/2210.08057v1
Date: Fri, 14 Oct 2022 18:48:48 GMT
Title: Pishgu: Universal Path Prediction Architecture through Graph Isomorphism and Attentive Convolution
Authors: Ghazal Alinezhad Noghre, Vinit Katariya, Armin Danesh Pazho, Christopher Neff, Hamed Tabkhi
Abstract summary: This article proposes Pishgu, a universal graph isomorphism approach for attentive path prediction. Pishgu captures the inter-dependencies within the subjects in each frame by taking advantage of Graph Isomorphism Networks. We evaluate the adaptability of our approach to multiple publicly available vehicle (bird's-eye view) and pedestrian (bird's-eye and high-angle view) path prediction datasets.
Score: 2.6774008509840996
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Path prediction is an essential task for several real-world real-time applications, from autonomous driving and video surveillance to environmental monitoring. Most existing approaches are computation-intensive and only target a narrow domain (e.g., a specific point of view for a particular subject). However, many real-time applications demand a universal path predictor that can work across different subjects (vehicles, pedestrians), perspectives (bird's-eye, high-angle), and scenes (sidewalk, highway). This article proposes Pishgu, a universal graph isomorphism approach for attentive path prediction that accounts for environmental challenges. Pishgu captures the inter-dependencies within the subjects in each frame by taking advantage of Graph Isomorphism Networks. In addition, an attention module is adopted to represent the intrinsic relations of the subjects of interest with their surroundings. We evaluate the adaptability of our approach to multiple publicly available vehicle (bird's-eye view) and pedestrian (bird's-eye and high-angle view) path prediction datasets. Pishgu's universal solution outperforms existing domain-focused methods by producing state-of-the-art results for vehicle bird's-eye view by 42% and 61% and pedestrian high-angle views by 23% and 22% in terms of ADE and FDE, respectively. Moreover, we analyze the domain-specific details for various datasets to understand their effect on path prediction and model interpretation. Although our model is a single solution for path prediction problems and defines a new standard in multiple domains, it still has a comparable complexity to state-of-the-art models, which makes it suitable for real-world application. We also report the latency and throughput for all three domains on multiple embedded processors.
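The abstract reports results as ADE and FDE without defining them. The sketch below assumes the usual definitions in trajectory prediction: average displacement error is the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps, and final displacement error is that distance at the last step. The function name `displacement_errors` and the sample trajectories are hypothetical and are not taken from the paper or its code.

```python
import math


def displacement_errors(predicted, ground_truth):
    """Return (ADE, FDE) for one trajectory given as lists of (x, y) points.

    Assumes the common definitions: ADE is the mean Euclidean distance over
    all time steps, FDE is the Euclidean distance at the final time step.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Per-step Euclidean distance between predicted and true positions.
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    ade = sum(distances) / len(distances)
    fde = distances[-1]
    return ade, fde


# Illustrative data only: a straight prediction against a diagonal ground truth.
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ground_truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(predicted, ground_truth)
print(f"ADE={ade:.2f} FDE={fde:.2f}")
```

Benchmarks typically average these per-trajectory values over every subject and scene in the test set; the exact aggregation behind the reported percentages is not stated in the abstract.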

Related papers

Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation [50.31351006532924]
Human pose estimation (HPE) has received increasing attention recently due to its wide application in motion analysis, virtual reality, healthcare, etc. It suffers from the lack of labeled diverse real-world datasets due to the time- and labor-intensive annotation. We introduce a novel framework that capitalizes on both representation aggregation and segregation for domain adaptive human pose estimation.
arXiv Detail & Related papers (2024-12-29T17:59:45Z)
One Model for One Graph: A New Perspective for Pretraining with Cross-domain Graphs [61.9759512646523]
Graph Neural Networks (GNNs) have emerged as a powerful tool to capture intricate network patterns. Existing GNNs require careful domain-specific architecture designs and training from scratch on each dataset. We propose a novel cross-domain pretraining framework, "one model for one graph".
arXiv Detail & Related papers (2024-11-30T01:49:45Z)
Probing Fine-Grained Action Understanding and Cross-View Generalization of Foundation Models [13.972809192907931]
Foundation models (FMs) are large neural networks trained on broad datasets. Human activity recognition in video has advanced with FMs, driven by competition among different architectures. This paper empirically evaluates how perspective changes affect different FMs in fine-grained human activity recognition.
arXiv Detail & Related papers (2024-07-22T12:59:57Z)
XVTP3D: Cross-view Trajectory Prediction Using Shared 3D Queries for Autonomous Driving [7.616422495497465]
Trajectory prediction with uncertainty is a critical and challenging task for autonomous driving. We present a cross-view trajectory prediction method using shared 3D queries (XVTP3D). The results of experiments on two publicly available datasets show that XVTP3D achieved state-of-the-art performance with consistent cross-view predictions.
arXiv Detail & Related papers (2023-08-17T03:35:13Z)
MultiPath++: Efficient Information Fusion and Trajectory Aggregation for Behavior Prediction [42.563865078323204]
We present MultiPath++, a future prediction model that achieves state-of-the-art performance on popular benchmarks. We show that our proposed model achieves state-of-the-art performance on the Argoverse Motion Forecasting Competition and Open Motion Prediction Challenge.
arXiv Detail & Related papers (2021-11-29T21:36:53Z)
TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks. To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame. Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art methods specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
Detecting 32 Pedestrian Attributes for Autonomous Vehicles [103.87351701138554]
In this paper, we address the problem of jointly detecting pedestrians and recognizing 32 pedestrian attributes. We introduce a Multi-Task Learning (MTL) model relying on a composite field framework, which achieves both goals in an efficient way. We show competitive detection and attribute recognition results, as well as a more stable MTL training.
arXiv Detail & Related papers (2020-12-04T15:10:12Z)
Multi-Modal Hybrid Architecture for Pedestrian Action Prediction [14.032334569498968]
We propose a novel multi-modal prediction algorithm that incorporates different sources of information captured from the environment to predict future crossing actions of pedestrians. Using the existing 2D pedestrian behavior benchmarks and a newly annotated 3D driving dataset, we show that our proposed model achieves state-of-the-art performance in pedestrian crossing prediction.
arXiv Detail & Related papers (2020-11-16T15:17:58Z)
Multi-path Neural Networks for On-device Multi-domain Visual Classification [55.281139434736254]
This paper proposes a novel approach to automatically learn a multi-path network for multi-domain visual classification on mobile devices. The proposed multi-path network is learned from neural architecture search by applying one reinforcement learning controller for each domain to select the best path in the super-network created from a MobileNetV3-like search space. The determined multi-path model selectively shares parameters across domains in shared nodes while keeping domain-specific parameters within non-shared nodes in individual domain paths.
arXiv Detail & Related papers (2020-10-10T05:13:49Z)
Cross-Domain Facial Expression Recognition: A Unified Evaluation Benchmark and Adversarial Graph Learning [85.6386289476598]
We develop a novel adversarial graph representation adaptation (AGRA) framework for cross-domain holistic-local feature co-adaptation. We conduct extensive and fair evaluations on several popular benchmarks and show that the proposed AGRA framework outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2020-08-03T15:00:31Z)
Adversarial Bipartite Graph Learning for Video Domain Adaptation [50.68420708387015]
Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area. Recent works on visual domain adaptation which leverage adversarial learning to unify the source and target video representations are not highly effective on videos. This paper proposes an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions.
arXiv Detail & Related papers (2020-07-31T03:48:41Z)
STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction [24.855059537779294]
We present a novel end-to-end two-stage network: Spatio-Temporal-Interactive Network (STINet). In addition to 3D geometry of pedestrians, we model temporal information for each of the pedestrians. Our method predicts both current and past locations in the first stage, so that each pedestrian can be linked across frames.
arXiv Detail & Related papers (2020-05-08T18:43:01Z)
Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction [57.56466850377598]
Reasoning over visual data is a desirable capability for robotics and vision-based applications. In this paper, we present a graph-based framework to uncover relationships between different objects in the scene for reasoning about pedestrian intent. Pedestrian intent, defined as the future action of crossing or not-crossing the street, is a crucial piece of information for autonomous vehicles.
arXiv Detail & Related papers (2020-02-20T18:50:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.