STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and
Trajectory Prediction
- URL: http://arxiv.org/abs/2005.04255v1
- Date: Fri, 8 May 2020 18:43:01 GMT
- Title: STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and
Trajectory Prediction
- Authors: Zhishuai Zhang, Jiyang Gao, Junhua Mao, Yukai Liu, Dragomir Anguelov,
Congcong Li
- Abstract summary: We present a novel end-to-end two-stage network: Spatio-Temporal-Interactive Network (STINet)
In addition to 3D geometry of pedestrians, we model temporal information for each of the pedestrians.
Our method predicts both current and past locations in the first stage, so that each pedestrian can be linked across frames.
- Score: 24.855059537779294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting pedestrians and predicting future trajectories for them are
critical tasks for numerous applications, such as autonomous driving. Previous
methods either treat the detection and prediction as separate tasks or simply
add a trajectory regression head on top of a detector. In this work, we present
a novel end-to-end two-stage network: Spatio-Temporal-Interactive Network
(STINet). In addition to 3D geometry modeling of pedestrians, we model the
temporal information for each of the pedestrians. To do so, our method predicts
both current and past locations in the first stage, so that each pedestrian can
be linked across frames and the comprehensive spatio-temporal information can
be captured in the second stage. Also, we model the interaction among objects
with an interaction graph, to gather the information among the neighboring
objects. Comprehensive experiments on the Lyft Dataset and the recently
released large-scale Waymo Open Dataset for both object detection and future
trajectory prediction validate the effectiveness of the proposed method. For
the Waymo Open Dataset, we achieve a bird's-eye-view (BEV) detection AP of 80.73
and trajectory prediction average displacement error (ADE) of 33.67cm for
pedestrians, which establish the state-of-the-art for both tasks.
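The interaction-graph step described in the abstract (gathering information among neighboring objects) can be sketched roughly as follows. This is an illustrative stand-in, not the authors' implementation: the function name, the distance-based edge rule, and the mean-pooling fusion are all assumptions.

```python
import numpy as np

def interaction_graph_aggregate(features, positions, radius=5.0):
    """Toy sketch of interaction-graph feature aggregation.

    features:  (N, D) per-pedestrian feature vectors
    positions: (N, 2) BEV (x, y) locations used to build the graph
    radius:    pedestrians within this distance (meters) exchange information
    """
    n = len(features)
    aggregated = features.copy()
    for i in range(n):
        # Edges connect pedestrians closer than `radius` (excluding self).
        dists = np.linalg.norm(positions - positions[i], axis=1)
        neighbors = (dists < radius) & (dists > 0)
        if neighbors.any():
            # Mean-pool neighbor features and fuse with the object's own.
            aggregated[i] = 0.5 * features[i] + 0.5 * features[neighbors].mean(axis=0)
    return aggregated
```

In the paper the aggregation weights are learned rather than fixed, but the structure (per-object features refined by neighbor messages) is the same.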
Related papers
- DeTra: A Unified Model for Object Detection and Trajectory Forecasting [68.85128937305697]
Our approach formulates the union of the two tasks as a trajectory refinement problem.
To tackle this unified task, we design a refinement transformer that infers the presence, pose, and multi-modal future behaviors of objects.
In our experiments, we observe that our model outperforms the state-of-the-art on the Argoverse 2 Sensor and Waymo Open datasets.
arXiv Detail & Related papers (2024-06-06T18:12:04Z) - JRDB-Traj: A Dataset and Benchmark for Trajectory Forecasting in Crowds [79.00975648564483]
Trajectory forecasting models, employed in fields such as robotics, autonomous vehicles, and navigation, face challenges in real-world scenarios.
This dataset provides comprehensive data, including the locations of all agents, scene images, and point clouds, all from the robot's perspective.
The objective is to predict the future positions of agents relative to the robot using raw sensory input data.
arXiv Detail & Related papers (2023-11-05T18:59:31Z) - Implicit Occupancy Flow Fields for Perception and Prediction in
Self-Driving [68.95178518732965]
A self-driving vehicle (SDV) must be able to perceive its surroundings and predict the future behavior of other traffic participants.
Existing works either perform object detection followed by trajectory prediction of the detected objects, or predict dense occupancy and flow grids for the whole scene.
This motivates our unified approach to perception and future prediction that implicitly represents occupancy and flow over time with a single neural network.
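The implicit-field idea above can be illustrated with a toy example: a single function of continuous (x, y, t) is queried only at the points the planner cares about, instead of decoding a dense occupancy grid for the whole scene. The Gaussian blob below is a hypothetical stand-in for the learned network, not the paper's model.

```python
import math

def occupancy_field(x, y, t):
    """Hypothetical implicit occupancy field f(x, y, t) -> probability.

    A moving Gaussian blob stands in for the learned neural network; the
    point is that occupancy is evaluated at arbitrary continuous query
    points rather than materialized as a dense grid.
    """
    cx, cy = 2.0 * t, 0.0            # blob drifts along +x at 2 m/s
    d2 = (x - cx) ** 2 + (y - cy) ** 2
    return math.exp(-d2 / 2.0)       # occupancy probability in (0, 1]

# Query only the locations of interest, e.g. along a candidate path:
samples = [occupancy_field(px, 0.0, t=1.0) for px in (0.0, 2.0, 4.0)]
```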
arXiv Detail & Related papers (2023-08-02T23:39:24Z) - SAPI: Surroundings-Aware Vehicle Trajectory Prediction at Intersections [4.982485708779067]
SAPI is a deep learning model to predict vehicle trajectories at intersections.
The proposed model consists of two convolutional neural network (CNN)- and recurrent neural network (RNN)-based encoders and one decoder.
We evaluate SAPI on a proprietary dataset collected in real-world intersections through autonomous vehicles.
arXiv Detail & Related papers (2023-06-02T07:10:45Z) - Cross-Camera Trajectories Help Person Retrieval in a Camera Network [124.65912458467643]
Existing methods often rely on purely visual matching or consider temporal constraints but ignore the spatial information of the camera network.
We propose a pedestrian retrieval framework based on cross-camera trajectory generation, which integrates both temporal and spatial information.
To verify the effectiveness of our method, we construct the first cross-camera pedestrian trajectory dataset.
arXiv Detail & Related papers (2022-04-27T13:10:48Z) - Exploring Simple 3D Multi-Object Tracking for Autonomous Driving [10.921208239968827]
3D multi-object tracking in LiDAR point clouds is a key ingredient for self-driving vehicles.
Existing methods are predominantly based on the tracking-by-detection pipeline and inevitably require a matching step for the detection association.
We present SimTrack to simplify the hand-crafted tracking paradigm by proposing an end-to-end trainable model for joint detection and tracking from raw point clouds.
arXiv Detail & Related papers (2021-08-23T17:59:22Z) - Stepwise Goal-Driven Networks for Trajectory Prediction [24.129731432223416]
We propose to predict the future trajectories of observed agents by estimating and using their goals at multiple time scales.
We present a novel recurrent network for trajectory prediction, called Stepwise Goal-Driven Network (SGNet)
In particular, the framework incorporates an encoder module that captures historical information, a stepwise goal estimator that predicts successive goals into the future, and a decoder module that predicts future trajectory.
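The three-module pipeline just described (history encoder, stepwise goal estimator, trajectory decoder) can be sketched with simple stand-ins. In SGNet each module is a learned recurrent network; here a last-position-plus-mean-velocity summary plays the encoder, and the goal estimator and decoder are reduced to one constant-velocity step each. All names and the dynamics are illustrative assumptions.

```python
import numpy as np

def sgnet_step_sketch(history, horizon=4):
    """Toy sketch of the SGNet encoder / goal estimator / decoder loop.

    history: (T, 2) observed positions; returns (horizon, 2) predictions.
    """
    # "Encoder": summarize history as the last position + mean velocity.
    velocity = np.diff(history, axis=0).mean(axis=0)
    state = history[-1]

    preds = []
    for _ in range(horizon):
        # "Stepwise goal estimator": propose the next intermediate goal.
        goal = state + velocity
        # "Decoder": step to the estimated goal to emit the next position.
        state = goal
        preds.append(state)
    return np.array(preds)
```

The key structural point is that goals are re-estimated at every step rather than predicting one distant endpoint up front.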
arXiv Detail & Related papers (2021-03-25T19:51:54Z) - PePScenes: A Novel Dataset and Baseline for Pedestrian Action Prediction
in 3D [10.580548257913843]
We propose a new pedestrian action prediction dataset created by adding per-frame 2D/3D bounding box and behavioral annotations to nuScenes.
In addition, we propose a hybrid neural network architecture that incorporates various data modalities for predicting pedestrian crossing action.
arXiv Detail & Related papers (2020-12-14T18:13:44Z) - PnPNet: End-to-End Perception and Prediction with Tracking in the Loop [82.97006521937101]
We tackle the problem of joint perception and motion forecasting in the context of self-driving vehicles.
We propose PnPNet, an end-to-end model that takes as input sensor data, and outputs at each time step object tracks and their future trajectories.
arXiv Detail & Related papers (2020-05-29T17:57:25Z) - Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction [57.56466850377598]
Reasoning over visual data is a desirable capability for robotics and vision-based applications.
In this paper, we present a graph-based framework to uncover relationships among different objects in the scene for reasoning about pedestrian intent.
Pedestrian intent, defined as the future action of crossing or not-crossing the street, is a very crucial piece of information for autonomous vehicles.
arXiv Detail & Related papers (2020-02-20T18:50:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.