Bidirectional Progressive Transformer for Interaction Intention Anticipation
- URL: http://arxiv.org/abs/2405.05552v1
- Date: Thu, 9 May 2024 05:22:18 GMT
- Title: Bidirectional Progressive Transformer for Interaction Intention Anticipation
- Authors: Zichen Zhang, Hongchen Luo, Wei Zhai, Yang Cao, Yu Kang
- Abstract summary: We introduce a Bidirectional Progressive mechanism into the anticipation of interaction intention.
We employ a Trajectory Stochastic Unit and a C-VAE to introduce appropriate uncertainty into trajectories and interaction hotspots, respectively.
Our method achieves state-of-the-art results on three benchmark datasets.
- Score: 20.53329698350243
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interaction intention anticipation aims to jointly predict future hand trajectories and interaction hotspots. Existing research often treated trajectory forecasting and interaction hotspot prediction as separate tasks, or considered only the impact of trajectories on interaction hotspots, which led to the accumulation of prediction errors over time. However, a deeper inherent connection exists between hand trajectories and interaction hotspots, which allows for continuous mutual correction between them. Building on this relationship, we establish a novel Bidirectional prOgressive Transformer (BOT), which introduces a Bidirectional Progressive mechanism into the anticipation of interaction intention. Initially, BOT maximizes the utilization of spatial information from the last observation frame through the Spatial-Temporal Reconstruction Module, mitigating conflicts arising from viewpoint changes in first-person videos. Subsequently, based on two independent prediction branches, a Bidirectional Progressive Enhancement Module is introduced to mutually improve the prediction of hand trajectories and interaction hotspots over time, minimizing error accumulation. Finally, acknowledging the intrinsic randomness of natural human behavior, we employ a Trajectory Stochastic Unit and a C-VAE to introduce appropriate uncertainty into trajectories and interaction hotspots, respectively. Our method achieves state-of-the-art results on three benchmark datasets, Epic-Kitchens-100, EGO4D, and EGTEA Gaze+, demonstrating superior performance in complex scenarios.
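Since the abstract describes the bidirectional progressive mechanism only in words, a minimal PyTorch sketch may help: two prediction branches (hand trajectory and interaction hotspot) exchange information through cross-attention at every future timestep, so that each prediction can correct the other before the next step is decoded. This is not the authors' released code; the module names, feature sizes, and output heads (`feat_dim`, `horizon`, `ProgressiveDecoder`, the 2-D waypoint/hotspot heads) are illustrative assumptions.

```python
# Minimal sketch of a bidirectional progressive refinement between a hand-trajectory
# branch and an interaction-hotspot branch. All names and dimensions are assumptions,
# not the BOT authors' implementation.
import torch
import torch.nn as nn


class BidirectionalProgressiveStep(nn.Module):
    """One mutual-refinement step between the two prediction branches."""

    def __init__(self, feat_dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Trajectory queries attend to hotspot features, and vice versa.
        self.traj_from_hotspot = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.hotspot_from_traj = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.traj_norm = nn.LayerNorm(feat_dim)
        self.hotspot_norm = nn.LayerNorm(feat_dim)

    def forward(self, traj_feat: torch.Tensor, hotspot_feat: torch.Tensor):
        # traj_feat, hotspot_feat: (batch, tokens, feat_dim)
        t_upd, _ = self.traj_from_hotspot(traj_feat, hotspot_feat, hotspot_feat)
        h_upd, _ = self.hotspot_from_traj(hotspot_feat, traj_feat, traj_feat)
        traj_feat = self.traj_norm(traj_feat + t_upd)          # residual correction
        hotspot_feat = self.hotspot_norm(hotspot_feat + h_upd)
        return traj_feat, hotspot_feat


class ProgressiveDecoder(nn.Module):
    """Unrolls the mutual refinement over the anticipation horizon."""

    def __init__(self, feat_dim: int = 256, horizon: int = 4):
        super().__init__()
        self.steps = nn.ModuleList(
            [BidirectionalProgressiveStep(feat_dim) for _ in range(horizon)]
        )
        self.traj_head = nn.Linear(feat_dim, 2)      # (x, y) hand waypoint per step
        self.hotspot_head = nn.Linear(feat_dim, 2)   # hotspot centre per step

    def forward(self, traj_feat, hotspot_feat):
        waypoints, hotspots = [], []
        for step in self.steps:
            traj_feat, hotspot_feat = step(traj_feat, hotspot_feat)
            waypoints.append(self.traj_head(traj_feat.mean(dim=1)))
            hotspots.append(self.hotspot_head(hotspot_feat.mean(dim=1)))
        return torch.stack(waypoints, dim=1), torch.stack(hotspots, dim=1)


if __name__ == "__main__":
    decoder = ProgressiveDecoder()
    traj = torch.randn(2, 8, 256)      # dummy branch features
    hotspot = torch.randn(2, 8, 256)
    wp, hs = decoder(traj, hotspot)
    print(wp.shape, hs.shape)          # torch.Size([2, 4, 2]) each
```

The Trajectory Stochastic Unit and the C-VAE mentioned in the abstract would additionally inject sampled latent noise into the trajectory and hotspot branches; they are omitted here for brevity.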
Related papers
- AMP: Autoregressive Motion Prediction Revisited with Next Token Prediction for Autonomous Driving [59.94343412438211]
We introduce GPT-style next-token prediction into motion prediction.
Unlike language data, which is composed of homogeneous units (words), the elements in a driving scene can have complex spatial-temporal and semantic relations.
We adopt three factorized attention modules with different neighbor sets for information aggregation and different position-encoding styles to capture these relations.
arXiv Detail & Related papers (2024-03-20T06:22:37Z) - SSL-Interactions: Pretext Tasks for Interactive Trajectory Prediction [4.286256266868156]
We present SSL-Interactions, which proposes pretext tasks to enhance interaction modeling for trajectory prediction.
We introduce four interaction-aware pretext tasks to encapsulate various aspects of agent interactions.
We also propose an approach to curate interaction-heavy scenarios from datasets.
arXiv Detail & Related papers (2024-01-15T14:43:40Z) - PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving [57.89801036693292]
PPAD (Iterative Interaction of Prediction and Planning Autonomous Driving) considers the timestep-wise interaction to better integrate prediction and planning.
We design ego-to-agent, ego-to-map, and ego-to-BEV interaction mechanisms with hierarchical dynamic key objects attention to better model the interactions.
arXiv Detail & Related papers (2023-11-14T11:53:24Z) - FFINet: Future Feedback Interaction Network for Motion Forecasting [46.247396728154904]
We propose a novel Future Feedback Interaction Network (FFINet) to aggregate features of the current observations and potential future interactions for trajectory prediction.
Our FFINet achieves the state-of-the-art performance on Argoverse 1 and Argoverse 2 motion forecasting benchmarks.
arXiv Detail & Related papers (2023-11-08T07:57:29Z) - A Hierarchical Hybrid Learning Framework for Multi-agent Trajectory Prediction [4.181632607997678]
We propose a hierarchical hybrid framework of deep learning (DL) and reinforcement learning (RL) for multi-agent trajectory prediction.
In the DL stage, the traffic scene is divided into multiple intermediate-scale heterogeneous graphs, based on which Transformer-style GNNs are adopted to encode heterogeneous interactions.
In the RL stage, we divide the traffic scene into local sub-scenes utilizing the key future points predicted in the DL stage.
arXiv Detail & Related papers (2023-03-22T02:47:42Z) - ProspectNet: Weighted Conditional Attention for Future Interaction Modeling in Behavior Prediction [5.520507323174275]
We formulate the end-to-end joint prediction problem as a sequential learning process of marginal learning and joint learning of vehicle behaviors.
We propose ProspectNet, a joint learning block that adopts the weighted attention score to model the mutual influence between interactive agent pairs.
We show that ProspectNet outperforms the Cartesian product of two marginal predictions, and achieves comparable performance on the Interactive Motion Prediction benchmarks.
arXiv Detail & Related papers (2022-08-29T19:29:49Z) - SGCN: Sparse Graph Convolution Network for Pedestrian Trajectory Prediction [64.16212996247943]
We present a Sparse Graph Convolution Network (SGCN) for pedestrian trajectory prediction.
Specifically, the SGCN explicitly models sparse directed interactions with a sparse directed spatial graph to capture adaptive interactions among pedestrians.
Visualizations indicate that our method captures adaptive interactions between pedestrians and their effective motion tendencies (a minimal sketch of the sparse-graph idea appears after this list).
arXiv Detail & Related papers (2021-04-04T03:17:42Z) - End-to-end Contextual Perception and Prediction with Interaction Transformer [79.14001602890417]
We tackle the problem of detecting objects in 3D and forecasting their future motion in the context of self-driving.
To capture their spatial-temporal dependencies, we propose a recurrent neural network with a novel Transformer architecture.
Our model can be trained end-to-end, and runs in real-time.
arXiv Detail & Related papers (2020-08-13T14:30:12Z) - A Spatial-Temporal Attentive Network with Spatial Continuity for Trajectory Prediction [74.00750936752418]
We propose a novel model named spatial-temporal attentive network with spatial continuity (STAN-SC).
First, a spatial-temporal attention mechanism is presented to explore the most useful and important information.
Second, we construct a joint feature sequence from sequence and instant state information so that the generated trajectories maintain spatial continuity.
arXiv Detail & Related papers (2020-03-13T04:35:50Z)
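As referenced in the SGCN entry above, the following is a minimal, hypothetical sketch of learning a sparse directed interaction graph among pedestrians: pairwise attention scores are thresholded into a sparse directed adjacency, which is then used for one round of message passing. The class name, feature size, and threshold value are assumptions for illustration, not the SGCN authors' implementation.

```python
# Illustrative sketch of a sparse directed interaction graph among pedestrians.
# Weak pairwise attention edges are zeroed out, and the surviving directed edges
# are renormalised and used to aggregate neighbour messages.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseDirectedInteraction(nn.Module):
    def __init__(self, feat_dim: int = 64, sparsity_threshold: float = 0.1):
        super().__init__()
        self.query = nn.Linear(feat_dim, feat_dim)
        self.key = nn.Linear(feat_dim, feat_dim)
        self.value = nn.Linear(feat_dim, feat_dim)
        self.threshold = sparsity_threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_pedestrians, feat_dim) features for one scene at one timestep
        q, k, v = self.query(x), self.key(x), self.value(x)
        scores = q @ k.t() / q.size(-1) ** 0.5            # dense directed scores
        attn = F.softmax(scores, dim=-1)
        # Zero out weak edges to obtain a sparse directed adjacency,
        # then renormalise the remaining edges per node.
        adj = torch.where(attn >= self.threshold, attn, torch.zeros_like(attn))
        adj = adj / adj.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        return adj @ v                                     # aggregate neighbour messages


if __name__ == "__main__":
    layer = SparseDirectedInteraction()
    peds = torch.randn(5, 64)       # five pedestrians in one scene
    out = layer(peds)
    print(out.shape)                # torch.Size([5, 64])
```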
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.