GUIDE-CoT: Goal-driven and User-Informed Dynamic Estimation for Pedestrian Trajectory using Chain-of-Thought
- URL: http://arxiv.org/abs/2503.06832v1
- Date: Mon, 10 Mar 2025 01:39:24 GMT
- Title: GUIDE-CoT: Goal-driven and User-Informed Dynamic Estimation for Pedestrian Trajectory using Chain-of-Thought
- Authors: Sungsik Kim, Janghyun Baek, Jinkyu Kim, Jaekoo Lee
- Abstract summary: We propose Goal-driven and User-Informed Dynamic Estimation for pedestrian trajectory using Chain-of-Thought (GUIDE-CoT). Our approach integrates two innovative modules: (1) a goal-oriented visual prompt, which enhances goal prediction accuracy by combining visual prompts with a pretrained visual encoder, and (2) a chain-of-thought (CoT) LLM for trajectory generation, which generates realistic trajectories toward the predicted goal. Our method achieves state-of-the-art performance, delivering both high accuracy and greater adaptability in pedestrian trajectory prediction.
- Score: 9.572859785331307
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While Large Language Models (LLMs) have recently shown impressive results in reasoning tasks, their application to pedestrian trajectory prediction remains challenging due to two key limitations: insufficient use of visual information and the difficulty of predicting entire trajectories. To address these challenges, we propose Goal-driven and User-Informed Dynamic Estimation for pedestrian trajectory using Chain-of-Thought (GUIDE-CoT). Our approach integrates two innovative modules: (1) a goal-oriented visual prompt, which enhances goal prediction accuracy by combining visual prompts with a pretrained visual encoder, and (2) a chain-of-thought (CoT) LLM for trajectory generation, which generates realistic trajectories toward the predicted goal. Moreover, our method introduces controllable trajectory generation, allowing for flexible and user-guided modifications to the predicted paths. Through extensive experiments on the ETH/UCY benchmark datasets, our method achieves state-of-the-art performance, delivering both high accuracy and greater adaptability in pedestrian trajectory prediction. Our code is publicly available at https://github.com/ai-kmu/GUIDE-CoT.
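The abstract does not reproduce the prompt itself, so the following is only a rough Python sketch of the two-stage idea: a goal module predicts an endpoint, and a chain-of-thought prompt asks an LLM to reason toward it step by step. The template wording and helper names are illustrative assumptions; the authors' actual implementation is in the linked repository.

```python
# Hypothetical sketch of goal-conditioned CoT trajectory prompting.
# The prompt template and function names are assumptions for illustration;
# see https://github.com/ai-kmu/GUIDE-CoT for the authors' actual code.

def build_cot_prompt(history, goal, horizon=12):
    """Ask an LLM to reason step by step toward a predicted goal."""
    obs = ", ".join(f"({x:.2f}, {y:.2f})" for x, y in history)
    return (
        f"A pedestrian was observed at positions: {obs}.\n"
        f"Their predicted goal is ({goal[0]:.2f}, {goal[1]:.2f}).\n"
        f"Reason step by step about speed and heading, then output the next "
        f"{horizon} positions, one '(x, y)' pair per line."
    )

history = [(0.0, 0.0), (0.4, 0.1), (0.8, 0.2)]   # observed past trajectory
goal = (5.0, 1.5)                                 # output of the goal module
print(build_cot_prompt(history, goal))
```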
Related papers
- Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.
DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget.
Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
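The summary does not say how DyTrack's reasoning routes are configured, but the general idea of input-dependent compute can be illustrated with a generic early-exit encoder: easy inputs leave the network early, hard ones use the full depth. Everything below (module names, the confidence head, the threshold) is an assumed stand-in, not DyTrack's mechanism.

```python
import torch
import torch.nn as nn

# Illustrative early-exit sketch of dynamic-compute inference: run encoder
# blocks one at a time and stop once a learned confidence head is satisfied.
# This is a generic stand-in, not DyTrack's actual routing mechanism.

class EarlyExitEncoder(nn.Module):
    def __init__(self, dim=256, depth=6, threshold=0.9):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
            for _ in range(depth)
        )
        self.confidence = nn.ModuleList(nn.Linear(dim, 1) for _ in range(depth))
        self.threshold = threshold

    def forward(self, x):
        for block, head in zip(self.blocks, self.confidence):
            x = block(x)
            score = torch.sigmoid(head(x.mean(dim=1)))  # per-sample confidence
            if bool((score > self.threshold).all()):    # exit early if confident
                break
        return x

feats = torch.randn(2, 64, 256)  # (batch, tokens, dim) search-region features
out = EarlyExitEncoder()(feats)
```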
arXiv Detail & Related papers (2024-03-26T12:31:58Z)
- Certified Human Trajectory Prediction [66.1736456453465]
Trajectory prediction plays an essential role in autonomous vehicles.
We propose a certification approach tailored for the task of trajectory prediction.
We address the inherent challenges associated with trajectory prediction, including unbounded outputs and multi-modality.
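One standard recipe behind certificates for regression models like trajectory predictors is randomized smoothing with a robust aggregate: perturb the observed track with Gaussian noise, run the base predictor many times, and take a coordinate-wise median, which remains well behaved even when outputs are unbounded. The sketch below illustrates that generic recipe, not necessarily this paper's exact construction.

```python
import numpy as np

# Generic randomized-smoothing sketch for a trajectory regressor: the
# coordinate-wise median over noisy copies of the input yields a smoothed
# predictor amenable to robustness certificates even with unbounded outputs.

def base_predictor(history):
    """Toy stand-in: constant-velocity extrapolation for 12 future steps."""
    v = history[-1] - history[-2]
    return history[-1] + v * np.arange(1, 13)[:, None]

def smoothed_predictor(history, sigma=0.1, n_samples=100, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    noisy = history + rng.normal(0.0, sigma, size=(n_samples, *history.shape))
    preds = np.stack([base_predictor(h) for h in noisy])
    return np.median(preds, axis=0)  # robust aggregate over noise draws

history = np.array([[0.0, 0.0], [0.4, 0.1], [0.8, 0.2]])
print(smoothed_predictor(history).shape)  # (12, 2)
```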
arXiv Detail & Related papers (2024-03-20T17:41:35Z)
- LG-Traj: LLM Guided Pedestrian Trajectory Prediction [9.385936248154987]
We introduce LG-Traj, a novel approach to generate motion cues present in pedestrian past/observed trajectories.
These motion cues, along with pedestrian coordinates, facilitate a better understanding of the underlying representation.
Our method employs a transformer-based architecture comprising a motion encoder to model motion patterns and a social decoder to capture social interactions among pedestrians.
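The summary leaves the motion cues unspecified; as a plausible illustration, simple per-step speed and heading features can be derived from the observed coordinates and fed to the model alongside them. Treat the feature choice below as an assumption.

```python
import numpy as np

# Illustrative extraction of simple motion cues (speed, heading) from an
# observed trajectory; what LG-Traj's cues actually contain is not
# specified in this summary, so these features are assumptions.

def motion_cues(traj):
    """traj: (T, 2) array of past positions -> per-step speed and heading."""
    deltas = np.diff(traj, axis=0)                    # step displacements (T-1, 2)
    speed = np.linalg.norm(deltas, axis=1)            # distance per step
    heading = np.arctan2(deltas[:, 1], deltas[:, 0])  # radians
    return np.column_stack([speed, heading])

past = np.array([[0.0, 0.0], [0.5, 0.1], [1.0, 0.3], [1.4, 0.6]])
print(motion_cues(past))  # one (speed, heading) row per observed step
```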
arXiv Detail & Related papers (2024-03-12T19:06:23Z)
- Towards Safe and Reliable Autonomous Driving: Dynamic Occupancy Set Prediction [12.336412741837407]
This study introduces a novel method for Dynamic Occupancy Set (DOS) prediction that effectively combines advanced trajectory prediction networks with a DOS prediction module.
The innovative contributions of this study include the development of a novel DOS prediction model specifically tailored for navigating complex scenarios.
arXiv Detail & Related papers (2024-02-29T17:36:39Z)
- Pre-training on Synthetic Driving Data for Trajectory Prediction [61.520225216107306]
We propose a pipeline-level solution to mitigate the issue of data scarcity in trajectory forecasting.
We adopt HD map augmentation and trajectory synthesis for generating driving data, and then we learn representations by pre-training on them.
We conduct extensive experiments to demonstrate the effectiveness of our data expansion and pre-training strategies.
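To give a flavour of trajectory synthesis, the sketch below applies simple geometric augmentations (random rotation and mirroring) to an existing trajectory. The paper's HD-map-aware pipeline is richer; these particular transforms are illustrative only.

```python
import numpy as np

# Minimal flavour of trajectory-level data synthesis via geometric
# augmentation (random rotation plus optional mirror). The actual
# HD-map-aware pipeline is richer; these transforms are only illustrative.

def augment_trajectory(traj, rng):
    theta = rng.uniform(0, 2 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    out = traj @ rot.T                 # rotate about the origin
    if rng.random() < 0.5:
        out[:, 1] *= -1                # mirror across the x-axis
    return out

rng = np.random.default_rng(42)
traj = np.array([[0.0, 0.0], [1.0, 0.2], [2.0, 0.5]])
augmented = [augment_trajectory(traj, rng) for _ in range(4)]  # 4 synthetic variants
```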
arXiv Detail & Related papers (2023-09-18T19:49:22Z)
- Towards Unified Token Learning for Vision-Language Tracking [65.96561538356315]
We present a vision-language (VL) tracking pipeline, termed MMTrack, which casts VL tracking as a token generation task.
Our proposed framework serializes language description and bounding box into a sequence of discrete tokens.
In this new design paradigm, all token queries are required to perceive the desired target and directly predict spatial coordinates of the target.
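A common way to serialize boxes into discrete tokens (as in Pix2Seq-style models) is to quantize normalized coordinates into a fixed vocabulary of bins. MMTrack's exact vocabulary is not given here, so the bin count below is an assumption.

```python
import numpy as np

# Pix2Seq-style serialization of a bounding box into discrete tokens:
# normalized coordinates are quantized into a fixed vocabulary of bins.
# The 1000-bin vocabulary is an assumption for illustration.

N_BINS = 1000

def box_to_tokens(box, img_w, img_h):
    """box: (x1, y1, x2, y2) in pixels -> 4 integer tokens in [0, N_BINS)."""
    x1, y1, x2, y2 = box
    norm = np.array([x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h])
    return np.clip((norm * N_BINS).astype(int), 0, N_BINS - 1)

def tokens_to_box(tokens, img_w, img_h):
    """Invert the quantization (up to bin resolution)."""
    centers = (tokens + 0.5) / N_BINS
    return centers * np.array([img_w, img_h, img_w, img_h])

tokens = box_to_tokens((120, 80, 360, 240), img_w=640, img_h=480)
print(tokens, tokens_to_box(tokens, 640, 480))
```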
arXiv Detail & Related papers (2023-08-27T13:17:34Z)
- An End-to-End Framework of Road User Detection, Tracking, and Prediction from Monocular Images [11.733622044569486]
We build an end-to-end framework for detection, tracking, and trajectory prediction called ODTP.
It adopts the state-of-the-art online multi-object tracking model, QD-3DT, for perception and trains the trajectory predictor, DCENet++, directly based on the detection results.
We evaluate the performance of ODTP on the widely used nuScenes dataset for autonomous driving.
arXiv Detail & Related papers (2023-08-09T15:46:25Z)
- Transforming Model Prediction for Tracking [109.08417327309937]
Transformers capture global relations with little inductive bias, allowing them to learn the prediction of more powerful target models.
We train the proposed tracker end-to-end and validate its performance by conducting comprehensive experiments on multiple tracking datasets.
Our tracker sets a new state of the art on three benchmarks, achieving an AUC of 68.5% on the challenging LaSOT dataset.
arXiv Detail & Related papers (2022-03-21T17:59:40Z)
- TAE: A Semi-supervised Controllable Behavior-aware Trajectory Generator and Predictor [3.6955256596550137]
Trajectory generation and prediction play important roles in planner evaluation and decision making for intelligent vehicles.
We propose a behavior-aware Trajectory Autoencoder (TAE) that explicitly models drivers' behavior.
Our model addresses trajectory generation and prediction in a unified architecture and benefits both tasks.
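A minimal sketch of the underlying idea, assuming simple MLP encoder/decoder components: an autoencoder whose decoder is conditioned on an explicit behavior variable, so the same latent can be decoded under different behavior settings for controllable generation. TAE's actual architecture and semi-supervised objective are richer than this.

```python
import torch
import torch.nn as nn

# Minimal sketch of a behavior-conditioned trajectory autoencoder: the
# decoder reconstructs (or generates) a trajectory from a latent code plus
# an explicit behavior variable. Only conveys the shape of the idea.

class TrajAE(nn.Module):
    def __init__(self, horizon=30, latent=16, behavior_dim=1):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(horizon * 2, 64), nn.ReLU(), nn.Linear(64, latent))
        self.decoder = nn.Sequential(
            nn.Linear(latent + behavior_dim, 64), nn.ReLU(),
            nn.Linear(64, horizon * 2))
        self.horizon = horizon

    def forward(self, traj, behavior):
        z = self.encoder(traj.flatten(1))                  # (B, latent)
        out = self.decoder(torch.cat([z, behavior], dim=1))
        return out.view(-1, self.horizon, 2)

model = TrajAE()
traj = torch.randn(8, 30, 2)     # batch of trajectories
behavior = torch.rand(8, 1)      # e.g., an aggressiveness score in [0, 1]
recon = model(traj, behavior)    # varying the behavior knob enables controllable generation
```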
arXiv Detail & Related papers (2022-03-02T17:37:44Z)
- Trajectory Prediction with Graph-based Dual-scale Context Fusion [43.51107329748957]
We present a graph-based trajectory prediction network named the Dual Scale Predictor (DSP).
It encodes both the static and dynamic driving context in a hierarchical manner.
Thanks to the proposed dual-scale context fusion network, our DSP is able to generate accurate and human-like multi-modal trajectories.
arXiv Detail & Related papers (2021-11-02T13:42:16Z)
- DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets [29.239524389784606]
We propose an anchor-free and end-to-end trajectory prediction model, named DenseTNT, that directly outputs a set of trajectories from dense goal candidates.
DenseTNT achieves state-of-the-art performance, ranking 1st on the Argoverse motion forecasting benchmark and winning the 2021 Open Motion Prediction Challenge.
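To convey the goal-set idea, the sketch below scores a dense grid of candidate endpoints and greedily keeps a handful of high-scoring, mutually distant goals, NMS-style. DenseTNT learns the scores and decodes full trajectories; the toy scoring function and selection rule here are assumptions for illustration.

```python
import numpy as np

# Flavour of goal-set prediction: score a dense grid of candidate goal
# points and keep the top-scoring, mutually distant ones as trajectory
# endpoints. The scoring function and selection rule are illustrative only.

def select_goals(candidates, scores, k=6, min_dist=2.0):
    """Greedy NMS-style pick of k high-scoring, well-separated goals."""
    order = np.argsort(-scores)
    chosen = []
    for i in order:
        if all(np.linalg.norm(candidates[i] - candidates[j]) >= min_dist
               for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return candidates[chosen]

grid = np.stack(np.meshgrid(np.arange(0, 50, 1.0),
                            np.arange(0, 50, 1.0)), -1).reshape(-1, 2)
scores = np.exp(-np.linalg.norm(grid - np.array([25.0, 25.0]), axis=1) / 5)
print(select_goals(grid, scores))  # 6 diverse endpoints near the hot spot
```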
arXiv Detail & Related papers (2021-08-22T05:27:35Z)
- PnPNet: End-to-End Perception and Prediction with Tracking in the Loop [82.97006521937101]
We tackle the problem of joint perception and motion forecasting in the context of self-driving vehicles.
We propose PnPNet, an end-to-end model that takes sensor data as input and, at each time step, outputs object tracks and their future trajectories.
arXiv Detail & Related papers (2020-05-29T17:57:25Z)