Related papers: TRACE: A Self-Improving Framework for Robot Behavior Forecasting with Vision-Language Models

URL: http://arxiv.org/abs/2503.00761v1
Date: Sun, 02 Mar 2025 06:58:02 GMT
Title: TRACE: A Self-Improving Framework for Robot Behavior Forecasting with Vision-Language Models
Authors: Gokul Puthumanaillam, Paulo Padrao, Jose Fuentes, Pranay Thangeda, William E. Schafer, Jae Hyuk Song, Karan Jagdale, Leonardo Bobadilla, Melkior Ornik
Abstract summary: Predicting the near-term behavior of a reactive agent is crucial in many robotic scenarios. We present TRACE, an inference framework that couples tree-of-thought generation with domain-aware feedback. We validate TRACE on both ground-vehicle simulations and real-world marine autonomous surface vehicles.
Score: 1.3408365072149797
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Predicting the near-term behavior of a reactive agent is crucial in many robotic scenarios, yet remains challenging when observations of that agent are sparse or intermittent. Vision-Language Models (VLMs) offer a promising avenue by integrating textual domain knowledge with visual cues, but their one-shot predictions often miss important edge cases and unusual maneuvers. Our key insight is that iterative, counterfactual exploration--where a dedicated module probes each proposed behavior hypothesis, explicitly represented as a plausible trajectory, for overlooked possibilities--can significantly enhance VLM-based behavioral forecasting. We present TRACE (Tree-of-thought Reasoning And Counterfactual Exploration), an inference framework that couples tree-of-thought generation with domain-aware feedback to refine behavior hypotheses over multiple rounds. Concretely, a VLM first proposes candidate trajectories for the agent; a counterfactual critic then suggests edge-case variations consistent with partial observations, prompting the VLM to expand or adjust its hypotheses in the next iteration. This creates a self-improving cycle where the VLM progressively internalizes edge cases from previous rounds, systematically uncovering not only typical behaviors but also rare or borderline maneuvers, ultimately yielding more robust trajectory predictions from minimal sensor data. We validate TRACE on both ground-vehicle simulations and real-world marine autonomous surface vehicles. Experimental results show that our method consistently outperforms standard VLM-driven and purely model-based baselines, capturing a broader range of feasible agent behaviors despite sparse sensing. Evaluation videos and code are available at trace-robotics.github.io.
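
The propose-and-critique cycle described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `refine_hypotheses`, `toy_propose`, and `toy_critic` are hypothetical, and the two toy functions are simple geometric stand-ins for the VLM and the counterfactual critic, which the abstract does not specify in detail.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Trajectory = Tuple[Point, ...]
Step = Callable[[List[Point], List[Trajectory]], List[Trajectory]]


def refine_hypotheses(propose: Step, critique: Step,
                      observations: List[Point], rounds: int = 3) -> List[Trajectory]:
    """Alternate proposal and counterfactual critique (hypothetical sketch)."""
    hypotheses: List[Trajectory] = []
    feedback: List[Trajectory] = []
    for _ in range(rounds):
        # The proposer sees the observations plus edge cases raised last round.
        for traj in propose(observations, feedback):
            if traj not in hypotheses:
                hypotheses.append(traj)
        # The critic suggests variations that the current set does not cover.
        feedback = [t for t in critique(observations, hypotheses)
                    if t not in hypotheses]
        if not feedback:
            break  # nothing new to explore
    return hypotheses


def toy_propose(observations: List[Point],
                feedback: List[Trajectory]) -> List[Trajectory]:
    # Stand-in for the VLM (assumption): extrapolate the last observed step
    # in a straight line and adopt whatever the critic suggested.
    (x0, y0), (x1, y1) = observations[-2], observations[-1]
    dx, dy = x1 - x0, y1 - y0
    straight = ((x1 + dx, y1 + dy), (x1 + 2 * dx, y1 + 2 * dy))
    return [straight] + list(feedback)


def toy_critic(observations: List[Point],
               hypotheses: List[Trajectory]) -> List[Trajectory]:
    # Stand-in for the critic (assumption): propose a veer to either side
    # of the first hypothesis, growing with each waypoint.
    base = hypotheses[0]
    return [tuple((x, y + lateral * (i + 1)) for i, (x, y) in enumerate(base))
            for lateral in (-1.0, 1.0)]


result = refine_hypotheses(toy_propose, toy_critic, [(0.0, 0.0), (1.0, 0.0)])
for trajectory in result:
    print(trajectory)
```

In this toy run the first round yields only the straight-line hypothesis; the critic's two veering variants are absorbed in the second round, after which the critic has nothing new to add and the loop stops early.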

Related papers

Foresight in Motion: Reinforcing Trajectory Prediction with Reward Heuristics [34.570579623171476]
"First Reasoning, Then Forecasting" is a strategy that explicitly incorporates behavior intentions as spatial guidance for trajectory prediction. We introduce an interpretable, reward-driven intention reasoner grounded in a novel query-centric Inverse Reinforcement Learning scheme. Our approach significantly enhances trajectory prediction confidence, achieving highly competitive performance relative to state-of-the-art methods.
arXiv Detail & Related papers (2025-07-16T09:46:17Z)
Traj-Explainer: An Explainable and Robust Multi-modal Trajectory Prediction Approach [12.60529039445456]
Navigating complex traffic environments has been significantly enhanced by advancements in intelligent technologies, enabling accurate environment perception and trajectory prediction for automated vehicles. Existing research often neglects the joint reasoning of scenario agents and lacks interpretability in trajectory prediction models. This work designs an explainability-oriented trajectory prediction model, named Explainable Diffusion Conditional based Multimodal Trajectory Prediction (Traj-Explainer).
arXiv Detail & Related papers (2024-10-22T08:17:33Z)
GraphSCENE: On-Demand Critical Scenario Generation for Autonomous Vehicles in Simulation [7.542220697870245]
This work introduces a novel method that generates dynamic temporal scene graphs corresponding to diverse traffic scenarios, on-demand, tailored to user-defined preferences. A temporal Graph Neural Network (GNN) model learns to predict relationships between ego-vehicle agents and static structures, guided by real-world interaction patterns. We render the predicted scenarios in simulation to further demonstrate their effectiveness as testing environments for AV agents.
arXiv Detail & Related papers (2024-10-17T13:02:06Z)
SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries [94.84458417662407]
We introduce SAFE-SIM, a controllable closed-loop safety-critical simulation framework. Our approach yields two distinct advantages: 1) generating realistic long-tail safety-critical scenarios that closely reflect real-world conditions, and 2) providing controllable adversarial behavior for more comprehensive and interactive evaluations. We validate our framework empirically using the nuScenes and nuPlan datasets across multiple planners, demonstrating improvements in both realism and controllability.
arXiv Detail & Related papers (2023-12-31T04:14:43Z)
Interpretable Long Term Waypoint-Based Trajectory Prediction Model [1.4778851751964937]
We study the impact of adding a long-term goal on the performance of a trajectory prediction framework. We present an interpretable long-term waypoint-driven prediction framework (WayDCM).
arXiv Detail & Related papers (2023-12-11T09:10:22Z)
JRDB-Traj: A Dataset and Benchmark for Trajectory Forecasting in Crowds [79.00975648564483]
Trajectory forecasting models, employed in fields such as robotics, autonomous vehicles, and navigation, face challenges in real-world scenarios. This dataset provides comprehensive data, including the locations of all agents, scene images, and point clouds, all from the robot's perspective. The objective is to predict the future positions of agents relative to the robot using raw sensory input data.
arXiv Detail & Related papers (2023-11-05T18:59:31Z)
Interpretable Self-Aware Neural Networks for Robust Trajectory Prediction [50.79827516897913]
We introduce an interpretable paradigm for trajectory prediction that distributes the uncertainty among semantic concepts. We validate our approach on real-world autonomous driving data, demonstrating superior performance over state-of-the-art baselines.
arXiv Detail & Related papers (2022-11-16T06:28:20Z)
Control-Aware Prediction Objectives for Autonomous Driving [78.19515972466063]
We present control-aware prediction objectives (CAPOs) to evaluate the downstream effect of predictions on control without requiring the planner be differentiable. We propose two types of importance weights that weight the predictive likelihood: one using an attention model between agents, and another based on control variation when exchanging predicted trajectories for ground truth trajectories.
arXiv Detail & Related papers (2022-04-28T07:37:21Z)
Exploring Social Posterior Collapse in Variational Autoencoder for Interaction Modeling [26.01824780050843]
Variational Autoencoder (VAE) has been widely applied in multi-agent interaction modeling. VAE is prone to ignoring historical social context when predicting the future trajectory of an agent. We propose a novel sparse graph attention message-passing layer, which helps us detect social posterior collapse.
arXiv Detail & Related papers (2021-12-01T06:20:58Z)
You Mostly Walk Alone: Analyzing Feature Attribution in Trajectory Prediction [52.442129609979794]
Recent deep learning approaches for trajectory prediction show promising performance. It remains unclear which features such black-box models actually learn to use for making predictions. This paper proposes a procedure that quantifies the contributions of different cues to model performance.
arXiv Detail & Related papers (2021-10-11T14:24:15Z)
Spatio-Temporal Graph Dual-Attention Network for Multi-Agent Prediction and Tracking [23.608125748229174]
We propose a generic generative neural system for multi-agent trajectory prediction involving heterogeneous agents. The proposed system is evaluated on three public benchmark datasets for trajectory prediction.
arXiv Detail & Related papers (2021-02-18T02:25:35Z)
SMART: Simultaneous Multi-Agent Recurrent Trajectory Prediction [72.37440317774556]
We propose advances that address two key challenges in future trajectory prediction: multimodality in both training data and predictions, and constant-time inference regardless of the number of agents.
arXiv Detail & Related papers (2020-07-26T08:17:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.