iMotion-LLM: Motion Prediction Instruction Tuning
- URL: http://arxiv.org/abs/2406.06211v2
- Date: Tue, 11 Jun 2024 12:37:23 GMT
- Title: iMotion-LLM: Motion Prediction Instruction Tuning
- Authors: Abdulwahab Felemban, Eslam Mohamed Bakr, Xiaoqian Shen, Jian Ding, Abduallah Mohamed, Mohamed Elhoseiny
- Abstract summary: We introduce iMotion-LLM: a Multimodal Large Language Model (MLLM) with trajectory prediction, tailored to guide interactive multi-agent scenarios.
iMotion-LLM capitalizes on textual instructions as key inputs for generating contextually relevant trajectories.
These findings act as milestones in empowering autonomous navigation systems to interpret and predict the dynamics of multi-agent environments.
- Score: 33.63656257401926
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We introduce iMotion-LLM: a Multimodal Large Language Model (MLLM) with trajectory prediction, tailored to guide interactive multi-agent scenarios. Different from conventional motion prediction approaches, iMotion-LLM capitalizes on textual instructions as key inputs for generating contextually relevant trajectories. By enriching the real-world driving scenarios in the Waymo Open Dataset with textual motion instructions, we created InstructWaymo. Leveraging this dataset, iMotion-LLM integrates a pretrained LLM, fine-tuned with LoRA, and translates scene features into the LLM input space. iMotion-LLM offers significant advantages over conventional motion prediction models. First, it can generate trajectories that align with the provided instruction when the instructed direction is feasible. Second, when given an infeasible direction, it can reject the instruction, thereby enhancing safety. These findings act as milestones in empowering autonomous navigation systems to interpret and predict the dynamics of multi-agent environments, laying the groundwork for future advancements in this field.
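As a rough illustration of the pipeline the abstract describes, the step of mapping scene features into the LLM input space can be pictured as a learned projection whose output is concatenated with the embedded instruction. This is a minimal sketch, not the authors' code; the module name, dimensions, and the stand-in scene encoder are assumptions.
```python
import torch
import torch.nn as nn

class SceneToLLMProjector(nn.Module):
    """Hypothetical adapter mapping scene-encoder features into the
    token-embedding space of a LoRA-fine-tuned LLM (dims are assumed)."""

    def __init__(self, scene_dim: int = 256, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(scene_dim, llm_dim)

    def forward(self, scene_feats: torch.Tensor) -> torch.Tensor:
        # scene_feats: (batch, num_scene_tokens, scene_dim)
        return self.proj(scene_feats)

# Projected scene tokens and the embedded text instruction are concatenated
# into one input sequence for the LLM, which then decodes instruction-
# conditioned trajectories (or rejects an infeasible instruction).
projector = SceneToLLMProjector()
scene_tokens = projector(torch.randn(1, 64, 256))   # encoded driving scene
instruction_emb = torch.randn(1, 12, 4096)          # embedded instruction text
llm_inputs = torch.cat([scene_tokens, instruction_emb], dim=1)
```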
Related papers
- Probing Multimodal LLMs as World Models for Driving [72.18727651074563]
This study focuses on the application of Multimodal Large Language Models (MLLMs) within the domain of autonomous driving.
We evaluate the capability of various MLLMs as world models for driving from the perspective of a fixed in-car camera.
Our results highlight a critical gap in the current capabilities of state-of-the-art MLLMs.
arXiv Detail & Related papers (2024-05-09T17:52:42Z)
- Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-trained Large Language Models [12.687494201105066]
This paper proposes Traj-LLM, the first to investigate the potential of using Large Language Models (LLMs) to generate future motion from agents' past/observed trajectories and scene semantics.
LLMs' powerful comprehension abilities capture a spectrum of high-level scene knowledge and interactive information.
Emulating the human-like lane-focus cognitive function, we introduce lane-aware probabilistic learning powered by the pioneering Mamba module.
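One way to picture the lane-aware probabilistic learning is as scoring candidate lanes and normalizing the scores into a distribution the trajectory decoder can condition on. The sketch below is illustrative only: the feature sizes are assumptions, and a plain linear scorer stands in for the paper's Mamba-based module.
```python
import torch
import torch.nn as nn

# Hypothetical lane scorer: the real model uses a Mamba-based module;
# a linear layer stands in here purely for illustration.
lane_scorer = nn.Linear(64, 1)

lane_feats = torch.randn(4, 10, 64)                # 4 scenes, 10 candidate lanes
lane_logits = lane_scorer(lane_feats).squeeze(-1)  # (4, 10)
lane_probs = torch.softmax(lane_logits, dim=-1)    # distribution over lanes
```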
arXiv Detail & Related papers (2024-05-08T09:28:04Z)
- LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models [48.46007039539533]
We propose an explainable lane change prediction model that leverages the strong reasoning and self-explanation abilities of Large Language Models (LLMs).
Our experiments on the large-scale highD dataset demonstrate the superior performance and interpretability of LC-LLM on the lane change prediction task.
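Casting lane-change prediction as a language task can be pictured as serializing the driving state into a prompt and asking the model for both an intention label and an explanation, which is where the self-explanation ability enters. The template and field names below are assumptions, not the paper's actual prompt.
```python
def build_lane_change_prompt(ego_speed_mps: float, lead_gap_m: float,
                             left_lane_free: bool) -> str:
    """Hypothetical prompt builder: serializes the driving state into text
    and requests an intention label plus a short rationale."""
    return (
        f"Ego vehicle speed: {ego_speed_mps:.1f} m/s. "
        f"Gap to lead vehicle: {lead_gap_m:.1f} m. "
        f"Left lane free: {left_lane_free}. "
        "Will the ego vehicle change lanes within the next 3 seconds? "
        "Answer with an intention (keep/left/right) and a one-sentence explanation."
    )

print(build_lane_change_prompt(27.5, 18.0, True))
```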
arXiv Detail & Related papers (2024-03-27T08:34:55Z)
- Large Language Models Powered Context-aware Motion Prediction [13.879945446114956]
We utilize Large Language Models (LLMs) to enhance the global traffic context understanding for motion prediction tasks.
Considering the cost associated with LLMs, we propose a cost-effective deployment strategy.
Our research offers valuable insights into enhancing LLMs' understanding of traffic scenes and improving the motion prediction performance of autonomous driving systems.
arXiv Detail & Related papers (2024-03-17T02:06:49Z)
- DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving [69.82743399946371]
DriveMLM is a framework that can perform closed-loop autonomous driving in realistic simulators.
We employ a multi-modal LLM (MLLM) to model the behavior planning module of a modular AD system.
This model can be used plug-and-play in existing AD systems such as Apollo for closed-loop driving.
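The behavioral planning states can be pictured as a small discrete decision vocabulary that the MLLM's output is mapped onto before being handed to the downstream planner. The enum values and the naive text parsing below are hypothetical, not Apollo's or DriveMLM's actual API.
```python
from enum import Enum

class PathDecision(Enum):
    FOLLOW = "follow_lane"
    LEFT_CHANGE = "left_lane_change"
    RIGHT_CHANGE = "right_lane_change"

class SpeedDecision(Enum):
    KEEP = "keep"
    DECELERATE = "decelerate"
    STOP = "stop"

def parse_mllm_decision(text: str) -> tuple[PathDecision, SpeedDecision]:
    """Map free-text MLLM output onto discrete planning states (toy parsing)."""
    lowered = text.lower()
    path = (PathDecision.LEFT_CHANGE if "left" in lowered
            else PathDecision.RIGHT_CHANGE if "right" in lowered
            else PathDecision.FOLLOW)
    speed = SpeedDecision.DECELERATE if "slow" in lowered else SpeedDecision.KEEP
    return path, speed

print(parse_mllm_decision("Slow down and merge into the left lane."))
```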
arXiv Detail & Related papers (2023-12-14T18:59:05Z)
- MotionLM: Multi-Agent Motion Forecasting as Language Modeling [15.317827804763699]
We present MotionLM, a language model for multi-agent motion prediction.
Our approach bypasses post-hoc interaction modeling, in which individual agent trajectories are generated prior to interactive scoring.
The model's sequential factorization enables temporally causal conditional rollouts.
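The forecasting-as-language-modeling idea can be sketched as quantizing per-step displacements into a discrete token vocabulary and rolling all agents out autoregressively, one time step at a time, so conditioning stays temporally causal. This is a toy sketch: the bin ranges, vocabulary size, and the stub sampler standing in for the trained model are all assumptions.
```python
import numpy as np

BINS = np.linspace(-3.0, 3.0, 13)  # 12 displacement bins per axis (assumed)

def tokenize_step(dx: float, dy: float) -> int:
    """Quantize a continuous (dx, dy) displacement into one of 144 motion tokens."""
    ix = int(np.clip(np.digitize(dx, BINS) - 1, 0, 11))
    iy = int(np.clip(np.digitize(dy, BINS) - 1, 0, 11))
    return ix * 12 + iy

def detokenize(token: int) -> tuple[float, float]:
    """Recover the bin-center displacement encoded by a motion token."""
    centers = (BINS[:-1] + BINS[1:]) / 2
    ix, iy = divmod(token, 12)
    return float(centers[ix]), float(centers[iy])

# Temporally causal joint rollout: at each step, every agent samples its next
# token conditioned on the shared history (a uniform sampler stands in for
# the trained language model here).
rng = np.random.default_rng(0)
positions = np.zeros((2, 2))                # two agents, (x, y)
for t in range(8):
    for agent in range(2):
        token = int(rng.integers(0, 144))   # model(joint_history) in reality
        dx, dy = detokenize(token)
        positions[agent] += (dx, dy)
```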
arXiv Detail & Related papers (2023-09-28T15:46:25Z)
- MTR++: Multi-Agent Motion Prediction with Symmetric Scene Modeling and Guided Intention Querying [110.83590008788745]
Motion prediction is crucial for autonomous driving systems to understand complex driving scenarios and make informed decisions.
In this paper, we propose Motion TRansformer (MTR) frameworks to address these challenges.
The initial MTR framework utilizes a transformer encoder-decoder structure with learnable intention queries (sketched in code after this list).
We introduce an advanced MTR++ framework, extending the capability of MTR to simultaneously predict multimodal motion for multiple agents.
arXiv Detail & Related papers (2023-06-30T16:23:04Z)
- MotionTrack: Learning Motion Predictor for Multiple Object Tracking [68.68339102749358]
We introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor.
Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as DanceTrack and SportsMOT.
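The learnable motion predictor can be pictured as a small network that maps a short history of bounding boxes to the next box, taking the role a hand-tuned Kalman filter plays in classical trackers. The architecture and dimensions below are assumptions, not the paper's model.
```python
import torch
import torch.nn as nn

class LearnedMotionPredictor(nn.Module):
    """Hypothetical stand-in: predicts the next box (cx, cy, w, h) as a
    residual on the most recent box, from the last `history` boxes."""

    def __init__(self, history: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(history * 4, 64),
            nn.ReLU(),
            nn.Linear(64, 4),
        )

    def forward(self, boxes: torch.Tensor) -> torch.Tensor:
        # boxes: (batch, history, 4) -> predicted next box: (batch, 4)
        return boxes[:, -1] + self.net(boxes.flatten(1))

predictor = LearnedMotionPredictor()
next_boxes = predictor(torch.randn(8, 5, 4))  # 8 tracks, 5-frame history
```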
arXiv Detail & Related papers (2023-06-05T04:24:11Z)
- Motion Transformer with Global Intention Localization and Local Movement Refinement [103.75625476231401]
Motion TRansformer (MTR) models motion prediction as the joint optimization of global intention localization and local movement refinement.
MTR achieves state-of-the-art performance on both the marginal and joint motion prediction challenges.
arXiv Detail & Related papers (2022-09-27T16:23:14Z)
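The learnable intention queries that both MTR entries above rely on can be pictured as a fixed set of learned embeddings, each attending to the encoded scene and decoding one candidate motion mode plus a confidence score. This is a simplified sketch of the idea under assumed sizes, not the MTR/MTR++ implementation.
```python
import torch
import torch.nn as nn

class IntentionQueryDecoder(nn.Module):
    """Toy version of intention querying: K learned queries -> K motion modes."""

    def __init__(self, num_queries: int = 6, d_model: int = 128, horizon: int = 80):
        super().__init__()
        self.horizon = horizon
        self.queries = nn.Embedding(num_queries, d_model)   # intention queries
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.traj_head = nn.Linear(d_model, horizon * 2)    # (x, y) per step
        self.score_head = nn.Linear(d_model, 1)             # mode confidence

    def forward(self, scene: torch.Tensor):
        # scene: (batch, num_tokens, d_model) encoded map + agent features
        q = self.queries.weight.unsqueeze(0).expand(scene.size(0), -1, -1)
        h = self.decoder(q, scene)                           # (batch, K, d_model)
        trajs = self.traj_head(h).view(scene.size(0), -1, self.horizon, 2)
        scores = self.score_head(h).squeeze(-1)              # (batch, K)
        return trajs, scores

decoder = IntentionQueryDecoder()
trajs, scores = decoder(torch.randn(2, 32, 128))  # 2 scenes, 32 scene tokens
```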
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.