Related papers: MoveFM-R: Advancing Mobility Foundation Models via Language-driven Semantic Reasoning

URL: http://arxiv.org/abs/2509.22403v1
Date: Fri, 26 Sep 2025 14:31:57 GMT
Title: MoveFM-R: Advancing Mobility Foundation Models via Language-driven Semantic Reasoning
Authors: Fanjin Meng, Yuan Yuan, Jingtao Ding, Jie Feng, Chonghua Han, Yong Li
Abstract summary: Mobility Foundation Models (MFMs) have advanced the modeling of human movement patterns, yet they face a ceiling due to limitations in data scale and semantic understanding. We propose MoveFM-R, a novel framework that unlocks the full potential of mobility foundation models by leveraging language-driven semantic reasoning capabilities. MoveFM-R is built on three core innovations: a semantically enhanced location encoding to bridge the geography-language gap, a progressive curriculum, and an interactive self-reflection mechanism.
Score: 17.430772832222793
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Mobility Foundation Models (MFMs) have advanced the modeling of human movement patterns, yet they face a ceiling due to limitations in data scale and semantic understanding. While Large Language Models (LLMs) offer powerful semantic reasoning, they lack the innate understanding of spatio-temporal statistics required for generating physically plausible mobility trajectories. To address these gaps, we propose MoveFM-R, a novel framework that unlocks the full potential of mobility foundation models by leveraging language-driven semantic reasoning capabilities. It tackles two key challenges: the vocabulary mismatch between continuous geographic coordinates and discrete language tokens, and the representation gap between the latent vectors of MFMs and the semantic world of LLMs. MoveFM-R is built on three core innovations: a semantically enhanced location encoding to bridge the geography-language gap, a progressive curriculum to align the LLM's reasoning with mobility patterns, and an interactive self-reflection mechanism for conditional trajectory generation. Extensive experiments demonstrate that MoveFM-R significantly outperforms existing MFM-based and LLM-based baselines. It also shows robust generalization in zero-shot settings and excels at generating realistic trajectories from natural language instructions. By synthesizing the statistical power of MFMs with the deep semantic understanding of LLMs, MoveFM-R pioneers a new paradigm that enables a more comprehensive, interpretable, and powerful modeling of human mobility. The implementation of MoveFM-R is available online at https://anonymous.4open.science/r/MoveFM-R-CDE7/.
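The vocabulary mismatch named in the abstract, between continuous geographic coordinates and discrete language tokens, can be illustrated with a minimal Python sketch. It is not the paper's semantically enhanced location encoding; it only shows the simplest possible baseline, snapping coordinates to a uniform grid and naming each cell with a token. The function names, the `<loc_row_col>` token format, and the `cell_deg` parameter are all assumptions made for illustration.

```python
import math


def coord_to_token(lat: float, lon: float, cell_deg: float = 0.01) -> str:
    """Map a (lat, lon) pair to a discrete grid-cell token.

    Hypothetical helper: a uniform grid of `cell_deg` degrees per cell,
    not the encoding used by MoveFM-R.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinate out of range")
    # Shift to non-negative ranges, then bucket by cell size.
    row = math.floor((lat + 90.0) / cell_deg)
    col = math.floor((lon + 180.0) / cell_deg)
    return f"<loc_{row}_{col}>"


def trajectory_to_tokens(points, cell_deg: float = 0.01):
    """Convert a sequence of (lat, lon) points into a list of grid tokens."""
    return [coord_to_token(lat, lon, cell_deg) for lat, lon in points]
```

Tokens produced this way are discrete but carry no meaning: neighbouring cells get unrelated names, and nothing says whether a cell holds a home, an office, or a park. That missing semantics is the kind of gap the abstract says the location encoding is meant to bridge.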

Related papers

LLaMo: Scaling Pretrained Language Models for Unified Motion Understanding and Generation with Continuous Autoregressive Tokens [19.167250154665812]
We propose LLaMo, a framework that extends pretrained large language models through a modality-specific Mixture-of-Transformers architecture. We encode human motion into a causal continuous latent space and maintain the next-token prediction paradigm in the decoder-only backbone. Our experiments demonstrate that LLaMo achieves high-fidelity text-to-motion generation and motion-to-text captioning in general settings.
arXiv Detail & Related papers (2026-02-12T20:02:21Z)
Codified Finite-state Machines for Role-playing [70.86310301713068]
We introduce Codified Finite-State Machines (CFSMs), a framework that automatically codifies textual character profiles into FSMs. CFSMs extract key states and transitions directly from the profile, producing interpretable structures that enforce character consistency. We extend CFSMs into Codified Probabilistic Finite-State Machines (CPFSMs), where transitions are modeled as probability distributions over states.
arXiv Detail & Related papers (2026-02-05T17:19:18Z)
Enhancing Large Language Models for Mobility Analytics with Semantic Location Tokenization [29.17336622418242]
We propose QT-Mob, a novel framework that significantly enhances Large Language Models (LLMs) for mobility analytics. QT-Mob introduces a location tokenization module that learns compact, semantically rich tokens to represent locations. Experiments on three real-world datasets demonstrate superior performance in both next-location prediction and mobility recovery tasks.
arXiv Detail & Related papers (2025-06-08T02:17:50Z)
Incentivizing Multimodal Reasoning in Large Models for Direct Robot Manipulation [89.5123417007126]
We show how to make Large Multimodal Models (LMMs) understand the spatial action space. We also show how to fully exploit the reasoning capacity of LMMs in solving these tasks. Our resulting reasoning model built upon a 7B backbone, named ReasonManip, demonstrates three notable advantages.
arXiv Detail & Related papers (2025-05-19T06:00:14Z)
MELLM: Exploring LLM-Powered Micro-Expression Understanding Enhanced by Subtle Motion Perception [47.80768014770871]
We propose a novel Micro-Expression Large Language Model (MELLM). It incorporates a subtle facial motion perception strategy with the strong inference capabilities of MLLMs. Our model exhibits superior robustness and generalization capabilities in micro-expression understanding (MEU).
arXiv Detail & Related papers (2025-05-11T15:08:23Z)
Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models [33.822930522694406]
We overview a promising learning paradigm, i.e., Modular Machine Learning (MML), as an essential approach toward new-generation large language models (LLMs). We propose a unified MML framework for LLMs, which decomposes the complex structure of LLMs into three interdependent components: modular representation, modular model, and modular reasoning. Ultimately, we believe the integration of MML with LLMs has the potential to bridge the gap between statistical (deep) learning and formal (logical) reasoning.
arXiv Detail & Related papers (2025-04-28T17:42:02Z)
Navigating Motion Agents in Dynamic and Cluttered Environments through LLM Reasoning [69.5875073447454]
This paper advances motion agents empowered by large language models (LLMs) toward autonomous navigation in dynamic and cluttered environments. Our training-free framework supports multi-agent coordination, closed-loop replanning, and dynamic obstacle avoidance without retraining or fine-tuning.
arXiv Detail & Related papers (2025-03-10T13:39:09Z)
TrajLLM: A Modular LLM-Enhanced Agent-Based Framework for Realistic Human Trajectory Simulation [3.8106509573548286]
This work leverages Large Language Models (LLMs) to simulate human mobility, addressing challenges like high costs and privacy concerns in traditional models. Our hierarchical framework integrates persona generation, activity selection, and destination prediction, using real-world demographic and psychological data.
arXiv Detail & Related papers (2025-02-26T00:13:26Z)
MoFM: A Large-Scale Human Motion Foundation Model [2.621434923709917]
Foundation Models (FMs) have increasingly drawn the attention of researchers due to their scalability and generalization across diverse tasks. MoFM is designed for the semantic understanding of complex human motions in both time and space. MoFM provides a backbone to diverse downstream tasks, supporting paradigms such as one-shot, unsupervised, and supervised tasks.
arXiv Detail & Related papers (2025-02-08T03:42:52Z)
Analyzing Finetuning Representation Shift for Multimodal LLMs Steering [56.710375516257876]
We propose to map hidden states to interpretable visual and textual concepts. This enables us to more efficiently compare certain semantic dynamics, such as the shift from an original to a fine-tuned model. We also demonstrate the use of shift vectors to capture these concept changes.
arXiv Detail & Related papers (2025-01-06T13:37:13Z)
Self-Powered LLM Modality Expansion for Large Speech-Text Models [62.27700381806554]
Large language models (LLMs) exhibit remarkable performance across diverse tasks. This study aims to refine the use of speech datasets for LSM training by addressing the limitations of vanilla instruction tuning. We introduce a self-powered LSM that leverages augmented automatic speech recognition data generated by the model itself for more effective instruction tuning.
arXiv Detail & Related papers (2024-10-04T04:34:24Z)
LIMP: Large Language Model Enhanced Intent-aware Mobility Prediction [5.7042182940772275]
We propose a novel LIMP (LLMs for Intent-aware Mobility Prediction) framework. Specifically, LIMP introduces an "Analyze-Abstract-Infer" (A2I) agentic workflow to unleash LLMs' commonsense reasoning power for mobility intention inference. We evaluate LIMP on two real-world datasets, demonstrating improved accuracy in next-location prediction and effective intention inference.
arXiv Detail & Related papers (2024-08-23T04:28:56Z)
Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs [67.59291068131438]
Motion-Agent is a conversational framework designed for general human motion generation, editing, and understanding. Motion-Agent employs an open-source pre-trained language model to develop a generative agent, MotionLLM, that bridges the gap between motion and text.
arXiv Detail & Related papers (2024-05-27T09:57:51Z)
Chain-of-Planned-Behaviour Workflow Elicits Few-Shot Mobility Generation in LLMs [20.70758465552438]
Chain-of-Planned Behaviour significantly reduces the error rate of mobility intention generation from 57.8% to 19.4%. We find mechanistic mobility models, such as the gravity model, can effectively map mobility intentions to physical mobility. The proposed CoPB workflow can facilitate GPT-4-turbo to automatically generate high-quality labels for mobility behaviour reasoning.
arXiv Detail & Related papers (2024-02-15T09:58:23Z)
Let Models Speak Ciphers: Multiagent Debate through Embeddings [84.20336971784495]
We introduce CIPHER (Communicative Inter-Model Protocol Through Embedding Representation) to address this issue. By deviating from natural language, CIPHER offers an advantage of encoding a broader spectrum of information without any modification to the model weights. This showcases the superiority and robustness of embeddings as an alternative "language" for communication among LLMs.
arXiv Detail & Related papers (2023-10-10T03:06:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.