Mobility-Embedded POIs: Learning What A Place Is and How It Is Used from Human Movement
- URL: http://arxiv.org/abs/2601.21149v1
- Date: Thu, 29 Jan 2026 01:12:35 GMT
- Title: Mobility-Embedded POIs: Learning What A Place Is and How It Is Used from Human Movement
- Authors: Maria Despoina Siampou, Shushman Choudhury, Shang-Ling Hsu, Neha Arora, Cyrus Shahabi
- Abstract summary: We introduce Mobility-Embedded POIs (ME-POIs), a framework that augments POI embeddings derived from language models with large-scale human mobility data. ME-POIs encodes individual visits as temporally contextualized embeddings and aligns them with learnable POI representations. We evaluate ME-POIs on five newly proposed map enrichment tasks, testing its ability to capture both the identity and function of POIs.
- Score: 8.906820313234476
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent progress in geospatial foundation models highlights the importance of learning general-purpose representations for real-world locations, particularly points-of-interest (POIs) where human activity concentrates. Existing approaches, however, focus primarily on place identity derived from static textual metadata, or learn representations tied to trajectory context, which capture movement regularities rather than how places are actually used (i.e., a POI's function). We argue that POI function is a missing but essential signal for general POI representations. We introduce Mobility-Embedded POIs (ME-POIs), a framework that augments POI embeddings derived from language models with large-scale human mobility data to learn POI-centric, context-independent representations grounded in real-world usage. ME-POIs encodes individual visits as temporally contextualized embeddings and aligns them with learnable POI representations via contrastive learning to capture usage patterns across users and time. To address long-tail sparsity, we propose a novel mechanism that propagates temporal visit patterns from nearby, frequently visited POIs across multiple spatial scales. We evaluate ME-POIs on five newly proposed map enrichment tasks, testing its ability to capture both the identity and function of POIs. Across all tasks, augmenting text-based embeddings with ME-POIs consistently outperforms both text-only and mobility-only baselines. Notably, ME-POIs trained on mobility data alone can surpass text-only models on certain tasks, highlighting that POI function is a critical component of accurate and generalizable POI representations.
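The contrastive alignment step described in the abstract, pulling each temporally contextualized visit embedding toward the learnable embedding of the POI it visited and pushing it away from the other POIs, can be sketched with an InfoNCE-style objective. This is a minimal illustration, not the paper's implementation: all dimensions, names, and the use of random embeddings in place of learned ones are assumptions.

```python
import numpy as np

# Hypothetical sketch of visit-to-POI contrastive alignment.
# Embedding sizes and counts below are illustrative, not from the paper.
rng = np.random.default_rng(0)
n_pois, dim, n_visits = 5, 8, 12

poi_emb = rng.normal(size=(n_pois, dim))            # learnable POI embedding table
visit_emb = rng.normal(size=(n_visits, dim))        # temporally contextualized visit embeddings
visit_poi = rng.integers(0, n_pois, size=n_visits)  # POI index each visit belongs to


def l2norm(x):
    """Normalize rows to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def infonce_loss(visits, pois, labels, temperature=0.1):
    """Cross-entropy over visit-to-POI cosine similarities (InfoNCE).

    Each visit is treated as a query whose positive is the embedding of the
    POI it visited; all other POI embeddings serve as negatives.
    """
    logits = l2norm(visits) @ l2norm(pois).T / temperature  # (n_visits, n_pois)
    logits -= logits.max(axis=1, keepdims=True)             # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


loss = infonce_loss(visit_emb, poi_emb, visit_poi)
print(float(loss))
```

Minimizing this loss over both the visit encoder and the POI table would, as the abstract describes, ground the POI representations in how and when places are actually visited.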
Related papers
- Foundation Model for Skeleton-Based Human Action Understanding [56.89025287217221]
This paper presents a Unified Skeleton-based Dense Representation Learning framework. USDRL consists of a Transformer-based Dense Spatio-Temporal (DSTE) module, Multi-Grained Feature Decorrelation (MG-FD), and Multi-Perspective Consistency Training (MPCT).
arXiv Detail & Related papers (2025-08-18T02:42:16Z) - On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey [39.840208834931076]
General-purpose text embeddings (GPTE) have gained significant traction for their ability to produce rich, transferable representations. We provide a comprehensive overview of GPTE in the era of pretrained language models (PLMs). We describe advanced roles enabled by PLMs, such as multilingual support, multimodal integration, code understanding, and scenario-specific adaptation.
arXiv Detail & Related papers (2025-07-28T12:52:24Z) - POIFormer: A Transformer-Based Framework for Accurate and Scalable Point-of-Interest Attribution [3.729614737011418]
POIFormer is a novel Transformer-based framework for accurate and efficient POI attribution. POIFormer enables accurate, efficient attribution in large, noisy mobility datasets.
arXiv Detail & Related papers (2025-07-12T04:37:52Z) - TrajSceneLLM: A Multimodal Perspective on Semantic GPS Trajectory Analysis [0.0]
We propose TrajSceneLLM, a multimodal perspective for enhancing semantic understanding of GPS trajectories. We validate the proposed framework on Travel Mode Identification (TMI), a critical task for analyzing travel choices and understanding mobility behavior. This semantic enhancement promises significant potential for diverse downstream applications and future research in artificial intelligence.
arXiv Detail & Related papers (2025-06-19T15:31:40Z) - LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization [58.65395773049273]
Location Preference Optimization (LPO) is a novel approach that leverages locational data to optimize interaction preferences. LPO uses information entropy to predict interaction positions by focusing on zones rich in information. Our code will be made publicly available soon at https://github.com/AIDC-AI/LPO.
arXiv Detail & Related papers (2025-06-11T03:43:30Z) - Geography-Aware Large Language Models for Next POI Recommendation [21.03555605703108]
The next Point-of-Interest (POI) recommendation task aims to predict users' next destinations based on their historical movement data. We propose GA-LLM (Geography-Aware Large Language Model), a novel framework that enhances Large Language Models with two specialized components. Experiments on three real-world datasets demonstrate the state-of-the-art performance of GA-LLM.
arXiv Detail & Related papers (2025-05-18T03:20:20Z) - POI-Enhancer: An LLM-based Semantic Enhancement Framework for POI Representation Learning [34.93661259065691]
Recent studies have shown that enriching POI representations with multimodal information can significantly enhance their task performance. Large language models (LLMs) trained on extensive text data have been found to possess rich textual knowledge. We propose POI-Enhancer, a portable framework that leverages LLMs to improve POI representations produced by classic POI learning models.
arXiv Detail & Related papers (2025-02-14T09:34:24Z) - Flex: End-to-End Text-Instructed Visual Navigation from Foundation Model Features [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies. Our findings are synthesized in Flex (Fly lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors. We demonstrate the effectiveness of this approach on a quadrotor fly-to-target task, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z) - Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization using Geometrical Information [68.10033984296247]
This paper explores the domain of active localization, emphasizing the importance of viewpoint selection to enhance localization accuracy.
Our contributions involve using a data-driven approach with a simple architecture designed for real-time operation, a self-supervised data training method, and the capability to consistently integrate our map into a planning framework tailored for real-world robotics applications.
arXiv Detail & Related papers (2024-07-22T12:32:09Z) - ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP)
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z) - Self-supervised Graph-based Point-of-interest Recommendation [66.58064122520747]
Next Point-of-Interest (POI) recommendation has become a prominent component in location-based e-commerce.
We propose a Self-supervised Graph-enhanced POI Recommender (S2GRec) for next POI recommendation.
In particular, we devise a novel Graph-enhanced Self-attentive layer to incorporate the collaborative signals from both global transition graph and local trajectory graphs.
arXiv Detail & Related papers (2022-10-22T17:29:34Z) - Learning to Move with Affordance Maps [57.198806691838364]
The ability to autonomously explore and navigate a physical space is a fundamental requirement for virtually any mobile autonomous agent.
Traditional SLAM-based approaches for exploration and navigation largely focus on leveraging scene geometry.
We show that learned affordance maps can be used to augment traditional approaches for both exploration and navigation, providing significant improvements in performance.
arXiv Detail & Related papers (2020-01-08T04:05:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.