RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility
- URL: http://arxiv.org/abs/2509.23115v2
- Date: Mon, 20 Oct 2025 03:10:11 GMT
- Title: RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility
- Authors: Haoyu He, Haozheng Luo, Yan Chen, Qi R. Wang
- Abstract summary: We introduce RHYTHM (Reasoning with Hierarchical Temporal Tokenization for Human Mobility), a unified framework for predicting human mobility. We use large language models (LLMs) as general-purpose predictors and reasoners. RHYTHM achieves a 2.4% improvement in overall accuracy, a 5.0% increase on weekends, and a 24.6% reduction in training time.
- Score: 9.200793414310182
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predicting human mobility is inherently challenging due to complex long-range dependencies and multi-scale periodic behaviors. To address this, we introduce RHYTHM (Reasoning with Hierarchical Temporal Tokenization for Human Mobility), a unified framework that leverages large language models (LLMs) as general-purpose spatio-temporal predictors and trajectory reasoners. Methodologically, RHYTHM employs temporal tokenization to partition each trajectory into daily segments and encode them as discrete tokens with hierarchical attention that captures both daily and weekly dependencies, thereby quadratically reducing the sequence length while preserving cyclical information. Additionally, we enrich token representations by adding pre-computed prompt embeddings for trajectory segments and prediction targets via a frozen LLM, and feeding these combined embeddings back into the LLM backbone to capture complex interdependencies. Computationally, RHYTHM keeps the pretrained LLM backbone frozen, yielding faster training and lower memory usage. We evaluate our model against state-of-the-art methods using three real-world datasets. Notably, RHYTHM achieves a 2.4% improvement in overall accuracy, a 5.0% increase on weekends, and a 24.6% reduction in training time. Code is publicly available at https://github.com/he-h/rhythm.
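The abstract describes the pipeline only at a high level, so the following is a minimal, illustrative PyTorch sketch of the same idea; it is not the authors' released implementation (see the linked GitHub repository for that). All names (`DailyTokenizer`, `HierarchicalMobilityModel`), tensor shapes, and the treatment of pre-computed prompt embeddings as an optional additive tensor are assumptions made for exposition.

```python
# Illustrative sketch of hierarchical temporal tokenization with a frozen backbone.
# Assumed input shape: (batch, days, steps_per_day, feat_dim), e.g. one week of
# half-hourly location/features per user. Not the RHYTHM reference code.
import torch
import torch.nn as nn


class DailyTokenizer(nn.Module):
    """Compress each day's sub-trajectory into a single token via intra-day attention."""

    def __init__(self, feat_dim: int, d_model: int, n_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        self.intra_day = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, d, s, f = x.shape                                  # batch, days, steps, features
        h = self.intra_day(self.proj(x.reshape(b * d, s, f)))  # attend within each day
        return h.mean(dim=1).reshape(b, d, -1)                 # one token per day


class HierarchicalMobilityModel(nn.Module):
    """Daily tokens -> inter-day (weekly) attention -> frozen backbone -> location logits.

    d_model must be divisible by the number of attention heads (4 here).
    The backbone is assumed to map (batch, days, d_model) to the same shape.
    """

    def __init__(self, feat_dim: int, d_model: int, llm_backbone: nn.Module, n_locations: int):
        super().__init__()
        self.tokenizer = DailyTokenizer(feat_dim, d_model)
        self.inter_day = nn.TransformerEncoderLayer(d_model, 4, batch_first=True)
        self.backbone = llm_backbone
        for p in self.backbone.parameters():                   # keep the pretrained backbone frozen
            p.requires_grad = False
        self.head = nn.Linear(d_model, n_locations)

    def forward(self, traj: torch.Tensor, prompt_emb: torch.Tensor | None = None) -> torch.Tensor:
        tokens = self.inter_day(self.tokenizer(traj))          # (batch, days, d_model)
        if prompt_emb is not None:                             # pre-computed prompt embeddings
            tokens = tokens + prompt_emb
        return self.head(self.backbone(tokens))                # per-day location logits


# Toy usage with a small Transformer encoder standing in for the frozen LLM.
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(64, 4, batch_first=True), num_layers=2
)
model = HierarchicalMobilityModel(feat_dim=3, d_model=64, llm_backbone=backbone, n_locations=500)
logits = model(torch.randn(8, 7, 48, 3))                       # 8 users, 7 days, 48 steps/day
```

Because the backbone's parameters are frozen, only the tokenizer, the inter-day attention, and the prediction head receive gradients, which mirrors the frozen-backbone setup the abstract credits for faster training and lower memory usage.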
Related papers
- Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline [58.585692088008905]
MM-Lifelong is a dataset designed for Multimodal Lifelong Understanding. Comprising 181.1 hours of footage, it is structured across Day, Week, and Month scales to capture varying temporal densities.
arXiv Detail & Related papers (2026-03-05T18:52:12Z) - Moving Beyond Functional Connectivity: Time-Series Modeling for fMRI-Based Brain Disorder Classification [8.837732238971187]
Functional magnetic resonance imaging (fMRI) enables non-invasive brain disorder classification by capturing blood-oxygen-level-dependent (BOLD) signals. Most existing methods rely on functional connectivity (FC) via Pearson correlation. We benchmark state-of-the-art temporal models on raw BOLD signals across five public datasets.
arXiv Detail & Related papers (2026-02-09T04:42:42Z) - PPMStereo: Pick-and-Play Memory Construction for Consistent Dynamic Stereo Matching [51.98089287914147]
Inspired by the two-stage decision-making process in humans, we propose a Pick-and-Play Memory (PPM) construction module for dynamic stereo matching, dubbed PPMStereo.
arXiv Detail & Related papers (2025-10-23T03:52:39Z) - Hierarchical Self-Supervised Representation Learning for Depression Detection from Speech [51.14752758616364]
Speech-based depression detection (SDD) is a promising, non-invasive alternative to traditional clinical assessments. We propose HAREN-CTC, a novel architecture that integrates multi-layer SSL features using cross-attention within a multitask learning framework. The model achieves state-of-the-art macro F1-scores of 0.81 on DAIC-WOZ and 0.82 on MODMA, outperforming prior methods across both evaluation scenarios.
arXiv Detail & Related papers (2025-10-05T09:32:12Z) - Handoff Design in User-Centric Cell-Free Massive MIMO Networks Using DRL [26.772811966031746]
This paper presents a deep reinforcement learning-based solution to predict and manage connections for mobile users. Our solution employs the Soft Actor-Critic algorithm, with a continuous action space representation, to train a deep neural network to serve as the handoff (HO) policy. We propose a novel reward function that integrates an HO penalty in order to balance the attainable rate against the overhead associated with HOs.
arXiv Detail & Related papers (2025-07-28T16:21:45Z) - Efficient Temporal Tokenization for Mobility Prediction with Large Language Models [7.704947355789259]
RHYTHM is a framework that leverages large language models (LLMs) as trajectory predictors and reasoners. Token representations are enriched with prompt embeddings via a frozen LLM, enhancing the model's ability to capture interdependencies. Evaluation on three real-world datasets demonstrates a 2.4% improvement in accuracy, a 5.0% increase on weekends, and a 24.6% reduction in training time compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-07-18T15:31:16Z) - A theoretical framework for self-supervised contrastive learning for continuous dependent data [79.62732169706054]
Self-supervised learning (SSL) has emerged as a powerful approach to learning representations, particularly in the field of computer vision. We propose a novel theoretical framework for contrastive SSL tailored to semantic independence between samples. Specifically, we outperform TS2Vec on the standard UEA and UCR benchmarks, with accuracy improvements of 4.17% and 2.08%, respectively.
arXiv Detail & Related papers (2025-06-11T14:23:47Z) - SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining [62.433137130087445]
SuperFlow++ is a novel framework that integrates pretraining and downstream tasks using consecutive camera pairs. We show that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions. With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving.
arXiv Detail & Related papers (2025-03-25T17:59:57Z) - BRATI: Bidirectional Recurrent Attention for Time-Series Imputation [0.14999444543328289]
Missing data in time-series analysis poses significant challenges, affecting the reliability of downstream applications. This paper introduces BRATI, a novel deep-learning model designed to address multivariate time-series imputation. BRATI processes temporal dependencies and feature correlations across long and short time horizons, utilizing two imputation blocks that operate in opposite temporal directions.
arXiv Detail & Related papers (2025-01-09T17:50:56Z) - RefreshKV: Updating Small KV Cache During Long-form Generation [54.00118604124301]
We propose a new inference method, RefreshKV, that flexibly alternates between full-context attention and attention over a subset of input tokens during generation. Applying our method to off-the-shelf LLMs achieves comparable speedup to eviction-based methods while improving performance for various long-form generation tasks.
arXiv Detail & Related papers (2024-11-08T18:57:07Z) - Spatial-Temporal-Decoupled Masked Pre-training for Spatiotemporal Forecasting [15.446085872077898]
We propose a self-supervised pre-training framework that employs two decoupled masked autoencoders to reconstruct spatiotemporal series along the spatial and temporal dimensions.
Rich-context representations learned through such reconstruction can be seamlessly integrated into downstream predictors with arbitrary architectures to augment their performance.
arXiv Detail & Related papers (2023-12-01T11:43:49Z) - GATGPT: A Pre-trained Large Language Model with Graph Attention Network
for Spatiotemporal Imputation [19.371155159744934]
In real-world settings, such data often contain missing elements due to issues like sensor malfunctions and data transmission errors.
The objective of spatiotemporal imputation is to estimate these missing values by understanding the inherent spatial and temporal relationships in the observed time series.
Traditionally, spatiotemporal imputation has relied on specific architectures, which suffer from limited applicability and high computational complexity.
In contrast, our approach integrates pre-trained large language models (LLMs) into spatiotemporal imputation, introducing a groundbreaking framework, GATGPT.
arXiv Detail & Related papers (2023-11-24T08:15:11Z) - Large Scale Time-Series Representation Learning via Simultaneous Low and
High Frequency Feature Bootstrapping [7.0064929761691745]
We propose a non-contrastive self-supervised learning approach that efficiently captures low- and high-frequency time-varying features.
Our method takes raw time series data as input and creates two different augmented views for two branches of the model.
To demonstrate the robustness of our model, we performed extensive experiments and ablation studies on five real-world time-series datasets.
arXiv Detail & Related papers (2022-04-24T14:39:47Z) - Temporal Memory Relation Network for Workflow Recognition from Surgical
Video [53.20825496640025]
We propose a novel end-to-end temporal memory relation network (TMNet) for relating long-range and multi-scale temporal patterns.
We have extensively validated our approach on two benchmark surgical video datasets.
arXiv Detail & Related papers (2021-03-30T13:20:26Z) - Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.