Evaluating Few-Shot Temporal Reasoning of LLMs for Human Activity Prediction in Smart Environments
- URL: http://arxiv.org/abs/2602.11176v1
- Date: Tue, 20 Jan 2026 20:58:17 GMT
- Title: Evaluating Few-Shot Temporal Reasoning of LLMs for Human Activity Prediction in Smart Environments
- Authors: Maral Doctorarastoo, Katherine A. Flanigan, Mario Bergés, Christopher McComb
- Abstract summary: Existing data-driven agent-based models struggle in low-data environments. This paper investigates whether large language models, pre-trained on broad human knowledge, can fill this gap.
- Score: 1.411614392022118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Anticipating human activities and their durations is essential in applications such as smart-home automation, simulation-based architectural and urban design, activity-based transportation system simulation, and human-robot collaboration, where adaptive systems must respond to human activities. Existing data-driven agent-based models--from rule-based to deep learning--struggle in low-data environments, limiting their practicality. This paper investigates whether large language models, pre-trained on broad human knowledge, can fill this gap by reasoning about everyday activities from compact contextual cues. We adopt a retrieval-augmented prompting strategy that integrates four sources of context--temporal, spatial, behavioral history, and persona--and evaluate it on the CASAS Aruba smart-home dataset. The evaluation spans two complementary tasks: next-activity prediction with duration estimation, and multi-step daily sequence generation, each tested with various numbers of few-shot examples provided in the prompt. Analyzing few-shot effects reveals how much contextual supervision is sufficient to balance data efficiency and predictive accuracy, particularly in low-data environments. Results show that large language models exhibit strong inherent temporal understanding of human behavior: even in zero-shot settings, they produce coherent daily activity predictions, while adding one or two demonstrations further refines duration calibration and categorical accuracy. Beyond a few examples, performance saturates, indicating diminishing returns. Sequence-level evaluation confirms consistent temporal alignment across few-shot conditions. These findings suggest that pre-trained language models can serve as promising temporal reasoners, capturing both recurring routines and context-dependent behavioral variations, thereby strengthening the behavioral modules of agent-based models.
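The abstract describes a retrieval-augmented prompting strategy that combines four context sources (temporal, spatial, behavioral history, persona) with a variable number of few-shot demonstrations. A minimal sketch of how such a prompt could be assembled is below; the field names, prompt layout, and activity labels are illustrative assumptions, not the authors' exact format.

```python
# Hypothetical sketch of the prompting setup from the abstract: a prompt is
# built from four context sources plus up to k retrieved demonstrations.
# k=0 corresponds to the zero-shot setting; k=1 or 2 to the few-shot settings
# the paper reports as sufficient for duration calibration.
from dataclasses import dataclass


@dataclass
class Context:
    temporal: str        # e.g. "Tuesday, 07:15"
    spatial: str         # e.g. "kitchen"
    history: list[str]   # recent activities, most recent last
    persona: str         # short description of the resident


def build_prompt(ctx: Context, demos: list[tuple[str, str]], k: int) -> str:
    """Assemble a next-activity + duration prediction prompt with up to k demos."""
    lines = ["Predict the resident's next activity and its duration."]
    for query, answer in demos[:k]:  # retrieved few-shot demonstrations
        lines.append(f"Example:\n{query}\nAnswer: {answer}")
    lines.append(
        f"Time: {ctx.temporal}\n"
        f"Location: {ctx.spatial}\n"
        f"Recent activities: {', '.join(ctx.history)}\n"
        f"Persona: {ctx.persona}\n"
        "Answer:"
    )
    return "\n\n".join(lines)


ctx = Context("Tuesday, 07:15", "kitchen",
              ["sleeping", "bed-to-toilet"], "retired adult living alone")
demos = [("Time: Monday, 07:10\nLocation: kitchen", "meal_preparation, ~20 min")]
prompt = build_prompt(ctx, demos, k=1)
```

The same builder covers both evaluation tasks: calling it once yields a next-activity query, while feeding each predicted activity back into `ctx.history` would generate a multi-step daily sequence.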
Related papers
- HumanLLM: Towards Personalized Understanding and Simulation of Human Nature [72.55730315685837]
HumanLLM is a foundation model designed for personalized understanding and simulation of individuals. We first construct the Cognitive Genome, a large-scale corpus curated from real-world user data on platforms like Reddit, Twitter, Blogger, and Amazon. We then formulate diverse learning tasks and perform supervised fine-tuning to empower the model to predict a wide range of individualized human behaviors, thoughts, and experiences.
arXiv Detail & Related papers (2026-01-22T09:27:27Z) - Scriboora: Rethinking Human Pose Forecasting [44.79834103607383]
This paper evaluates a wide range of pose forecasting algorithms in the task of absolute pose forecasting. Recent speech models can be efficiently adapted to the task of pose forecasting, and improve current state-of-the-art performance.
arXiv Detail & Related papers (2025-11-19T15:58:33Z) - Training Machine Learning Models on Human Spatio-temporal Mobility Data: An Experimental Study [Experiment Paper] [0.5382679710017696]
Individual-level human mobility prediction has emerged as a significant topic of research with applications in infectious disease monitoring and child and elderly care. We focus on an underexplored problem in human mobility prediction: determining the best practices to train a machine learning model. We show that explicitly including semantic information can help the model better understand individual patterns of life and improve predictions.
arXiv Detail & Related papers (2025-08-18T17:49:10Z) - When Does Multimodality Lead to Better Time Series Forecasting? [96.26052272121615]
We investigate whether and under what conditions such multimodal integration consistently yields gains. Our findings reveal that the benefits of multimodality are highly condition-dependent. Our study offers a rigorous, quantitative foundation for understanding when multimodality can be expected to aid forecasting tasks.
arXiv Detail & Related papers (2025-06-20T23:55:56Z) - If an LLM Were a Character, Would It Know Its Own Story? Evaluating Lifelong Learning in LLMs [55.8331366739144]
We introduce LIFESTATE-BENCH, a benchmark designed to assess lifelong learning in large language models (LLMs). Our fact checking evaluation probes models' self-awareness, episodic memory retrieval, and relationship tracking, across both parametric and non-parametric approaches.
arXiv Detail & Related papers (2025-03-30T16:50:57Z) - Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation [66.86987509942607]
We evaluate how such a paradigm should be done in imitation learning.
We consider a setting where the pretraining corpus consists of multitask demonstrations.
We argue that inverse dynamics modeling is well-suited to this setting.
arXiv Detail & Related papers (2023-05-26T14:40:46Z) - Multi-Timescale Modeling of Human Behavior [0.18199355648379031]
We propose an LSTM network architecture that processes behavioral information at multiple timescales to predict future behavior.
We evaluate our architecture on data collected in an urban search and rescue scenario simulated in a virtual Minecraft-based testbed.
arXiv Detail & Related papers (2022-11-16T15:58:57Z) - Time Will Change Things: An Empirical Study on Dynamic Language Understanding in Social Media Classification [5.075802830306718]
We empirically study social media NLU in a dynamic setup, where models are trained on past data and tested on future data.
We show that auto-encoding and pseudo-labeling together yield the best robustness under this dynamic setup.
arXiv Detail & Related papers (2022-10-06T12:18:28Z) - A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect term, category, and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks into the sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) on average performance by a large margin in few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z) - Episodic Memory for Learning Subjective-Timescale Models [1.933681537640272]
In model-based learning, an agent's model is commonly defined over transitions between consecutive states of an environment.
In contrast, intelligent behaviour in biological organisms is characterised by the ability to plan over varying temporal scales depending on the context.
We devise a novel approach to learning a transition dynamics model, based on the sequences of episodic memories that define the agent's subjective timescale.
arXiv Detail & Related papers (2020-10-03T21:55:40Z) - Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
We propose two knowledge-based data-driven methods to effectively capture these social interactions.
We develop a large scale interaction-centric benchmark TrajNet++, a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.