Related papers: Where You Go is Who You Are: Behavioral Theory-Guided LLMs for Inverse Reinforcement Learning

Where You Go is Who You Are: Behavioral Theory-Guided LLMs for Inverse Reinforcement Learning

URL: http://arxiv.org/abs/2505.17249v1
Date: Thu, 22 May 2025 19:56:03 GMT
Title: Where You Go is Who You Are: Behavioral Theory-Guided LLMs for Inverse Reinforcement Learning
Authors: Yuran Sun, Susu Xu, Chenguang Wang, Xilei Zhao,
Abstract summary: This study introduces SILIC, short for Sociodemographic Inference with LLM-guided Inverse Reinforcement Learning (IRL) and Cognitive Chain Reasoning ( CCR)<n> CCR infers sociodemographic attributes from observed mobility patterns by capturing latent behavioral intentions and reasoning through psychological constructs.<n>Our method substantially outperforms state-of-the-art baselines and shows great promise for enriching big trajectory data to support behaviorally grounded applications in transportation planning and beyond.
Score: 4.345382237366071
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Big trajectory data hold great promise for human mobility analysis, but their utility is often constrained by the absence of critical traveler attributes, particularly sociodemographic information. While prior studies have explored predicting such attributes from mobility patterns, they often overlooked underlying cognitive mechanisms and exhibited low predictive accuracy. This study introduces SILIC, short for Sociodemographic Inference with LLM-guided Inverse Reinforcement Learning (IRL) and Cognitive Chain Reasoning (CCR), a theoretically grounded framework that leverages LLMs to infer sociodemographic attributes from observed mobility patterns by capturing latent behavioral intentions and reasoning through psychological constructs. Particularly, our approach explicitly follows the Theory of Planned Behavior (TPB), a foundational behavioral framework in transportation research, to model individuals' latent cognitive processes underlying travel decision-making. The LLMs further provide heuristic guidance to improve IRL reward function initialization and update by addressing its ill-posedness and optimization challenges arising from the vast and unstructured reward space. Evaluated in the 2017 Puget Sound Regional Council Household Travel Survey, our method substantially outperforms state-of-the-art baselines and shows great promise for enriching big trajectory data to support more behaviorally grounded applications in transportation planning and beyond.

Related papers

LLMs are Introvert [21.542534041341774]
Large language models (LLMs) offer new potential for simulating psychological aspects of information spread.<n>Initial experiments revealed significant gaps between LLM-generated behaviors and authentic human dynamics.<n>We propose the Social Information Processing-based Chain of Thought (SIP-CoT) mechanism enhanced by emotion-guided memory.
arXiv Detail & Related papers (2025-07-08T03:32:38Z)
Dynamic Programming Techniques for Enhancing Cognitive Representation in Knowledge Tracing [125.75923987618977]
We propose the Cognitive Representation Dynamic Programming based Knowledge Tracing (CRDP-KT) model.<n>It is a dynamic programming algorithm to optimize cognitive representations based on the difficulty of the questions and the performance intervals between them.<n>It provides more accurate and systematic input features for subsequent model training, thereby minimizing distortion in the simulation of cognitive states.
arXiv Detail & Related papers (2025-06-03T14:44:48Z)
Latent Diffusion Planning for Imitation Learning [78.56207566743154]
Latent Diffusion Planning (LDP) is a modular approach consisting of a planner and inverse dynamics model.<n>By separating planning from action prediction, LDP can benefit from the denser supervision signals of suboptimal and action-free data.<n>On simulated visual robotic manipulation tasks, LDP outperforms state-of-the-art imitation learning approaches.
arXiv Detail & Related papers (2025-04-23T17:53:34Z)
A Survey on Post-training of Large Language Models [185.51013463503946]
Large Language Models (LLMs) have fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration.<n>These challenges necessitate advanced post-training language models (PoLMs) to address shortcomings, such as restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance.<n>This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms: Fine-tuning, which enhances task-specific accuracy; Alignment, which ensures ethical coherence and alignment with human preferences; Reasoning, which advances multi-step inference despite challenges in reward design; Integration and Adaptation, which
arXiv Detail & Related papers (2025-03-08T05:41:42Z)
LLM Post-Training: A Deep Dive into Reasoning Large Language Models [131.10969986056]
Large Language Models (LLMs) have transformed the natural language processing landscape and brought to life diverse applications.<n>Post-training methods enable LLMs to refine their knowledge, improve reasoning, enhance factual accuracy, and align more effectively with user intents and ethical considerations.
arXiv Detail & Related papers (2025-02-28T18:59:54Z)
Causality for Large Language Models [37.10970529459278]
Large language models (LLMs) with billions or trillions of parameters are trained on vast datasets, achieving unprecedented success across a series of language tasks. Recent research highlights that LLMs function as causal parrots, capable of reciting causal knowledge without truly understanding or applying it. This survey aims to explore how causality can enhance LLMs at every stage of their lifecycle.
arXiv Detail & Related papers (2024-10-20T07:22:23Z)
Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making [51.737762570776006]
LLM-ACTR is a novel neuro-symbolic architecture that provides human-aligned and versatile decision-making. Our framework extracts and embeds knowledge of ACT-R's internal decision-making process as latent neural representations. Our experiments on novel Design for Manufacturing tasks show both improved task performance as well as improved grounded decision-making capability.
arXiv Detail & Related papers (2024-08-17T11:49:53Z)
Predicting and Understanding Human Action Decisions: Insights from Large Language Models and Cognitive Instance-Based Learning [0.0]
Large Language Models (LLMs) have demonstrated their capabilities across various tasks. This paper exploits the reasoning and generative capabilities of the LLMs to predict human behavior in two sequential decision-making tasks. We compare the performance of LLMs with a cognitive instance-based learning model, which imitates human experiential decision-making.
arXiv Detail & Related papers (2024-07-12T14:13:06Z)
Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing [61.98556945939045]
We propose a framework to learn planning-based reasoning through Direct Preference Optimization (DPO) on collected trajectories. Our results on challenging logical reasoning benchmarks demonstrate the effectiveness of our learning framework.
arXiv Detail & Related papers (2024-02-01T15:18:33Z)
Large Language Models for Spatial Trajectory Patterns Mining [9.70298494476926]
Large language models (LLMs) have demonstrated their ability to reason in a manner akin to humans. This presents significant potential for analyzing temporal patterns in human mobility. Our work provides insights on the strengths and limitations of LLMs for human spatial trajectory analysis.
arXiv Detail & Related papers (2023-10-07T23:21:29Z)
Predicting Human Mobility via Self-supervised Disentanglement Learning [21.61423193132924]
We propose a novel disentangled solution called SSDL for tackling the next POI prediction problem. We present two realistic trajectory augmentation approaches to enhance the understanding of both the human intrinsic periodicity and constantly-changing intents. Extensive experiments conducted on four real-world datasets demonstrate that our proposed SSDL significantly outperforms the state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-17T16:17:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.