FuguReport

Know You Before You Speak: User-State Modeling for LLM Personalization in Multi-Turn Conversation

Authors: Jiani Luo, Xiaoyan Zhao, Yang Zhang, Shuyi Miao, Bingbing Xu, Stefan Konigorski, Tat-Seng Chua
Affiliations: Chinese Academy of Sciences / National University of Singapore / Beihang University / German Institute of Human Nutrition
Categories: Method / User Modeling / User state model for personalization; Application / Dialogue Systems / Personalized multi-turn conversation; Evaluation / Response Quality Evaluation / Dialogue response quality improvement
License: CC BY 4.0

Abstract Overview

The paper frames personalization in multi-turn dialogue as a partially observable decision-making problem, arguing that systems should infer hidden user states rather than only reuse explicit user history. It proposes PUMA, a framework that maintains a belief over latent user state, learns an action-conditioned world model for user-state transitions and observation generation, and selects dialogue actions by minimizing expected free energy. The method also separates latent user-state tracking from semantic memory retrieval, using memory for content grounding while reserving state modeling for planning and control. Experiments are conducted in healthcare-oriented counseling and motivational interviewing settings, including simulator-based dynamic evaluation on CAMI and a cross-dataset generalization study on AnnoMI.
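
As a rough illustration of what "maintaining a belief over latent user state" can mean, the sketch below runs one step of a discrete Bayes filter with an action-conditioned transition model. It is not the paper's implementation: the two-state user model, the action and observation labels, and every probability are made-up assumptions, and the observation model is simplified so that it does not depend on the action.

```python
import numpy as np

def update_belief(belief, action, observation, transition, likelihood):
    """One predict/correct step of a discrete Bayes filter.

    belief:      shape (S,), probabilities over hidden user states
    transition:  shape (A, S, S), transition[a, s, s2] = P(s2 | s, a)
    likelihood:  shape (S, O), likelihood[s2, o] = P(o | s2)
    """
    # Predict: push the belief through the action-conditioned dynamics.
    predicted = belief @ transition[action]
    # Correct: reweight by how well each state explains the observation.
    unnormalized = predicted * likelihood[:, observation]
    return unnormalized / unnormalized.sum()

# Hypothetical toy model (all numbers are illustrative assumptions).
# States: 0 = "resistant", 1 = "ready". Actions: 0 = "advise", 1 = "reflect".
# Observations: 0 = "sustain talk", 1 = "change talk".
transition = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # dynamics under action 0
    [[0.6, 0.4], [0.1, 0.9]],  # dynamics under action 1
])
likelihood = np.array([
    [0.8, 0.2],  # observation probabilities when the user is "resistant"
    [0.3, 0.7],  # observation probabilities when the user is "ready"
])

prior = np.array([0.5, 0.5])
posterior = update_belief(prior, action=1, observation=1,
                          transition=transition, likelihood=likelihood)
```

In this toy setup, observing "change talk" after a "reflect" action shifts probability mass toward the "ready" state. A system of the kind described would replace these hand-written tables with a learned world model.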

Novelty

The distinctive contribution is to make evolving latent user state, rather than explicit memory or static persona, the central object of personalization. PUMA applies the Free Energy Principle to dialogue by jointly performing belief updating, world-model refinement, and expected-free-energy-based action selection over action-conditioned user-state dynamics.
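
For readers unfamiliar with the term, a commonly used form of expected free energy in the active-inference literature is given below. This is the textbook decomposition, offered as background under assumed notation; the summary does not state PUMA's exact objective, which may differ. Here $s'$ is the next latent user state, $o$ the next observation, $a$ a candidate dialogue action, $q$ the model's predictive distribution, and $\tilde{p}(o)$ an assumed preference distribution over observations.

```latex
G(a) =
  - \underbrace{\mathbb{E}_{q(o \mid a)}\!\left[
      D_{\mathrm{KL}}\!\left( q(s' \mid o, a) \,\|\, q(s' \mid a) \right)
    \right]}_{\text{epistemic value (expected information gain)}}
  - \underbrace{\mathbb{E}_{q(o \mid a)}\!\left[ \ln \tilde{p}(o) \right]}_{\text{pragmatic value (preferred outcomes)}},
\qquad
a^{*} = \arg\min_{a} G(a).
```

Minimizing $G(a)$ therefore favors actions that are expected both to reduce uncertainty about the user state and to lead to preferred observations.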

Results

On CAMI, PUMA achieves the strongest dynamic counseling performance among the reported automated methods under both the Qwen3-8B and Llama-3.1-8B backbones. Its Lift/Prep/TrigCov/Turns scores are 1.62/75.9%/62.4%/12.2 with Qwen3-8B and 1.76/83.0%/63.2%/10.7 with Llama-3.1-8B. It also receives the highest reported MITI average among automated methods (4.37 with Qwen3-8B and 4.27 with Llama-3.1-8B), and ablations show that belief tracking, world modeling, and expected-free-energy planning each contribute to performance. In state modeling, PUMA improves current- and next-state accuracy over baselines on CAMI (0.689 and 0.717, respectively). In cross-dataset evaluation on AnnoMI, it also outperforms a long-prompt baseline, with 0.639 current-state accuracy and 0.532 next-state accuracy using Qwen3-8B.

Key Points

  1. PUMA models personalized dialogue with an explicit latent user-state belief and an action-conditioned world model, instead of relying only on profile or memory retrieval.
  2. The framework uses expected free energy to choose actions that balance uncertainty reduction about the user with goal-directed dialogue outcomes.
  3. Empirically, PUMA improves long-horizon counseling effectiveness, maintains strong judged response quality, and yields better user-state estimation and transition prediction than the reported baselines.
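
The trade-off in point 2 can be sketched for a small discrete model as follows. This is a minimal illustration of expected-free-energy scoring in its standard active-inference form, not the paper's method: the states, actions, observations, probabilities, and the preference vector are all hypothetical, and the observation model is assumed not to depend on the action.

```python
import numpy as np

def expected_free_energy(belief, action, transition, likelihood, log_preference):
    """Score one action: negative information gain minus pragmatic value.

    belief:         shape (S,), current belief over hidden user states
    transition:     shape (A, S, S), transition[a, s, s2] = P(s2 | s, a)
    likelihood:     shape (S, O), likelihood[s2, o] = P(o | s2)
    log_preference: shape (O,), log of an assumed preference over observations
    """
    eps = 1e-12  # guards against log(0) and division by zero
    predicted_state = belief @ transition[action]       # q(s' | a)
    predicted_obs = predicted_state @ likelihood        # q(o | a)
    joint = predicted_state[:, None] * likelihood       # q(s', o | a)
    # Expected information gain equals the mutual information between
    # the next state and the next observation under q.
    independent = predicted_state[:, None] * predicted_obs[None, :]
    info_gain = np.sum(joint * np.log(joint / (independent + eps) + eps))
    # Pragmatic value: expected log-preference of the predicted observations.
    pragmatic = predicted_obs @ log_preference
    return -info_gain - pragmatic

def select_action(belief, transition, likelihood, log_preference):
    """Return the action index with the lowest expected free energy."""
    scores = [
        expected_free_energy(belief, a, transition, likelihood, log_preference)
        for a in range(transition.shape[0])
    ]
    return int(np.argmin(scores)), scores

# Hypothetical toy model (all numbers are illustrative assumptions).
# States: 0 = "resistant", 1 = "ready". Actions: 0 = "advise", 1 = "reflect".
# Observations: 0 = "sustain talk", 1 = "change talk".
transition = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.6, 0.4], [0.1, 0.9]],
])
likelihood = np.array([[0.8, 0.2], [0.3, 0.7]])
log_preference = np.log(np.array([0.1, 0.9]))  # prefer "change talk"
belief = np.array([0.5, 0.5])

best_action, scores = select_action(belief, transition, likelihood, log_preference)
```

With these made-up numbers the lower score goes to action 1 ("reflect"), mainly because it makes the preferred observation more likely. In a setting where two actions had similar pragmatic value, the information-gain term would tip the choice toward the one expected to reveal more about the user.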

