FuguReport

PASK: Toward Intent-Aware Proactive Agents with Long-Term Memory

Authors Zhifei Xie, Zongzheng Hu, Fangda Ye, Xin Zhang, Haobo Chai, Zihang Liu, Pengcheng Wu, Guibin Zhang, Yue Liao, Xiaobin Hu, Deheng Ye, Chunyan Miao, Shuicheng Yan
Affiliations Nanyang Technological University / Pask-Core / National University of Singapore
Categories Method / Agent Design / IntentFlow streaming model, Method / Memory Systems / Workspace-user-global hybrid memory, Evaluation / Benchmarking / LatentNeeds human-edited data
License CC BY 4.0

Abstract Overview

This paper proposes DD-MM-PAS (Demand Detection, Memory Modeling, Proactive Agent System), a paradigm for streaming proactive AI agents that infer latent user needs from ongoing context rather than waiting for explicit prompts. The authors instantiate this paradigm in Pask, which combines the IntentFlow streaming model for demand detection (built on Qwen3-30B-A3B with SFT and RL training), a hierarchical long-term memory system spanning workspace, user, and global levels, and a full system infrastructure supporting always-on deployment. They also introduce LatentNeeds-Bench, a benchmark of 100 real-world sessions (3,936 turns) across work, learning, and daily-life domains, constructed from user-consented data and refined through thousands of rounds of human editing. Experiments compare IntentFlow against nine baseline LLMs on turn-level proactive demand detection.

Novelty

The paper's main novelty is framing proactive assistance as a full-stack problem that jointly addresses streaming demand detection (with three decision modes: silent, fast intervention, full assistance), self-evolving hierarchical long-term memory, and system-level deployment in a single unified paradigm. It also contributes LatentNeeds-Bench, a dedicated benchmark for latent-need detection built from real-world multi-turn sessions with human refinement, and a two-stage training recipe combining SFT on 100k synthetic samples with RL on 2k real-world sessions for intent alignment.

Results

On LatentNeeds-Bench, IntentFlow achieves the highest overall balanced accuracy (84.2), with 83.1 on demand turns and 85.2 on non-demand turns, outperforming Gemini-3-Flash (80.8 overall) while remaining slightly behind it on demand-turn accuracy alone (83.1 vs. 83.3). In 60-turn multi-round evaluation, IntentFlow's balanced accuracy declines by only 5.0% in relative terms (86.1 to 81.8), compared to 19.0% for GPT-5-Mini and 17.3% for Gemini-3-Flash. IntentFlow also exhibits the lowest per-turn latency among models generating responses (approximately 1.3–1.5 seconds), and a user study suggests stronger usefulness in learning scenarios than in daily-life settings.
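The reported figures are consistent with balanced accuracy being the unweighted mean of demand-turn and non-demand-turn accuracy, and with the multi-round decline being measured relative to the starting value. The following is a minimal Python sketch of that arithmetic. The formula is an assumption inferred from the numbers, since this summary does not state it, and the function names are illustrative only.

```python
def balanced_accuracy(demand_acc: float, non_demand_acc: float) -> float:
    """Unweighted mean of the two per-class accuracies (assumed definition)."""
    return (demand_acc + non_demand_acc) / 2.0


def relative_decline_pct(start: float, end: float) -> float:
    """Drop from start to end, expressed as a percentage of start."""
    return (start - end) / start * 100.0


# Values reported in this summary for IntentFlow.
overall = balanced_accuracy(83.1, 85.2)      # demand vs. non-demand turns
decline = relative_decline_pct(86.1, 81.8)   # 60-turn multi-round evaluation

print(f"balanced accuracy: {overall:.2f}")
print(f"relative decline: {decline:.1f}%")
```

Under this assumed definition the mean comes to about 84.15, in line with the reported 84.2, and the 4.3-point drop from 86.1 to 81.8 corresponds to the reported 5.0% decline.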

Key Points

  1. Pask instantiates the DD-MM-PAS paradigm by linking IntentFlow (a streaming demand detector with three action modes: silent, fast intervention, and memory-grounded full assistance), a hierarchical memory system (workspace, user, global), and an always-on proactive agent system.
  2. IntentFlow achieves the best balanced accuracy (84.2) on LatentNeeds-Bench against the nine baseline LLMs, and maintains relatively stable performance over 60-turn conversations (a relative decline of only 5.0%), while operating at low latency (~1.3–1.5s per turn).
  3. The benchmark reveals that proactive intent detection remains challenging for many existing LLMs: several strong models score below 40 on demand-turn accuracy, highlighting a gap between general language capability and the ability to reliably identify unstated user needs.
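To make the three action modes and the three memory levels concrete, the sketch below shows one way a per-turn decision could be routed. It is a hypothetical illustration, not the authors' implementation: in Pask the decision comes from the trained IntentFlow model, whereas here the action is simply passed in. The `MEMORY` contents, the lookup order, and the names `recall` and `respond` are all assumptions made for the example.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    SILENT = "silent"                        # no demand detected: stay quiet
    FAST_INTERVENTION = "fast_intervention"  # brief, immediate help
    FULL_ASSISTANCE = "full_assistance"      # memory-grounded response


# Hypothetical three-level memory; the keys and contents are made up.
MEMORY = {
    "workspace": {"deadline": "report due Friday"},
    "user": {"style": "prefers concise answers"},
    "global": {},
}


def recall(key: str) -> Optional[str]:
    """Look a key up level by level (this lookup order is an assumption)."""
    for level in ("workspace", "user", "global"):
        if key in MEMORY[level]:
            return MEMORY[level][key]
    return None


def respond(action: Action, turn: str, memory_key: str = "") -> Optional[str]:
    """Route one turn according to the action chosen by the detector."""
    if action is Action.SILENT:
        return None
    if action is Action.FAST_INTERVENTION:
        return f"[quick hint] {turn}"
    # Full assistance: ground the reply in whatever memory is available.
    context = recall(memory_key) or "no stored context"
    return f"[full answer, using: {context}] {turn}"
```

For example, `respond(Action.FULL_ASSISTANCE, "When is it due?", "deadline")` pulls the workspace-level entry into the reply, while `respond(Action.SILENT, "ok")` returns nothing, which mirrors the agent staying quiet on non-demand turns.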
