PASK: Toward Intent-Aware Proactive Agents with Long-Term Memory
Abstract Overview
This paper proposes DD-MM-PAS (Demand Detection, Memory Modeling, Proactive Agent System), a paradigm for streaming proactive AI agents that infer latent user needs from ongoing context rather than waiting for explicit prompts. The authors instantiate this paradigm in Pask, which combines the IntentFlow streaming model for demand detection (built on Qwen3-30B-A3B with SFT and RL training), a hierarchical long-term memory system spanning workspace, user, and global levels, and a full system infrastructure supporting always-on deployment. They also introduce LatentNeeds-Bench, a benchmark of 100 real-world sessions (3,936 turns) across work, learning, and daily-life domains, constructed from user-consented data and refined through thousands of rounds of human editing. Experiments compare IntentFlow against nine baseline LLMs on turn-level proactive demand detection.
Novelty
The paper's main novelty is framing proactive assistance as a full-stack problem that jointly addresses streaming demand detection (with three decision modes: silent, fast intervention, full assistance), self-evolving hierarchical long-term memory, and system-level deployment in a single unified paradigm. It also contributes LatentNeeds-Bench, a dedicated benchmark for latent-need detection built from real-world multi-turn sessions with human refinement, and a two-stage training recipe combining SFT on 100k synthetic samples with RL on 2k real-world sessions for intent alignment.
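The three decision modes can be pictured as a small routing step that runs on every streamed turn. The sketch below is illustrative only: IntentFlow is a trained model, and the `Detection` fields, the `choose_action` function, and the 0.5 threshold are hypothetical names and values introduced here, not details taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """The three decision modes named in the summary."""
    SILENT = "silent"
    FAST_INTERVENTION = "fast_intervention"
    FULL_ASSISTANCE = "full_assistance"


@dataclass
class Detection:
    # Hypothetical detector outputs; the paper does not specify this interface.
    demand_score: float   # assumed confidence in [0, 1] that a latent need exists
    needs_memory: bool    # assumed flag: does a useful reply require long-term memory?


def choose_action(d: Detection, threshold: float = 0.5) -> Action:
    """Map one turn's detection result to an action mode (illustrative rule)."""
    if d.demand_score < threshold:
        return Action.SILENT            # no latent need detected: stay quiet
    if d.needs_memory:
        return Action.FULL_ASSISTANCE   # memory-grounded, more expensive reply
    return Action.FAST_INTERVENTION     # short, low-latency reply
```

In this sketch the silent branch is checked first, so the cheapest outcome is decided before any memory-dependent work is considered.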
Results
On LatentNeeds-Bench, IntentFlow achieves the highest overall balanced accuracy (84.2), with 83.1 on demand turns and 85.2 on non-demand turns. It outperforms Gemini-3-Flash overall (80.8) while remaining slightly behind it on demand-turn accuracy alone (83.1 vs. 83.3). In 60-turn multi-round evaluation, IntentFlow's balanced accuracy falls from 86.1 to 81.8, a relative decline of only 5.0% (4.3 points), compared to 19.0% for GPT-5-Mini and 17.3% for Gemini-3-Flash. IntentFlow also exhibits the lowest per-turn latency among models generating responses (approximately 1.3–1.5 seconds), and a user study suggests stronger usefulness in learning scenarios than in daily-life settings.
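The headline numbers can be cross-checked with a few lines of arithmetic. This sketch assumes that balanced accuracy is the unweighted mean of demand-turn and non-demand-turn accuracy (the usual definition, and consistent with the figures above) and that the reported decline is relative to the starting score. The function names are introduced here for illustration.

```python
def balanced_accuracy(demand_acc: float, non_demand_acc: float) -> float:
    """Unweighted mean of the two per-class accuracies (assumed definition)."""
    return (demand_acc + non_demand_acc) / 2


def relative_decline(start: float, end: float) -> float:
    """Percentage drop from start to end, relative to start."""
    return (start - end) / start * 100


# Per-class scores reported for IntentFlow on LatentNeeds-Bench.
overall = balanced_accuracy(83.1, 85.2)   # about 84.15, reported as 84.2

# 60-turn evaluation: balanced accuracy goes from 86.1 to 81.8.
drop = relative_decline(86.1, 81.8)       # about 4.99, reported as 5.0%

print(f"balanced accuracy ~ {overall:.2f}, relative decline ~ {drop:.2f}%")
```

The small gap between 84.15 and the reported 84.2 is what one would expect if the per-class scores were themselves rounded before being reported.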
Key Points
- Pask instantiates the DD-MM-PAS paradigm by linking IntentFlow (a streaming demand detector with three action modes: silent, fast intervention, and memory-grounded full assistance), a hierarchical long-term memory system (workspace, user, and global levels), and an always-on proactive agent system.
- IntentFlow achieves the best balanced accuracy (84.2) among all models evaluated on LatentNeeds-Bench, maintains relatively stable performance over 60-turn conversations (a relative decline of only 5.0%), and operates at low latency (~1.3–1.5s per turn).
- The benchmark reveals that proactive intent detection remains challenging for many existing LLMs: several strong models score below 40 on demand-turn accuracy, which highlights a gap between general language capability and the ability to reliably identify unstated user needs.
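To make the three-level memory hierarchy concrete, here is a minimal sketch of a lookup that consults the workspace, user, and global levels in turn. The summary names the levels but not how they are stored or queried, so the `HierarchicalMemory` class, its dictionary-backed stores, and the most-specific-first lookup order are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class HierarchicalMemory:
    # Hypothetical key-value stores, one per level named in the summary.
    workspace: Dict[str, str] = field(default_factory=dict)
    user: Dict[str, str] = field(default_factory=dict)
    global_: Dict[str, str] = field(default_factory=dict)

    def lookup(self, key: str) -> Optional[Tuple[str, str]]:
        """Return (level, value) from the most specific level holding the key.

        The workspace -> user -> global order is an assumed policy.
        """
        levels = (
            ("workspace", self.workspace),
            ("user", self.user),
            ("global", self.global_),
        )
        for name, store in levels:
            if key in store:
                return name, store[key]
        return None


# Usage: a workspace-level entry shadows a user-level entry with the same key.
memory = HierarchicalMemory(
    workspace={"citation_style": "IEEE"},
    user={"citation_style": "APA", "language": "English"},
    global_={"timezone_format": "UTC"},
)
print(memory.lookup("citation_style"))  # ('workspace', 'IEEE')
print(memory.lookup("language"))        # ('user', 'English')
```

Checking the most specific level first lets session-local context override longer-lived preferences. A real system would likely use retrieval over learned representations rather than exact key matches.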