Related papers: Long-term Task-oriented Agent: Proactive Long-term Intent Maintenance in Dynamic Environments

Long-term Task-oriented Agent: Proactive Long-term Intent Maintenance in Dynamic Environments

URL: http://arxiv.org/abs/2601.09382v1
Date: Wed, 14 Jan 2026 11:15:40 GMT
Title: Long-term Task-oriented Agent: Proactive Long-term Intent Maintenance in Dynamic Environments
Authors: Qinglong Shi, Donghai Wang, Hantao Zhou, Jiguo Li, Jun Xu, Jiuchong Gao, Jinghua Hao, Renqing He,
Abstract summary: Current large language model agents operate under a reactive paradigm, responding only to immediate user queries within short-term sessions.<n>We propose a novel interaction paradigm for proactive Task-oriented Agents capable of bridging the gap between relatively static user's needs and a dynamic environment.<n>We introduce a high-quality data synthesis pipeline to construct complex, multi-turn dialog data in a dynamic environment.
Score: 8.937298475124484
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current large language model agents predominantly operate under a reactive paradigm, responding only to immediate user queries within short-term sessions. This limitation hinders their ability to maintain long-term user's intents and dynamically adapt to evolving external environments. In this paper, we propose a novel interaction paradigm for proactive Task-oriented Agents capable of bridging the gap between relatively static user's needs and a dynamic environment. We formalize proactivity through two key capabilities, (i) Intent-Conditioned Monitoring: The agent autonomously formulates trigger conditions based on dialog history; (ii) Event-Triggered Follow-up: The agent actively engages the user upon detecting useful environmental updates. We introduce a high-quality data synthesis pipeline to construct complex, multi-turn dialog data in a dynamic environment. Furthermore, we attempt to address the lack of evaluation criteria of task-oriented interaction in a dynamic environment by proposing a new benchmark, namely ChronosBench. We evaluated some leading close-source and open-source models at present and revealed their flaws in long-term task-oriented interaction. Furthermore, our fine-tuned model trained using synthetic data for supervised learning achieves a task completion rate of 85.19% for complex tasks including shifts in user intent, outperforming other models under test. And the result validated the effectiveness of our data-driven strategy.

Related papers

Pushing Forward Pareto Frontiers of Proactive Agents with Behavioral Agentic Optimization [61.641777037967366]
Proactive large language model (LLM) agents aim to actively plan, query, and interact over multiple turns.<n>Agentic reinforcement learning (RL) has emerged as a promising solution for training such agents in multi-turn settings.<n>We propose BAO, an agentic RL framework that combines behavior enhancement to enrich proactive reasoning and information-gathering capabilities.
arXiv Detail & Related papers (2026-02-11T20:40:43Z)
Autonomous Continual Learning of Computer-Use Agents for Environment Adaptation [57.65688895630163]
We introduce ACuRL, an Autonomous Curriculum Reinforcement Learning framework that continually adapts agents to specific environments with zero human data.<n>Our method effectively enables both intra-environment and cross-environment continual learning, yielding 4-22% performance gains without forgetting existing environments.
arXiv Detail & Related papers (2026-02-10T23:06:02Z)
AgentLongBench: A Controllable Long Benchmark For Long-Contexts Agents via Environment Rollouts [78.33143446024485]
We introduce textbfAgentLongBench, which evaluates agents through simulated environment rollouts based on Lateral Thinking Puzzles.<n>This framework generates rigorous interaction trajectories across knowledge-intensive and knowledge-free scenarios.
arXiv Detail & Related papers (2026-01-28T16:05:44Z)
User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale [5.641245411366927]
We develop a framework for automated task-oriented multi-turn dialogue generation at scale.<n>Our generation pipeline operates as a versatile, plug-and-play module capable of initiating generation from any state.<n>It yields a high-density dataset that reflects the multifaceted demands of real-world human-agent interaction.
arXiv Detail & Related papers (2026-01-13T05:14:09Z)
Evaluating Long-Context Reasoning in LLM-Based WebAgents [22.264781808930948]
This paper introduces a benchmark for evaluating long context reasoning capabilities of WebAgents.<n>We observe a dramatic performance degradation as context length increases, with success rates dropping from 40-50% in baseline conditions to less than 10% in long context scenarios.<n>Our detailed error analysis reveals that agents primarily fail due to getting stuck in loops and losing track of original task objectives.
arXiv Detail & Related papers (2025-12-03T22:53:10Z)
Scaling Environments for LLM Agents in the Era of Learning from Interaction: A Survey [30.673419015614233]
A growing consensus is that agents should interact directly with environments and learn from experience through reinforcement learning.<n>We formalize this iterative process as the Generation-Execution-Feedback (GEF) loop, where environments generate tasks to challenge agents, return observations in response to agents' actions during task execution, and provide evaluative feedback on rollouts for subsequent learning.<n>Under this paradigm, environments function as indispensable producers of experiential data, highlighting the need to scale them toward greater complexity, realism, and interactivity.
arXiv Detail & Related papers (2025-11-12T12:56:25Z)
UserBench: An Interactive Gym Environment for User-Centric Agents [110.77212949007958]
Large Language Models (LLMs)-based agents have made impressive progress in reasoning and tool use, but their ability to proactively collaborate with users remains underexplored.<n>We introduce UserBench, a user-centric benchmark designed to evaluate agents in multi-turn, preference-driven interactions.
arXiv Detail & Related papers (2025-07-29T17:34:12Z)
CLEA: Closed-Loop Embodied Agent for Enhancing Task Execution in Dynamic Environments [39.5949489828609]
Large Language Models (LLMs) exhibit remarkable capabilities in the hierarchical decomposition of complex tasks through semantic reasoning.<n>We propose Closed-Loop Embodied Agent (CLEA) -- a novel architecture incorporating four specialized open-source LLMs with functional decoupling for closed-loop task management.<n>We conduct experiments in a real environment with manipulable objects, using two heterogeneous robots for object search, manipulation, and search-manipulation integration tasks.
arXiv Detail & Related papers (2025-03-02T04:50:59Z)
Dynamic benchmarking framework for LLM-based conversational data capture [0.0]
This paper introduces a benchmarking framework to assess large language models (LLMs)<n>It integrates generative agent simulation to evaluate performance on key dimensions: information extraction, context awareness, and adaptive engagement.<n>Results show that adaptive strategies improve data extraction accuracy, especially when handling ambiguous responses.
arXiv Detail & Related papers (2025-02-04T15:47:47Z)
Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues [54.81155589931697]
Collaborative Instance object Navigation (CoIN) is a new task setting where the agent actively resolve uncertainties about the target instance.<n>We propose a novel training-free method, Agent-user Interaction with UncerTainty Awareness (AIUTA)<n>First, upon object detection, a Self-Questioner model initiates a self-dialogue within the agent to obtain a complete and accurate observation description.<n>An Interaction Trigger module determines whether to ask a question to the human, continue or halt navigation.
arXiv Detail & Related papers (2024-12-02T08:16:38Z)
Generative Prompt Internalization [48.91617280112579]
We propose Generative Prompt Internalization (GenPI), a lightweight method that employs a joint training approach.<n>GenPI not only replicates the behavior of models with prompt inputs but also generates the content of the prompt.<n>We demonstrate that our approach effectively internalizes complex prompts across various agent-based application scenarios.
arXiv Detail & Related papers (2024-11-24T17:32:20Z)
A Survey on Complex Tasks for Goal-Directed Interactive Agents [60.53915548970061]
This survey compiles relevant tasks and environments for evaluating goal-directed interactive agents. An up-to-date compilation of relevant resources can be found on our project website.
arXiv Detail & Related papers (2024-09-27T08:17:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.