UserBench: An Interactive Gym Environment for User-Centric Agents
- URL: http://arxiv.org/abs/2507.22034v1
- Date: Tue, 29 Jul 2025 17:34:12 GMT
- Title: UserBench: An Interactive Gym Environment for User-Centric Agents
- Authors: Cheng Qian, Zuxin Liu, Akshara Prabhakar, Zhiwei Liu, Jianguo Zhang, Haolin Chen, Heng Ji, Weiran Yao, Shelby Heinecke, Silvio Savarese, Caiming Xiong, Huan Wang,
- Abstract summary: Large Language Models (LLMs)-based agents have made impressive progress in reasoning and tool use, but their ability to proactively collaborate with users remains underexplored.<n>We introduce UserBench, a user-centric benchmark designed to evaluate agents in multi-turn, preference-driven interactions.
- Score: 110.77212949007958
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs)-based agents have made impressive progress in reasoning and tool use, enabling them to solve complex tasks. However, their ability to proactively collaborate with users, especially when goals are vague, evolving, or indirectly expressed, remains underexplored. To address this gap, we introduce UserBench, a user-centric benchmark designed to evaluate agents in multi-turn, preference-driven interactions. UserBench features simulated users who start with underspecified goals and reveal preferences incrementally, requiring agents to proactively clarify intent and make grounded decisions with tools. Our evaluation of leading open- and closed-source LLMs reveals a significant disconnect between task completion and user alignment. For instance, models provide answers that fully align with all user intents only 20% of the time on average, and even the most advanced models uncover fewer than 30% of all user preferences through active interaction. These results highlight the challenges of building agents that are not just capable task executors, but true collaborative partners. UserBench offers an interactive environment to measure and advance this critical capability.
Related papers
- PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time [87.99027488664282]
PersonaAgent is a framework designed to address versatile personalization tasks.<n>It integrates a personalized memory module and a personalized action module.<n>Test-time user-preference alignment strategy ensures real-time user preference alignment.
arXiv Detail & Related papers (2025-06-06T17:29:49Z) - Creating General User Models from Computer Use [62.91116265732001]
This paper presents an architecture for a general user model (GUM) that learns about you by observing any interaction you have with your computer.<n>The GUM takes as input any unstructured observation of a user (e.g., device screenshots) and constructs confidence-weighted propositions that capture user knowledge and preferences.
arXiv Detail & Related papers (2025-05-16T04:00:31Z) - SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World [50.937342998351426]
Chain-of-User-Thought (COUT) is a novel embodied reasoning paradigm.<n>We introduce SmartAgent, an agent framework perceiving cyber environments and reasoning personalized requirements.<n>Our work is the first to formulate the COUT process, serving as a preliminary attempt towards embodied personalized agent learning.
arXiv Detail & Related papers (2024-12-10T12:40:35Z) - ReSpAct: Harmonizing Reasoning, Speaking, and Acting Towards Building Large Language Model-Based Conversational AI Agents [11.118991548784459]
Large language model (LLM)-based agents are increasingly employed to interact with external environments.<n>ReSpAct is designed to seamlessly integrate reasoning, decision-making, and dynamic dialogue for task-solving.<n>We evaluate ReSpAct in user-interactive settings, including task-oriented dialogue systems (MultiWOZ) and decision-making tasks (ALFWorld, WebShop)
arXiv Detail & Related papers (2024-11-01T15:57:45Z) - Empowering Agile-Based Generative Software Development through Human-AI Teamwork [24.743864861980803]
We propose AgileGen, an agile-based generative software development through human-AI teamwork.
A memory pool mechanism is used to collect user decision-making scenarios and recommend them to new users.
arXiv Detail & Related papers (2024-07-22T11:54:44Z) - Tell Me More! Towards Implicit User Intention Understanding of Language
Model Driven Agents [110.25679611755962]
Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions.
We introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries.
We empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires user intentions, and refines them into actionable goals.
arXiv Detail & Related papers (2024-02-14T14:36:30Z) - AgentCF: Collaborative Learning with Autonomous Language Agents for
Recommender Systems [112.76941157194544]
We propose AgentCF for simulating user-item interactions in recommender systems through agent-based collaborative filtering.
We creatively consider not only users but also items as agents, and develop a collaborative learning approach that optimize both kinds of agents together.
Overall, the optimized agents exhibit diverse interaction behaviors within our framework, including user-item, user-user, item-item, and collective interactions.
arXiv Detail & Related papers (2023-10-13T16:37:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.