Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History
- URL: http://arxiv.org/abs/2602.17003v1
- Date: Thu, 19 Feb 2026 01:54:26 GMT
- Title: Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History
- Authors: Serin Kim, Sangam Lee, Dongha Lee
- Abstract summary: Persona2Web is the first benchmark for evaluating personalized web agents on the real open web. It consists of: (1) user histories that reveal preferences implicitly over long time spans, (2) ambiguous queries that require agents to infer implicit user preferences, and (3) a reasoning-aware evaluation framework that enables fine-grained assessment of personalization.
- Score: 8.085230376705887
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large language models have advanced web agents, yet current agents lack personalization capabilities. Since users rarely specify every detail of their intent, practical web agents must be able to interpret ambiguous queries by inferring user preferences and contexts. To address this challenge, we present Persona2Web, the first benchmark for evaluating personalized web agents on the real open web, built upon the clarify-to-personalize principle, which requires agents to resolve ambiguity based on user history rather than relying on explicit instructions. Persona2Web consists of: (1) user histories that reveal preferences implicitly over long time spans, (2) ambiguous queries that require agents to infer implicit user preferences, and (3) a reasoning-aware evaluation framework that enables fine-grained assessment of personalization. We conduct extensive experiments across various agent architectures, backbone models, history access schemes, and queries with varying ambiguity levels, revealing key challenges in personalized web agent behavior. For reproducibility, our code and datasets are publicly available at https://anonymous.4open.science/r/Persona2Web-73E8.
Related papers
- Me-Agent: A Personalized Mobile Agent with Two-Level User Habit Learning for Enhanced Interaction [20.029487905328004]
We propose Me-Agent, a learnable and memorable personalized mobile agent. Me-Agent incorporates a two-level user habit learning approach. Me-Agent achieves state-of-the-art performance in personalization while maintaining competitive instruction execution performance.
arXiv Detail & Related papers (2026-01-28T01:44:19Z)
- Towards Proactive Personalization through Profile Customization for Individual Users in Dialogues [28.522406727886395]
PersonalAgent is a lifelong agent designed to continuously infer and adapt to user preferences. Experiments show that PersonalAgent achieves superior performance over strong prompt-based and policy optimization baselines. Our findings underscore the importance of lifelong personalization for developing more inclusive and adaptive conversational agents.
arXiv Detail & Related papers (2025-12-17T10:47:06Z)
- A Generative Framework for Personalized Sticker Retrieval [73.57899194210141]
We propose PEARL, a novel generative framework for personalized sticker retrieval. We make two key contributions: (i) To encode user-specific sticker preferences, we design a representation learning model to learn discriminative user representations, and (ii) To generate stickers aligned with a user's query intent, we propose a novel intent-aware learning objective. Empirical results from both offline evaluations and online tests demonstrate that PEARL significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-09-22T13:11:44Z)
- PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time [87.99027488664282]
PersonaAgent is a framework designed to address versatile personalization tasks. It integrates a personalized memory module and a personalized action module. Test-time user-preference alignment strategy ensures real-time user preference alignment.
arXiv Detail & Related papers (2025-06-06T17:29:49Z)
- Personalized Query Auto-Completion for Long and Short-Term Interests with Adaptive Detoxification Generation [18.762185355073008]
We propose a novel model (LaD) that captures personalized information from both long-term and short-term interests. In LaD, personalized information is captured hierarchically at both coarse-grained and fine-grained levels. Our model has been deployed on Kuaishou search, driving the primary traffic for hundreds of millions of active users.
arXiv Detail & Related papers (2025-05-27T09:58:42Z)
- PersonaBench: Evaluating AI Models on Understanding Personal Information through Accessing (Synthetic) Private User Data [76.21047984886273]
Personalization is critical in AI assistants, particularly in the context of private AI models that work with individual users. Due to the sensitive nature of such data, there are no publicly available datasets that allow us to assess an AI model's ability to understand users. We introduce a synthetic data generation pipeline that creates diverse, realistic user profiles and private documents simulating human activities.
arXiv Detail & Related papers (2025-02-28T00:43:35Z)
- Large Language Models Empowered Personalized Web Agents [54.944908837494374]
Web agents have evolved from traditional agents to Large Language Models (LLMs)-based Web agents. We first formulate the task of LLM-empowered personalized Web agents, which integrate personalized data and user instructions. We propose a Personalized User Memory-enhanced Alignment (PUMA) framework to adapt LLMs to the personalized Web agent task.
arXiv Detail & Related papers (2024-10-22T17:54:45Z)
- Instruct and Extract: Instruction Tuning for On-Demand Information Extraction [86.29491354355356]
On-Demand Information Extraction aims to fulfill the personalized demands of real-world users. We present a benchmark named InstructIE, inclusive of both automatically generated training data, as well as the human-annotated test set. Building on InstructIE, we further develop an On-Demand Information Extractor, ODIE.
arXiv Detail & Related papers (2023-10-24T17:54:25Z)
- A Neural Topical Expansion Framework for Unstructured Persona-oriented Dialogue Generation [52.743311026230714]
Persona Exploration and Exploitation (PEE) is able to extend the predefined user persona description with semantically correlated content. PEE consists of two main modules: persona exploration and persona exploitation. Our approach outperforms state-of-the-art baselines in terms of both automatic and human evaluations.
arXiv Detail & Related papers (2020-02-06T08:24:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.