Related papers: OPeRA: A Dataset of Observation, Persona, Rationale, and Action for Evaluating LLMs on Human Online Shopping Behavior Simulation

URL: http://arxiv.org/abs/2506.05606v4
Date: Thu, 24 Jul 2025 06:52:49 GMT
Title: OPeRA: A Dataset of Observation, Persona, Rationale, and Action for Evaluating LLMs on Human Online Shopping Behavior Simulation
Authors: Ziyi Wang, Yuxuan Lu, Wenbo Li, Amirali Amini, Bo Sun, Yakov Bart, Weimin Lyu, Jiri Gesi, Tian Wang, Jing Huang, Yu Su, Upol Ehsan, Malihe Alikhani, Toby Jia-Jun Li, Lydia Chilton, Dakuo Wang
Abstract summary: OPeRA is the first public dataset that comprehensively captures user personas, browser observations, fine-grained web actions, and self-reported just-in-time rationales. We establish the first benchmark to evaluate how well current LLMs can predict a specific user's next action and rationale.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Can large language models (LLMs) accurately simulate the next web action of a specific user? While LLMs have shown promising capabilities in generating "believable" human behaviors, evaluating their ability to mimic real user behaviors remains an open challenge, largely due to the lack of high-quality, publicly available datasets that capture both the observable actions and the internal reasoning of an actual human user. To address this gap, we introduce OPeRA, a novel dataset of Observation, Persona, Rationale, and Action collected from real human participants during online shopping sessions. OPeRA is the first public dataset that comprehensively captures user personas, browser observations, fine-grained web actions, and self-reported just-in-time rationales. We developed both an online questionnaire and a custom browser plugin to gather this dataset with high fidelity. Using OPeRA, we establish the first benchmark to evaluate how well current LLMs can predict a specific user's next action and rationale given a persona and an <observation, action, rationale> history. This dataset lays the groundwork for future research into LLM agents that aim to act as personalized digital twins for humans.
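The benchmark task described in the abstract (predict a user's next action from a persona plus an <observation, action, rationale> history) can be sketched as follows. This is a minimal illustration under loudly stated assumptions: the `Step`/`Session` classes, the plain-text context format, the action strings, and exact-match scoring are all hypothetical and are not OPeRA's actual schema or the paper's evaluation code.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    observation: str  # what the user saw in the browser (hypothetical text form)
    action: str       # the user's web action, e.g. "click(buy_now)" (hypothetical encoding)
    rationale: str    # the user's self-reported just-in-time rationale


@dataclass
class Session:
    persona: str  # persona text, e.g. derived from the questionnaire
    steps: list[Step] = field(default_factory=list)


def build_context(session: Session, t: int) -> str:
    """Persona plus the <observation, action, rationale> history before step t,
    followed by the observation at step t, whose action is to be predicted."""
    lines = [f"Persona: {session.persona}"]
    for i, s in enumerate(session.steps[:t]):
        lines += [
            f"[{i}] obs: {s.observation}",
            f"[{i}] action: {s.action}",
            f"[{i}] rationale: {s.rationale}",
        ]
    lines.append(f"[{t}] obs: {session.steps[t].observation}")
    return "\n".join(lines)


def next_action_accuracy(session: Session, predict: Callable[[str], str]) -> float:
    """Exact-match accuracy of predicted next actions over every step of a session.
    Exact match is an assumed metric for illustration only."""
    if not session.steps:
        return 0.0
    hits = sum(
        predict(build_context(session, t)) == step.action
        for t, step in enumerate(session.steps)
    )
    return hits / len(session.steps)
```

In practice `predict` would wrap an LLM call that receives the context string; a trivial baseline such as `lambda ctx: "click(buy_now)"` is enough to exercise the loop. Note that the sketch only scores actions; predicting the rationale would need a separate, likely softer, comparison.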

Related papers

Web-Browsing LLMs Can Access Social Media Profiles and Infer User Demographics
Large language models (LLMs) have traditionally relied on static training data, limiting their knowledge to fixed snapshots. Recent advancements have equipped LLMs with web-browsing capabilities, enabling real-time information retrieval and multi-step reasoning over live web content. Here, we evaluate whether web-browsing LLMs can infer demographic attributes of social media users given only their usernames. We show that these models can access social media content and predict user demographics with reasonable accuracy.
arXiv (2025-07-16T16:21:01Z)
Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale
Large Language Models (LLMs) have emerged as personalized assistants for users across a wide range of tasks. PERSONAMEM features curated user profiles with over 180 simulated user-LLM interaction histories. We evaluate LLM chatbots' ability to identify the most suitable response according to the current state of the user's profile.
arXiv (2025-04-19T08:16:10Z)
Exploring Human-Like Thinking in Search Simulations with Large Language Models
Simulating user search behavior is a critical task in information retrieval. Recent advancements in large language models (LLMs) have opened up new possibilities for generating human-like actions. We explore the integration of human-like thinking into search simulations by leveraging LLMs to simulate users' hidden cognitive processes.
arXiv (2025-04-10T09:04:58Z)
Prompting is Not All You Need! Evaluating LLM Agent Simulation Methodologies with Real-World Online Customer Behavior Data
We focus on evaluating LLMs' "objective accuracy" rather than their subjective "believability" in simulating human behavior. We present the first comprehensive evaluation of state-of-the-art LLMs on the task of web shopping action generation.
arXiv (2025-03-26T17:33:27Z)
Can LLMs Simulate Social Media Engagement? A Study on Action-Guided Response Generation
This paper analyzes large language models' ability to simulate social media engagement through action-guided response generation. We benchmark GPT-4o-mini, o1-mini, and DeepSeek-R1 on social media engagement simulation around a major societal event.
arXiv (2025-02-17T17:43:08Z)
Agentic Society: Merging skeleton from real world and texture from Large Language Model
This paper explores a novel framework that leverages census data and large language models to generate virtual populations. We show that our method produces personas with the variability essential for simulating diverse human behaviors in social science experiments. However, the evaluation shows only weak signs of statistical truthfulness, owing to the limited capability of current LLMs.
arXiv (2024-09-02T08:28:19Z)
CXSimulator: A User Behavior Simulation using LLM Embeddings for Web-Marketing Campaign Assessment
This paper presents a novel framework designed to assess the effects of untested web-marketing campaigns through user behavior simulations. We use large language models (LLMs) to represent various events in a user's behavioral history, such as viewing an item, applying a coupon, or purchasing an item, as semantic embedding vectors. We leverage this transition prediction model to simulate how users might react differently when new campaigns or products are presented to them.
arXiv (2024-07-31T12:22:40Z)
From Persona to Personalization: A Survey on Role-Playing Language Agents
Recent advancements in large language models (LLMs) have boosted the rise of Role-Playing Language Agents (RPLAs). RPLAs achieve a remarkable sense of human likeness and vivid role-playing performance. They have catalyzed numerous AI applications, such as emotional companions, interactive video games, personalized assistants, and copilots.
arXiv (2024-04-28T15:56:41Z)
BASES: Large-scale Web Search User Simulation with Large Language Model based Agents
BASES is a novel user simulation framework with large language models (LLMs). Our simulation framework can generate unique user profiles at scale, which subsequently leads to diverse search behaviors. WARRIORS is a new large-scale dataset encompassing web search user behaviors, including both Chinese and English versions.
arXiv (2024-02-27T13:44:09Z)
On the steerability of large language models toward data-driven personas
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented. Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv (2023-11-08T19:01:13Z)
User Behavior Simulation with Large Language Model based Agents
We propose an LLM-based agent framework and design a sandbox environment to simulate real user behaviors. Based on extensive experiments, we find that the simulated behaviors of our method are very close to the ones of real humans.
arXiv (2023-06-05T02:58:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.