Related papers: PAARS: Persona Aligned Agentic Retail Shoppers

HumanStudy-Bench: Towards AI Agent Design for Participant Simulation [11.906370453952265]
Large language models (LLMs) are increasingly used as simulated participants in social science experiments. We introduce HUMANSTUDY-BENCH, a benchmark and execution engine that orchestrates LLM-based agents to reconstruct human-subject experiments. To evaluate fidelity at the level of scientific inference, we propose new metrics to quantify how much human and agent behaviors agree.
arXiv Detail & Related papers (2026-01-31T12:07:42Z)
HumanLLM: Towards Personalized Understanding and Simulation of Human Nature [72.55730315685837]
HumanLLM is a foundation model designed for personalized understanding and simulation of individuals. We first construct the Cognitive Genome, a large-scale corpus curated from real-world user data on platforms like Reddit, Twitter, Blogger, and Amazon. We then formulate diverse learning tasks and perform supervised fine-tuning to empower the model to predict a wide range of individualized human behaviors, thoughts, and experiences.
arXiv Detail & Related papers (2026-01-22T09:27:27Z)
AlignUSER: Human-Aligned LLM Agents via World Models for Recommender System Evaluation [0.7031557790463293]
We introduce AlignUSER, a framework that learns world-model-driven agents from human interactions. We generate counterfactual trajectories around demonstrations and prompt the LLM to compare its decisions with human choices, identify suboptimal actions, and extract lessons.
arXiv Detail & Related papers (2026-01-02T03:01:33Z)
How can we assess human-agent interactions? Case studies in software agent design [52.953425368394306]
We make two major steps towards the rigorous assessment of human-agent interactions. We propose PULSE, a framework for more efficient human-centric evaluation of agent designs. We deploy the framework on a large-scale web platform built around the open-source software agent OpenHands.
arXiv Detail & Related papers (2025-10-10T19:04:28Z)
The Social Laboratory: A Psychometric Framework for Multi-Agent LLM Evaluation [0.16921396880325779]
We introduce a novel evaluation framework that uses multi-agent debate as a controlled "social laboratory". We show that assigned personas induce stable, measurable psychometric profiles, particularly in cognitive effort. This work provides a blueprint for a new class of dynamic, psychometrically grounded evaluation protocols.
arXiv Detail & Related papers (2025-10-01T07:10:28Z)
Prompts to Proxies: Emulating Human Preferences via a Compact LLM Ensemble [46.82793004650415]
Large language models (LLMs) have demonstrated promise in emulating human-like responses across a range of tasks. We propose a novel alignment framework that treats LLMs as agent proxies for human survey respondents. We introduce P2P, a system that steers LLM agents toward representative behavioral patterns using structured prompt engineering, entropy-based sampling, and regression-based selection.
arXiv Detail & Related papers (2025-09-14T15:08:45Z)
Population-Aligned Persona Generation for LLM-based Social Simulation [58.84363795421489]
We propose a systematic framework for synthesizing high-quality, population-aligned persona sets for social simulation. Our approach begins by leveraging large language models to generate narrative personas from long-term social media data. To address the needs of specific simulation contexts, we introduce a task-specific module that adapts the globally aligned persona set to targeted subpopulations.
arXiv Detail & Related papers (2025-09-12T10:43:47Z)
OPeRA: A Dataset of Observation, Persona, Rationale, and Action for Evaluating LLMs on Human Online Shopping Behavior Simulation [56.47029531207105]
OPeRA is the first public dataset that comprehensively captures user personas, browser observations, fine-grained web actions, and self-reported just-in-time rationales. We establish the first benchmark to evaluate how well current LLMs can predict a specific user's next action and rationale.
arXiv Detail & Related papers (2025-06-05T21:37:49Z)
Assessing Collective Reasoning in Multi-Agent LLMs via Hidden Profile Tasks [5.120446836495469]
We introduce the Hidden Profile paradigm from social psychology as a diagnostic testbed for multi-agent LLM systems. By distributing critical information asymmetrically across agents, the paradigm reveals how inter-agent dynamics support or hinder collective reasoning. We find that while cooperative agents are prone to over-coordination in collective settings, increased contradiction impairs group convergence.
arXiv Detail & Related papers (2025-05-15T19:22:54Z)
Higher-Order Binding of Language Model Virtual Personas: a Study on Approximating Political Partisan Misperceptions [4.234771450043289]
Large language models (LLMs) are increasingly capable of simulating human behavior. We propose a novel methodology for constructing virtual personas with synthetic user "backstories" generated as extended, multi-turn interview transcripts. Our generated backstories are longer, rich in detail, and consistent in authentically describing a singular individual, compared to previous methods.
arXiv Detail & Related papers (2025-04-16T00:10:34Z)
AgentA/B: Automated and Scalable Web A/BTesting with Interactive LLM Agents [28.20409050985182]
A/B testing remains constrained by its dependence on the large-scale and live traffic of human participants. We present AgentA/B, a novel system that automatically simulates user interaction behaviors with real webpages. Our findings suggest AgentA/B can emulate human-like behavior patterns.
arXiv Detail & Related papers (2025-04-13T21:10:56Z)
LLM Generated Persona is a Promise with a Catch [18.45442859688198]
Persona-based simulations hold promise for transforming disciplines that rely on population-level feedback. Traditional methods to collect realistic persona data face challenges. They are prohibitively expensive and logistically challenging due to privacy constraints.
arXiv Detail & Related papers (2025-03-18T03:11:27Z)
Generative Agent Simulations of 1,000 People [56.82159813294894]
We present a novel agent architecture that simulates the attitudes and behaviors of 1,052 real individuals. The generative agents replicate participants' responses on the General Social Survey 85% as accurately as participants replicate their own answers. Our architecture reduces accuracy biases across racial and ideological groups compared to agents given demographic descriptions.
arXiv Detail & Related papers (2024-11-15T11:14:34Z)
Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance [95.03771007780976]
We tackle the challenge of developing proactive agents capable of anticipating and initiating tasks without explicit human instructions. First, we collect real-world human activities to generate proactive task predictions. These predictions are labeled by human annotators as either accepted or rejected. The labeled data is used to train a reward model that simulates human judgment.
arXiv Detail & Related papers (2024-10-16T08:24:09Z)
Chatting Up Attachment: Using LLMs to Predict Adult Bonds [0.0]
We use GPT-4 and Claude 3 Opus to create agents that simulate adults with varying profiles, childhood memories, and attachment styles. We evaluate our models using a transcript dataset from 9 humans who underwent the same interview protocol, analyzed and labeled by mental health professionals. Our findings indicate that training the models using only synthetic data achieves performance comparable to training the models on human data.
arXiv Detail & Related papers (2024-08-31T04:29:19Z)
PersonaGym: Evaluating Persona Agents and LLMs [47.75926334294358]
We introduce PersonaGym, the first dynamic evaluation framework for assessing persona agents, and PersonaScore, the first automated human-aligned metric grounded in decision theory. Our evaluation of 6 open and closed-source LLMs, using a benchmark encompassing 200 personas and 10,000 questions, reveals significant opportunities for advancement in persona agent capabilities.
arXiv Detail & Related papers (2024-07-25T22:24:45Z)
Select to Perfect: Imitating desired behavior from large multi-agent data [28.145889065013687]
Desired characteristics for AI agents can be expressed by assigning desirability scores. We first assess the effect of each individual agent's behavior on the collective desirability score. We propose the concept of an agent's Exchange Value, which quantifies an individual agent's contribution to the collective desirability score.
arXiv Detail & Related papers (2024-05-06T15:48:24Z)
Unlocking the `Why' of Buying: Introducing a New Dataset and Benchmark for Purchase Reason and Post-Purchase Experience [24.949929747493204]
We propose purchase reason prediction as a novel task for modern AI models. We first generate a dataset that consists of real-world explanations of why users make certain purchase decisions for various products. Our approach induces LLMs to explicitly distinguish between the reasons behind purchasing a product and the experience after the purchase in a user review.
arXiv Detail & Related papers (2024-02-20T23:04:06Z)
AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems [112.76941157194544]
We propose AgentCF for simulating user-item interactions in recommender systems through agent-based collaborative filtering. We creatively consider not only users but also items as agents, and develop a collaborative learning approach that optimizes both kinds of agents together. Overall, the optimized agents exhibit diverse interaction behaviors within our framework, including user-item, user-user, item-item, and collective interactions.
arXiv Detail & Related papers (2023-10-13T16:37:14Z)
Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs). We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing. We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation, and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot [71.28884625011987]
Melting Pot is a MARL evaluation suite that uses reinforcement learning to reduce the human labor required to create novel test scenarios. We have created over 80 unique test scenarios covering a broad range of research topics. We apply these test scenarios to standard MARL training algorithms, and demonstrate how Melting Pot reveals weaknesses not apparent from training performance alone.
arXiv Detail & Related papers (2021-07-14T17:22:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.