AgentA/B: Automated and Scalable Web A/B Testing with Interactive LLM Agents
- URL: http://arxiv.org/abs/2504.09723v2
- Date: Mon, 21 Apr 2025 23:57:49 GMT
- Title: AgentA/B: Automated and Scalable Web A/B Testing with Interactive LLM Agents
- Authors: Dakuo Wang, Ting-Yao Hsu, Yuxuan Lu, Hansu Gu, Limeng Cui, Yaochen Xie, William Headean, Bingsheng Yao, Akash Veeragouni, Jiapeng Liu, Sreyashi Nag, Jessie Wang,
- Abstract summary: A/B testing remains constrained by its dependence on the large-scale and live traffic of human participants. We present AgentA/B, a novel system that automatically simulates user interaction behaviors with real webpages. Our findings suggest AgentA/B can emulate human-like behavior patterns.
- Score: 28.20409050985182
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A/B testing is a widely adopted method for evaluating UI/UX design decisions in modern web applications. Yet, traditional A/B testing remains constrained by its dependence on large-scale, live traffic of human participants and the long wait for test results. Through formative interviews with six experienced industry practitioners, we identified critical bottlenecks in current A/B testing workflows. In response, we present AgentA/B, a novel system that leverages Large Language Model-based autonomous agents (LLM Agents) to automatically simulate user interaction behaviors with real webpages. AgentA/B enables scalable deployment of LLM agents with diverse personas, each capable of navigating a dynamic webpage and interactively executing multi-step interactions such as search, clicking, filtering, and purchasing. In a demonstrative controlled experiment, we employ AgentA/B to simulate a between-subject A/B test with 1,000 LLM agents on Amazon.com, and compare agent behaviors with real human shopping behaviors at scale. Our findings suggest AgentA/B can emulate human-like behavior patterns.
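To make the described workflow concrete, here is a minimal, hypothetical sketch of a between-subject A/B simulation with persona-conditioned LLM agents, in the spirit of the abstract above. It is not the authors' implementation: `call_llm`, the `Persona` fields, the action set, and the page representation are all placeholder assumptions.

```python
# Minimal sketch (assumptions, not the authors' code): persona-conditioned LLM
# agents assigned to control (A) or treatment (B), each producing an action log.
import json
import random
from dataclasses import dataclass, field

ACTIONS = ["search", "click", "filter", "purchase", "stop"]

@dataclass
class Persona:
    intent: str   # e.g. "buy running shoes under $80"
    style: str    # e.g. "price-sensitive"

@dataclass
class SessionLog:
    condition: str                               # "A" or "B"
    actions: list = field(default_factory=list)

def call_llm(prompt: str) -> str:
    # Placeholder so the sketch runs end-to-end; replace with a real
    # chat-completion client that returns a JSON action string.
    return json.dumps({"action": random.choice(ACTIONS), "target": "item-1"})

def simulate_session(persona: Persona, condition: str, max_steps: int = 10) -> SessionLog:
    log = SessionLog(condition=condition)
    page = f"search results, layout variant {condition}"   # stand-in for a parsed webpage
    for _ in range(max_steps):
        prompt = (
            f"You are a shopper ({persona.intent}, {persona.style}). "
            f"You see: {page}. Choose one action from {ACTIONS} and reply as "
            'JSON like {"action": "click", "target": "result 3"}.'
        )
        step = json.loads(call_llm(prompt))
        log.actions.append(step)
        if step["action"] in ("purchase", "stop"):
            break
        page = f"page after {step['action']} on {step.get('target')}"
    return log

def run_ab_test(personas, seed: int = 0):
    # Between-subject design: each agent sees exactly one condition.
    rng = random.Random(seed)
    logs = [simulate_session(p, rng.choice("AB")) for p in personas]
    rate = {}
    for c in "AB":
        group = [log for log in logs if log.condition == c]
        hits = sum(any(a["action"] == "purchase" for a in log.actions) for log in group)
        rate[c] = hits / max(len(group), 1)
    return rate   # purchase-rate per condition, e.g. {"A": ..., "B": ...}

if __name__ == "__main__":
    cohort = [Persona("buy running shoes", "price-sensitive") for _ in range(1000)]
    print(run_ab_test(cohort))
```

In a real deployment the page state would come from parsed live webpages and the per-condition metrics would feed a standard statistical comparison; the stub above only illustrates the assignment-and-logging loop.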
Related papers
- LMAgent: A Large-scale Multimodal Agents Society for Multi-user Simulation [66.52371505566815]
Large language model (LLM)-based AI agents have made significant progress toward human-like intelligence.
We present LMAgent, a very large-scale, multimodal agent society based on multimodal LLMs.
In LMAgent, besides chatting with friends, the agents can autonomously browse, purchase, and review products, even perform live streaming e-commerce.
arXiv Detail & Related papers (2024-12-12T12:47:09Z)
- Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues [54.81155589931697]
Collaborative Instance Object Navigation (CoIN) is a new task setting where the agent actively resolves uncertainties about the target instance. We propose a novel training-free method, Agent-user Interaction with UncerTainty Awareness (AIUTA). First, upon object detection, a Self-Questioner model initiates a self-dialogue within the agent to obtain a complete and accurate observation description. An Interaction Trigger module then determines whether to ask a question to the human, continue, or halt navigation (a sketch of this ask/continue/halt flow appears after this list).
arXiv Detail & Related papers (2024-12-02T08:16:38Z)
- Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence [79.5316642687565]
Existing multi-agent frameworks often struggle with integrating diverse capable third-party agents.
We propose the Internet of Agents (IoA), a novel framework that addresses these limitations.
IoA introduces an agent integration protocol, an instant-messaging-like architecture design, and dynamic mechanisms for agent teaming and conversation flow control.
arXiv Detail & Related papers (2024-07-09T17:33:24Z)
- Aligning Agents like Large Language Models [8.873319874424167]
Training agents to behave as desired in complex 3D environments from high-dimensional sensory information is challenging.
We draw an analogy between the undesirable behaviors of imitation learning agents and the unhelpful responses of unaligned large language models (LLMs).
We demonstrate that we can align our agent to consistently perform the desired mode, while providing insights and advice for successfully applying this approach to training agents.
arXiv Detail & Related papers (2024-06-06T16:05:45Z)
- Benchmarking Mobile Device Control Agents across Diverse Configurations [19.01954948183538]
B-MoCA is a benchmark for evaluating and developing mobile device control agents.
We benchmark diverse agents, including agents employing large language models (LLMs) or multi-modal LLMs.
While these agents demonstrate proficiency in executing straightforward tasks, their poor performance on complex tasks highlights significant opportunities for future research to improve effectiveness.
arXiv Detail & Related papers (2024-04-25T14:56:32Z)
- On Generative Agents in Recommendation [58.42840923200071]
Agent4Rec is a user simulator in recommendation based on Large Language Models.
Each agent interacts with personalized recommender models in a page-by-page manner.
arXiv Detail & Related papers (2023-10-16T06:41:16Z)
- AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems [112.76941157194544]
We propose AgentCF for simulating user-item interactions in recommender systems through agent-based collaborative filtering.
We creatively consider not only users but also items as agents, and develop a collaborative learning approach that optimizes both kinds of agents together.
Overall, the optimized agents exhibit diverse interaction behaviors within our framework, including user-item, user-user, item-item, and collective interactions.
arXiv Detail & Related papers (2023-10-13T16:37:14Z)
- User Behavior Simulation with Large Language Model based Agents [116.74368915420065]
We propose an LLM-based agent framework and design a sandbox environment to simulate real user behaviors.
Based on extensive experiments, we find that the behaviors simulated by our method closely match those of real humans.
arXiv Detail & Related papers (2023-06-05T02:58:35Z)
- MUG: Interactive Multimodal Grounding on User Interfaces [12.035123646959669]
We present MUG, a novel interactive task for multimodal grounding where a user and an agent work collaboratively on an interface screen.
Prior works modeled multimodal UI grounding in one round: the user gives a command and the agent responds to the command. MUG allows multiple rounds of interactions such that upon seeing the agent responses, the user can give further commands for the agent to refine or even correct its actions.
arXiv Detail & Related papers (2022-09-29T21:08:18Z)
- Agents for Automated User Experience Testing [4.6453787256723365]
We propose an agent-based approach for automatic UX testing.
We develop agents with basic problem solving skills and a core affect model.
Although this research is still at an early stage, we believe the results here make a strong case for the use of intelligent agents.
arXiv Detail & Related papers (2021-04-13T14:13:28Z)
- SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving [96.50297622371457]
Multi-agent interaction is a fundamental aspect of autonomous driving in the real world.
Despite more than a decade of research and development, the problem of how to interact with diverse road users in diverse scenarios remains largely unsolved.
We develop a dedicated simulation platform called SMARTS that generates diverse and competent driving interactions.
arXiv Detail & Related papers (2020-10-19T18:26:10Z)
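As referenced in the AIUTA entry above, a rough, hypothetical sketch of its ask/continue/halt decision flow might look like the following; the function names, prompts, and the three-way verdict heuristic are placeholders rather than the authors' published method.

```python
# Hypothetical sketch of a Self-Questioner + Interaction Trigger loop in the
# spirit of AIUTA; not the authors' code. `llm` is any text-in/text-out callable.
from enum import Enum

class Decision(Enum):
    ASK_HUMAN = "ask"        # uncertainty remains: query the human (sparingly)
    CONTINUE = "continue"    # observation does not match: keep navigating
    HALT = "halt"            # target instance identified: stop

def self_questioner(detection: str, llm) -> str:
    # Self-dialogue: the agent critiques and refines its own observation.
    description = llm(f"Describe the detected object in detail: {detection}")
    gaps = llm(f"Which attributes in this description are still uncertain? {description}")
    return llm(f"Revise the description '{description}' to resolve: {gaps}")

def interaction_trigger(description: str, target_spec: str, llm) -> Decision:
    # Decide whether the refined observation resolves the target instance.
    verdict = llm(
        f"Target: {target_spec}\nObservation: {description}\n"
        "Answer with exactly one word: match, uncertain, or mismatch."
    ).strip().lower()
    if verdict == "match":
        return Decision.HALT
    if verdict == "uncertain":
        return Decision.ASK_HUMAN
    return Decision.CONTINUE
```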