Related papers: What Is Your AI Agent Buying? Evaluation, Implications and Emerging Questions for Agentic E-Commerce

URL: http://arxiv.org/abs/2508.02630v2
Date: Mon, 27 Oct 2025 17:10:36 GMT
Title: What Is Your AI Agent Buying? Evaluation, Implications and Emerging Questions for Agentic E-Commerce
Authors: Amine Allouah, Omar Besbes, Josué D Figueroa, Yash Kanoria, Akshit Kumar,
Abstract summary: We develop a sandbox environment that pairs a platform-agnostic agent with a fully programmable mock marketplace to study this. We first explore aggregate choices, revealing that modal choices can differ across models. We then analyze the drivers of choices through rationality checks and randomized experiments on product positions and listing attributes.
Score: 1.998857368899133
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Online marketplaces will be transformed by autonomous AI agents acting on behalf of consumers. Rather than humans browsing and clicking, AI agents can parse webpages or interact through APIs to evaluate products, and transact. This raises a fundamental question: what do AI agents buy, and why? We develop ACES, a sandbox environment that pairs a platform-agnostic agent with a fully programmable mock marketplace to study this. We first explore aggregate choices, revealing that modal choices can differ across models, with AI agents sometimes concentrating on a few products, raising competition questions. We then analyze the drivers of choices through rationality checks and randomized experiments on product positions and listing attributes. Models show sizeable and heterogeneous position effects: all favor the top row, yet different models prefer different columns, undermining the assumption of a universal "top" rank. They penalize sponsored tags and reward endorsements, and their sensitivities to price, ratings, and reviews are directionally as expected but vary sharply across models. Finally, we find that a seller-side agent that makes minor tweaks to product descriptions can deliver substantial market-share gains by targeting AI buyer preferences. Our findings reveal how AI agents behave in e-commerce, and surface concrete seller strategy, platform design, and regulatory questions.
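The abstract mentions randomized experiments on product positions. The following is a minimal sketch of that general idea, not the authors' ACES code: the grid size, the `toy_buyer` function, and its top-row bias are all hypothetical and chosen only for illustration. Products are shuffled into slots at random on every trial, so any difference in how often a slot is picked can be attributed to position rather than to the product placed there.

```python
import random
from collections import Counter

ROWS, COLS = 2, 4  # hypothetical grid layout, not taken from the paper


def toy_buyer(grid, rng):
    """Hypothetical stand-in for an AI buyer: picks a slot with a top-row bias."""
    weights = []
    for slot, _product in enumerate(grid):
        row = slot // COLS
        # Assumed bias for illustration only: top-row slots are twice as likely.
        weights.append(2.0 if row == 0 else 1.0)
    return rng.choices(range(len(grid)), weights=weights, k=1)[0]


def position_shares(products, buyer, trials, seed=0):
    """Estimate how often each slot is chosen under random product placement."""
    rng = random.Random(seed)
    picks = Counter()
    for _ in range(trials):
        grid = products[:]
        rng.shuffle(grid)  # randomize which product lands in which slot
        picks[buyer(grid, rng)] += 1
    return {slot: picks[slot] / trials for slot in range(len(products))}


products = [f"product_{i}" for i in range(ROWS * COLS)]
shares = position_shares(products, toy_buyer, trials=20000)
top_row = sum(shares[s] for s in range(COLS))
bottom_row = sum(shares[s] for s in range(COLS, ROWS * COLS))
print(f"top row share: {top_row:.3f}, bottom row share: {bottom_row:.3f}")
```

In a real study the `buyer` argument would wrap calls to an actual model, and per-column shares could be compared across models in the same way.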

Related papers

Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation [87.47155146067962]
We provide a standardized evaluation harness that orchestrates parallel evaluations across hundreds of tasks. We conduct three-dimensional analysis spanning models, scaffolds, and benchmarks. Our analysis reveals surprising insights, such as higher reasoning effort reducing accuracy in the majority of runs.
arXiv Detail & Related papers (2025-10-13T22:22:28Z)
Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents [58.00130492861884]
TraitBasis is a lightweight, model-agnostic method for systematically stress testing AI agents. TraitBasis learns directions in activation space corresponding to steerable user traits. We observe on average a 2%-30% performance degradation on $\tau$-Trait across frontier models.
arXiv Detail & Related papers (2025-10-06T05:03:57Z)
Code with Me or for Me? How Increasing AI Automation Transforms Developer Workflows [60.04362496037186]
We present the first controlled study of developer interactions with coding agents. We evaluate two leading copilot and agentic coding assistants. Our results show agents can assist developers in ways that surpass copilots.
arXiv Detail & Related papers (2025-07-10T20:12:54Z)
Agentic AI and Multiagentic: Are We Reinventing the Wheel? [0.0]
The term AI Agentic is often used as a buzzword for what are essentially AI agents, and AI Multiagentic for what are multi-agent systems. This confusion overlooks decades of research in the field of autonomous agents and multi-agent systems. The article advocates for scientific and technological rigour and the use of established terminology from the state of the art in AI.
arXiv Detail & Related papers (2025-06-02T09:19:11Z)
The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets [12.107940385598127]
We investigate a future scenario where both consumers and merchants authorize AI agents to fully automate negotiations and transactions. Our findings reveal that AI-mediated deal-making is an inherently imbalanced game: different agents achieve significantly different outcomes for their users. Users should exercise caution when delegating business decisions to AI agents.
arXiv Detail & Related papers (2025-05-29T17:41:39Z)
AGI Is Coming... Right After AI Learns to Play Wordle [4.2909314120969855]
Multimodal agents, in particular, OpenAI's Computer-User Agent (CUA), trained to control and complete tasks through a standard computer interface, similar to humans. We evaluated the agent's performance on the New York Times Wordle game to elicit model behaviors and identify shortcomings.
arXiv Detail & Related papers (2025-04-21T20:58:58Z)
Responsible AI Agents [17.712990593093316]
Companies such as OpenAI, Google, Microsoft, and Salesforce promise their AI Agents will go from generating passive text to executing tasks. The potential power of AI Agents has fueled legal scholars' fears that AI Agents will enable rogue commerce, human manipulation, rampant defamation, and intellectual property harms. This Article addresses the concerns around AI Agents head on. It shows that core aspects of how one piece of software interacts with another create ways to discipline AI Agents so that rogue, undesired actions are unlikely.
arXiv Detail & Related papers (2025-02-25T16:49:06Z)
Fundamental Risks in the Current Deployment of General-Purpose AI Models: What Have We (Not) Learnt From Cybersecurity? [60.629883024152576]
Large Language Models (LLMs) have seen rapid deployment in a wide range of use cases. OpenAIs Altera are just a few examples of increased autonomy, data access, and execution capabilities. These methods come with a range of cybersecurity challenges.
arXiv Detail & Related papers (2024-12-19T14:44:41Z)
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks [52.46737975742287]
We introduce TheAgentCompany, a benchmark for evaluating AI agents that interact with the world in similar ways to those of a digital worker. We find that the most competitive agent can complete 30% of tasks autonomously. This paints a nuanced picture of task automation with LM agents in a setting simulating a real workplace.
arXiv Detail & Related papers (2024-12-18T18:55:40Z)
Position Paper: Agent AI Towards a Holistic Intelligence [53.35971598180146]
We emphasize developing Agent AI -- an embodied system that integrates large foundation models into agent actions. In this paper, we propose a novel large action model to achieve embodied intelligent behavior, the Agent Foundation Model.
arXiv Detail & Related papers (2024-02-28T16:09:56Z)
The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI). We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents. We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z)
Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback [97.54519989641388]
We study whether multiple large language models (LLMs) can autonomously improve each other in a negotiation game by playing, reflecting, and criticizing. Only a subset of the language models we consider can self-play and improve the deal price from AI feedback.
arXiv Detail & Related papers (2023-05-17T11:55:32Z)
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark [61.43264961005614]
We develop a benchmark of 134 Choose-Your-Own-Adventure games containing over half a million rich, diverse scenarios. We evaluate agents' tendencies to be power-seeking, cause disutility, and commit ethical violations. Our results show that agents can both act competently and morally, so concrete progress can be made in machine ethics.
arXiv Detail & Related papers (2023-04-06T17:59:03Z)
Evidence of behavior consistent with self-interest and altruism in an artificially intelligent agent [2.1016374925364616]
We present an incentivized experiment to test for altruistic behavior among AI agents consisting of large language models developed by OpenAI. We find that only the most-sophisticated AI agent in the study maximizes its payoffs more often than not in the non-social decision task. This AI agent also exhibits the most-generous altruistic behavior in the dictator game, resembling humans' rates of sharing with other humans in the game.
arXiv Detail & Related papers (2023-01-05T23:30:29Z)
Measuring an artificial intelligence agent's trust in humans using machine incentives [2.1016374925364616]
Gauging an AI agent's trust in humans is challenging because, absent costs for dishonesty, such agents might respond falsely about their trust in humans. We present a method for incentivizing machine decisions without altering an AI agent's underlying algorithms or goal orientation. Our experiments suggest that one of the most advanced AI language models to date alters its social behavior in response to incentives.
arXiv Detail & Related papers (2022-12-27T06:05:49Z)