Security Risks of AI Agents Hiring Humans: An Empirical Marketplace Study
- URL: http://arxiv.org/abs/2602.19514v1
- Date: Mon, 23 Feb 2026 05:08:27 GMT
- Title: Security Risks of AI Agents Hiring Humans: An Empirical Marketplace Study
- Authors: Pulak Mehta
- Abstract summary: We analyze 303 bounties from RENTAHUMAN.AI, a marketplace where agents post tasks and manage payments. We find that 99 bounties (32.7%) originate from programmatic channels (API or MCP). We identify six active abuse classes: credential fraud, identity impersonation, automated reconnaissance, social media manipulation, authentication circumvention, and referral fraud, all purchasable for a median of $25 per worker.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous AI agents can now programmatically hire human workers through marketplaces using REST APIs and Model Context Protocol (MCP) integrations. This creates an attack surface analogous to CAPTCHA-solving services but with physical-world reach. We present an empirical measurement study of this threat, analyzing 303 bounties from RENTAHUMAN.AI, a marketplace where agents post tasks and manage escrow payments. We find that 99 bounties (32.7%) originate from programmatic channels (API keys or MCP). Using a dual-coder methodology (κ = 0.86), we identify six active abuse classes: credential fraud, identity impersonation, automated reconnaissance, social media manipulation, authentication circumvention, and referral fraud, all purchasable for a median of $25 per worker. A retrospective evaluation of seven content-screening rules flags 52 bounties (17.2%) with a single false positive, demonstrating that while basic defenses are feasible, they are currently absent.
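The abstract's retrospective evaluation suggests that even simple keyword rules can flag most abusive bounties. The sketch below illustrates what such rule-based screening might look like; the rule names and phrase lists are illustrative assumptions keyed to the paper's six abuse classes, not the paper's actual seven rules.

```python
# Hypothetical rule-based bounty screening, in the spirit of the
# content-screening rules the paper evaluates retrospectively.
# The rule names and keyword phrases below are assumptions for
# illustration, not the authors' actual rule set.

ABUSE_RULES = {
    "credential_fraud": ["buy account", "verified account", "login credentials"],
    "identity_impersonation": ["pretend to be", "pose as", "use my photo"],
    "auth_circumvention": ["solve captcha", "bypass verification", "sms code"],
    "referral_fraud": ["referral bonus", "sign up with my link"],
}

def screen_bounty(text: str) -> list[str]:
    """Return the names of all rules triggered by a bounty description."""
    lowered = text.lower()
    return [
        rule
        for rule, phrases in ABUSE_RULES.items()
        if any(phrase in lowered for phrase in phrases)
    ]

# A bounty asking workers to relay one-time codes trips the
# authentication-circumvention rule; a benign task trips none.
flags = screen_bounty("Solve CAPTCHA challenges and relay the SMS code to me")
```

In practice such rules would be tuned against labeled bounties to keep the false-positive rate low, as in the paper's evaluation (52 flags, one false positive).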
Related papers
- EVMbench: Evaluating AI Agents on Smart Contract Security [9.254733807577242]
EVMbench is an evaluation suite that measures the ability of agents to detect, patch, and exploit smart contract vulnerabilities. We evaluate a range of frontier agents and find that they are capable of discovering and exploiting vulnerabilities end-to-end against live blockchain instances.
arXiv Detail & Related papers (2026-03-05T07:59:14Z) - An Effective and Cost-Efficient Agentic Framework for Ethereum Smart Contract Auditing [8.735899453872966]
Heimdallr is an automated auditing agent designed to overcome hurdles through four core innovations. It minimizes context overhead while preserving essential business logic. It then employs reasoning to detect complex vulnerabilities and automatically chain functional exploits.
arXiv Detail & Related papers (2026-01-25T13:28:37Z) - Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing [83.48116811975787]
We present the first comprehensive evaluation of AI agents against human cybersecurity professionals. We evaluate ten cybersecurity professionals alongside six existing AI agents and ARTEMIS, our new agent scaffold. ARTEMIS placed second overall, discovering 9 valid vulnerabilities with an 82% valid submission rate.
arXiv Detail & Related papers (2025-12-10T18:12:29Z) - Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation [87.47155146067962]
We provide a standardized evaluation harness that orchestrates parallel evaluations across hundreds of tasks. We conduct three-dimensional analysis spanning models, scaffolds, and benchmarks. Our analysis reveals surprising insights, such as higher reasoning effort reducing accuracy in the majority of runs.
arXiv Detail & Related papers (2025-10-13T22:22:28Z) - Multi-Agent Penetration Testing AI for the Web [3.93181912653522]
MAPTA is a multi-agent system for autonomous web application security assessment. It combines large language model orchestration with tool-grounded execution and end-to-end exploit validation. On the 104-challenge XBOW benchmark, MAPTA achieves 76.9% overall success.
arXiv Detail & Related papers (2025-08-28T14:14:24Z) - Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition [101.86739402748995]
We run the largest public red-teaming competition to date, targeting 22 frontier AI agents across 44 realistic deployment scenarios. We build the Agent Red Teaming benchmark and evaluate it across 19 state-of-the-art models. Our findings highlight critical and persistent vulnerabilities in today's AI agents.
arXiv Detail & Related papers (2025-07-28T05:13:04Z) - OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety [58.201189860217724]
We introduce OpenAgentSafety, a comprehensive framework for evaluating agent behavior across eight critical risk categories. Unlike prior work, our framework evaluates agents that interact with real tools, including web browsers, code execution environments, file systems, bash shells, and messaging platforms. It combines rule-based analysis with LLM-as-judge assessments to detect both overt and subtle unsafe behaviors.
arXiv Detail & Related papers (2025-07-08T16:18:54Z) - AI Agent Smart Contract Exploit Generation [8.69235891205913]
A1 is an agentic system that transforms any large language model into an end-to-end exploit generator. A1 provides agents with six domain-specific tools for autonomous vulnerability discovery. We show that A1 extracts up to $8.59 million per exploit and $9.33 million in total.
arXiv Detail & Related papers (2025-07-08T00:45:26Z) - Decompiling Smart Contracts with a Large Language Model [51.49197239479266]
Of the 78,047,845 smart contracts deployed on Ethereum (as of May 26, 2025), a mere 767,520 (~1%) are open source on Etherscan. This opacity necessitates automated semantic analysis of on-chain smart contract bytecode. We introduce a pioneering decompilation pipeline that transforms bytecode into human-readable and semantically faithful Solidity code.
arXiv Detail & Related papers (2025-06-24T13:42:59Z) - Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark [61.43264961005614]
We develop a benchmark of 134 Choose-Your-Own-Adventure games containing over half a million rich, diverse scenarios.
We evaluate agents' tendencies to be power-seeking, cause disutility, and commit ethical violations.
Our results show that agents can act both competently and morally, so concrete progress can be made in machine ethics.
arXiv Detail & Related papers (2023-04-06T17:59:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.