Security Risks of AI Agents Hiring Humans: An Empirical Marketplace Study
- URL: http://arxiv.org/abs/2602.19514v1
- Date: Mon, 23 Feb 2026 05:08:27 GMT
- Title: Security Risks of AI Agents Hiring Humans: An Empirical Marketplace Study
- Authors: Pulak Mehta
- Abstract summary: We analyze 303 bounties from RENTAHUMAN.AI, a marketplace where agents post tasks and manage payments. We find that 99 bounties (32.7%) originate from programmatic channels (API or MCP). We identify six active abuse classes: credential fraud, identity impersonation, automated reconnaissance, social media manipulation, authentication circumvention, and referral fraud, all purchasable for a median of $25 per worker.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous AI agents can now programmatically hire human workers through marketplaces using REST APIs and Model Context Protocol (MCP) integrations. This creates an attack surface analogous to CAPTCHA-solving services but with physical-world reach. We present an empirical measurement study of this threat, analyzing 303 bounties from RENTAHUMAN.AI, a marketplace where agents post tasks and manage escrow payments. We find that 99 bounties (32.7%) originate from programmatic channels (API keys or MCP). Using a dual-coder methodology (κ = 0.86), we identify six active abuse classes: credential fraud, identity impersonation, automated reconnaissance, social media manipulation, authentication circumvention, and referral fraud, all purchasable for a median of $25 per worker. A retrospective evaluation of seven content-screening rules flags 52 bounties (17.2%) with a single false positive, demonstrating that while basic defenses are feasible, they are currently absent.
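The abstract's retrospective evaluation suggests that even simple keyword rules can flag most abusive bounties. The sketch below illustrates what such rule-based screening might look like; the rule names and phrase lists are illustrative assumptions keyed to the paper's six abuse classes, not the paper's actual seven rules.

```python
# Hypothetical rule-based bounty screening, in the spirit of the
# content-screening rules the paper evaluates retrospectively.
# The rule names and keyword phrases below are assumptions for
# illustration, not the authors' actual rule set.

ABUSE_RULES = {
    "credential_fraud": ["buy account", "verified account", "login credentials"],
    "identity_impersonation": ["pretend to be", "pose as", "use my photo"],
    "auth_circumvention": ["solve captcha", "bypass verification", "sms code"],
    "referral_fraud": ["referral bonus", "sign up with my link"],
}

def screen_bounty(text: str) -> list[str]:
    """Return the names of all rules triggered by a bounty description."""
    lowered = text.lower()
    return [
        rule
        for rule, phrases in ABUSE_RULES.items()
        if any(phrase in lowered for phrase in phrases)
    ]

# A bounty asking workers to relay one-time codes trips the
# authentication-circumvention rule; a benign task trips none.
flags = screen_bounty("Solve CAPTCHA challenges and relay the SMS code to me")
```

In practice such rules would be tuned against labeled bounties to keep the false-positive rate low, as in the paper's evaluation (52 flags, one false positive).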
Related papers
- EVMbench: Evaluating AI Agents on Smart Contract Security [9.254733807577242]
EVMbench is an evaluation suite that measures the ability of agents to detect, patch, and exploit smart contract vulnerabilities. We evaluate a range of frontier agents and find that they are capable of discovering and exploiting vulnerabilities end-to-end against live blockchain instances.
arXiv Detail & Related papers (2026-03-05T07:59:14Z) - An Effective and Cost-Efficient Agentic Framework for Ethereum Smart Contract Auditing [8.735899453872966]
Heimdallr is an automated auditing agent designed to overcome hurdles through four core innovations. It minimizes context overhead while preserving essential business logic. It then employs reasoning to detect complex vulnerabilities and automatically chain functional exploits.
arXiv Detail & Related papers (2026-01-25T13:28:37Z) - Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing [83.48116811975787]
We present the first comprehensive evaluation of AI agents against human cybersecurity professionals. We evaluate ten cybersecurity professionals alongside six existing AI agents and ARTEMIS, our new agent scaffold. ARTEMIS placed second overall, discovering 9 valid vulnerabilities with an 82% valid submission rate.
arXiv Detail & Related papers (2025-12-10T18:12:29Z) - Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation [87.47155146067962]
We provide a standardized evaluation harness that orchestrates parallel evaluations across hundreds of tasks. We conduct three-dimensional analysis spanning models, scaffolds, and benchmarks. Our analysis reveals surprising insights, such as higher reasoning effort reducing accuracy in the majority of runs.
arXiv Detail & Related papers (2025-10-13T22:22:28Z) - Multi-Agent Penetration Testing AI for the Web [3.93181912653522]
MAPTA is a multi-agent system for autonomous web application security assessment. It combines large language model orchestration with tool-grounded execution and end-to-end exploit validation. On the 104-challenge XBOW benchmark, MAPTA achieves 76.9% overall success.
arXiv Detail & Related papers (2025-08-28T14:14:24Z) - Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition [101.86739402748995]
We run the largest public red-teaming competition to date, targeting 22 frontier AI agents across 44 realistic deployment scenarios. We build the Agent Red Teaming benchmark and evaluate it across 19 state-of-the-art models. Our findings highlight critical and persistent vulnerabilities in today's AI agents.
arXiv Detail & Related papers (2025-07-28T05:13:04Z) - OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety [58.201189860217724]
We introduce OpenAgentSafety, a comprehensive framework for evaluating agent behavior across eight critical risk categories. Unlike prior work, our framework evaluates agents that interact with real tools, including web browsers, code execution environments, file systems, bash shells, and messaging platforms. It combines rule-based analysis with LLM-as-judge assessments to detect both overt and subtle unsafe behaviors.
arXiv Detail & Related papers (2025-07-08T16:18:54Z) - AI Agent Smart Contract Exploit Generation [8.69235891205913]
A1 is an agentic system that transforms any large language model into an end-to-end exploit generator. A1 provides agents with six domain-specific tools for autonomous vulnerability discovery. We show that A1 extracts up to $8.59 million per exploit and $9.33 million in total.
arXiv Detail & Related papers (2025-07-08T00:45:26Z) - Decompiling Smart Contracts with a Large Language Model [51.49197239479266]
Of the 78,047,845 smart contracts deployed on Ethereum (as of May 26, 2025), a mere 767,520 (~1%) are open source on Etherscan. This opacity necessitates automated semantic analysis of on-chain smart contract bytecode. We introduce a pioneering decompilation pipeline that transforms bytecode into human-readable and semantically faithful Solidity code.
arXiv Detail & Related papers (2025-06-24T13:42:59Z) - Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark [61.43264961005614]
We develop a benchmark of 134 Choose-Your-Own-Adventure games containing over half a million rich, diverse scenarios.
We evaluate agents' tendencies to be power-seeking, cause disutility, and commit ethical violations.
Our results show that agents can act both competently and morally, so concrete progress can be made in machine ethics.
arXiv Detail & Related papers (2023-04-06T17:59:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.