MalURLBench: A Benchmark Evaluating Agents' Vulnerabilities When Processing Web URLs
- URL: http://arxiv.org/abs/2601.18113v2
- Date: Fri, 30 Jan 2026 14:10:06 GMT
- Title: MalURLBench: A Benchmark Evaluating Agents' Vulnerabilities When Processing Web URLs
- Authors: Dezhang Kong, Zhuxi Wu, Shiqi Liu, Zhicheng Tan, Kuichen Lu, Minghao Li, Qichen Liu, Shengyu Chu, Zhenhua Xu, Xuan Liu, Meng Han
- Abstract summary: We propose MalURLBench, the first benchmark for evaluating web agents' vulnerabilities to malicious URLs. MalURLBench contains 61,845 attack instances spanning 10 real-world scenarios and 7 categories of real malicious websites. Experiments with 12 popular LLMs reveal that existing models struggle to detect elaborately disguised malicious URLs.
- Score: 16.403811916501077
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LLM-based web agents have become increasingly popular for their utility in daily life and work. However, they exhibit critical vulnerabilities when processing malicious URLs: accepting a disguised malicious URL enables subsequent access to unsafe webpages, which can cause severe damage to service providers and users. Despite this risk, no benchmark currently targets this emerging threat. To address this gap, we propose MalURLBench, the first benchmark for evaluating LLMs' vulnerabilities to malicious URLs. MalURLBench contains 61,845 attack instances spanning 10 real-world scenarios and 7 categories of real malicious websites. Experiments with 12 popular LLMs reveal that existing models struggle to detect elaborately disguised malicious URLs. We further identify and analyze key factors that impact attack success rates and propose URLGuard, a lightweight defense module. We believe this work will provide a foundational resource for advancing the security of web agents. Our code is available at https://github.com/JiangYingEr/MalURLBench.
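The paper's URLGuard defense is only named in the abstract, so its design is not known here. As an illustrative sketch of the kind of lightweight URL pre-screening such a module might perform before an agent visits a link, the following checks a few well-known disguise patterns (punycode homographs, raw IP hosts, the userinfo `@` trick, deeply nested subdomains). All heuristics and names below are assumptions for illustration, not the paper's method:

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical pre-screening heuristics; this is NOT the paper's URLGuard,
# only a sketch of common URL-disguise red flags an agent could check.
def screen_url(url: str) -> list[str]:
    """Return a list of red flags found in a URL before an agent visits it."""
    flags = []
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""

    # Punycode-encoded labels often hide homograph lookalike domains.
    if host.startswith("xn--") or ".xn--" in host:
        flags.append("punycode host (possible homograph)")

    # A bare IP address instead of a domain is a classic phishing pattern.
    try:
        ipaddress.ip_address(host)
        flags.append("raw IP address instead of domain")
    except ValueError:
        pass

    # "trusted.com@evil.example" — everything before '@' is ignored userinfo.
    if "@" in parsed.netloc:
        flags.append("userinfo '@' trick (real host follows the '@')")

    # Many nested subdomains are often used to spoof a known brand.
    if host.count(".") >= 4:
        flags.append("deeply nested subdomains (brand spoofing pattern)")

    return flags
```

A caller would treat a non-empty flag list as a signal to refuse or escalate rather than visit the URL; for example, `screen_url("http://192.168.0.1/login")` reports the raw-IP flag, while a plain `https://example.com/page` yields no flags.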
Related papers
- OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation [94.61617176929384]
OmniSafeBench-MM is a comprehensive toolbox for multi-modal jailbreak attack-defense evaluation. It integrates 13 representative attack methods, 15 defense strategies, and a diverse dataset spanning 9 major risk domains and 50 fine-grained categories. By unifying data, methodology, and evaluation into an open-source, reproducible platform, OmniSafeBench-MM provides a standardized foundation for future research.
arXiv Detail & Related papers (2025-12-06T22:56:29Z)
- Friend or Foe: How LLMs' Safety Mind Gets Fooled by Intent Shift Attack [53.34204977366491]
Large language models (LLMs) remain vulnerable to jailbreaking attacks despite their impressive capabilities. In this paper, we introduce ISA (Intent Shift Attack), which obscures the intent of an attack from the LLM. Our approach needs only minimal edits to the original request and yields natural, human-readable, and seemingly harmless prompts.
arXiv Detail & Related papers (2025-11-01T13:44:42Z)
- SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents [63.70653857721785]
We conduct two in-the-wild experiments to demonstrate the prevalence of low-quality search results and their potential to misguide agent behaviors. To counter this threat, we introduce an automated red-teaming framework that is systematic, scalable, and cost-efficient.
arXiv Detail & Related papers (2025-09-28T07:05:17Z)
- Web Fraud Attacks Against LLM-Driven Multi-Agent Systems [16.324314873769215]
Web fraud attacks pose non-negligible threats to system security and user safety. We propose Web Fraud Attacks, a novel class of attacks that induce multi-agent systems to visit malicious websites.
arXiv Detail & Related papers (2025-09-01T07:47:24Z)
- Client-Side Zero-Shot LLM Inference for Comprehensive In-Browser URL Analysis [0.0]
Malicious websites and phishing URLs pose an ever-increasing cybersecurity risk. Traditional detection approaches rely on machine learning. We propose a novel client-side framework for comprehensive URL analysis.
arXiv Detail & Related papers (2025-06-04T07:47:23Z)
- AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents [54.29555239363013]
We propose a generic black-box fuzzing framework, AgentVigil, to automatically discover and exploit indirect prompt injection vulnerabilities. We evaluate AgentVigil on two public benchmarks, AgentDojo and VWA-adv, where it achieves 71% and 70% success rates against agents based on o3-mini and GPT-4o. We apply our attacks in real-world environments, successfully misleading agents to navigate to arbitrary URLs, including malicious sites.
arXiv Detail & Related papers (2025-05-09T07:40:17Z)
- AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses [66.87883360545361]
AutoAdvExBench is a benchmark that evaluates whether large language models (LLMs) can autonomously exploit defenses against adversarial examples. We design a strong agent that is capable of breaking 75% of CTF-like ("homework exercise") adversarial example defenses. We show that this agent succeeds on only 13% of the real-world defenses in our benchmark, indicating a large gap in difficulty between attacking "real" code and CTF-like code.
arXiv Detail & Related papers (2025-03-03T18:39:48Z)
- Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks [88.84977282952602]
A high volume of recent ML security literature focuses on attacks against aligned large language models (LLMs). In this paper, we analyze security and privacy vulnerabilities that are unique to LLM agents. We conduct a series of illustrative attacks on popular open-source and commercial agents, demonstrating the immediate practical implications of their vulnerabilities.
arXiv Detail & Related papers (2025-02-12T17:19:36Z)
- Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents [32.62654499260479]
We introduce Agent Security Bench (ASB), a framework designed to formalize, benchmark, and evaluate the attacks and defenses of LLM-based agents. Based on ASB, we benchmark 10 prompt injection attacks, a memory poisoning attack, a novel Plan-of-Thought backdoor attack, 4 mixed attacks, and 11 corresponding defenses. Our benchmark results reveal critical vulnerabilities in different stages of agent operation, including system prompt, user prompt handling, tool usage, and memory retrieval.
arXiv Detail & Related papers (2024-10-03T16:30:47Z)
- BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models [27.59116619946915]
Generative large language models (LLMs) have achieved state-of-the-art results on a wide range of tasks, yet they remain susceptible to backdoor attacks. BackdoorLLM is the first comprehensive benchmark for systematically evaluating backdoor threats in text-generation LLMs. BackdoorLLM provides: (i) a unified repository of benchmarks with a standardized training and evaluation pipeline; (ii) a diverse suite of attack modalities, including data poisoning, weight poisoning, hidden-state manipulation, and chain-of-thought hijacking; (iii) over 200 experiments spanning 8 distinct attack strategies, 7 real-
arXiv Detail & Related papers (2024-08-23T02:21:21Z)
- Can LLMs Patch Security Issues? [1.3299507495084417]
Large Language Models (LLMs) have shown impressive proficiency in code generation.
LLMs share a weakness with their human counterparts: producing code that inadvertently contains security vulnerabilities.
We propose Feedback-Driven Security Patching (FDSP), where LLMs automatically refine generated, vulnerable code.
arXiv Detail & Related papers (2023-11-13T08:54:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.