MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks
- URL: http://arxiv.org/abs/2506.05982v2
- Date: Wed, 11 Jun 2025 07:56:19 GMT
- Title: MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks
- Authors: Zonglin Wu, Yule Xue, Xin Wei, Yiren Song
- Abstract summary: MCA-Bench is a comprehensive and reproducible benchmarking suite that integrates heterogeneous CAPTCHA types into a single evaluation protocol. Extensive experiments reveal that MCA-Bench effectively maps the vulnerability spectrum of modern CAPTCHA designs.
- Score: 7.293357145882387
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As automated attack techniques rapidly advance, CAPTCHAs remain a critical defense mechanism against malicious bots. However, existing CAPTCHA schemes encompass a diverse range of modalities -- from static distorted text and obfuscated images to interactive clicks, sliding puzzles, and logic-based questions -- yet the community still lacks a unified, large-scale, multimodal benchmark to rigorously evaluate their security robustness. To address this gap, we introduce MCA-Bench, a comprehensive and reproducible benchmarking suite that integrates heterogeneous CAPTCHA types into a single evaluation protocol. Leveraging a shared vision-language model backbone, we fine-tune specialized cracking agents for each CAPTCHA category, enabling consistent, cross-modal assessments. Extensive experiments reveal that MCA-Bench effectively maps the vulnerability spectrum of modern CAPTCHA designs under varied attack settings, and crucially offers the first quantitative analysis of how challenge complexity, interaction depth, and model solvability interrelate. Based on these findings, we propose three actionable design principles and identify key open challenges, laying the groundwork for systematic CAPTCHA hardening, fair benchmarking, and broader community collaboration. Datasets and code are available online.
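To make the "single evaluation protocol" concrete, here is a minimal sketch of how a cross-modal CAPTCHA benchmark could score solver agents per category. The task fields, category names, and agent interface below are illustrative assumptions, not MCA-Bench's actual API.

```python
# Hypothetical sketch of a unified CAPTCHA evaluation protocol: many
# categories, one agent interface, one solve-rate metric per category.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class CaptchaTask:
    category: str   # e.g. "distorted_text", "slider", "click", "logic"
    challenge: str  # serialized challenge payload
    answer: str     # ground-truth solution

def evaluate(agent: Callable[[CaptchaTask], str],
             tasks: List[CaptchaTask]) -> Dict[str, float]:
    """Return the per-category solve rate under a single protocol."""
    hits: Dict[str, int] = {}
    totals: Dict[str, int] = {}
    for t in tasks:
        totals[t.category] = totals.get(t.category, 0) + 1
        if agent(t) == t.answer:
            hits[t.category] = hits.get(t.category, 0) + 1
    return {c: hits.get(c, 0) / n for c, n in totals.items()}

# Toy agent that simply echoes the payload; it only "solves" tasks
# whose answer happens to equal the challenge string.
tasks = [
    CaptchaTask("distorted_text", "x7kq", "x7kq"),
    CaptchaTask("distorted_text", "p2ma", "p2mn"),
    CaptchaTask("slider", "offset=134", "134"),
]
rates = evaluate(lambda t: t.challenge, tasks)
# rates -> {"distorted_text": 0.5, "slider": 0.0}
```

In a real benchmark the agent would wrap a fine-tuned VLM per category; the point of the shared interface is that every modality reports the same metric.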
Related papers
- White-Basilisk: A Hybrid Model for Code Vulnerability Detection [50.49233187721795]
We introduce White-Basilisk, a novel approach to vulnerability detection that demonstrates superior performance. White-Basilisk achieves these results with a parameter count of only 200M. This research establishes new benchmarks in code security and provides empirical evidence that compact, efficiently designed models can outperform larger counterparts in specialized tasks.
arXiv Detail & Related papers (2025-07-11T12:39:25Z)
- Defensive Adversarial CAPTCHA: A Semantics-Driven Framework for Natural Adversarial Example Generation [48.60492738839292]
Traditional CAPTCHA schemes are increasingly vulnerable to automated attacks powered by deep neural networks (DNNs). We propose the Defensive Adversarial CAPTCHA (DAC), a novel framework that generates natural adversarial examples. For untargeted attacks, especially in black-box scenarios, we introduce the bi-path unsourced adversarial CAPTCHA (BP-DAC).
arXiv Detail & Related papers (2025-06-12T13:30:01Z)
- Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents [23.715342148854006]
Open CaptchaWorld is the first web-based benchmark and platform specifically designed to evaluate the visual reasoning and interaction capabilities of MLLM-powered agents. Results show that while humans consistently achieve near-perfect scores, state-of-the-art MLLM agents struggle significantly, with success rates of at most 40.0% (Browser-Use with OpenAI o3). This highlights Open CaptchaWorld as a vital benchmark for diagnosing the limits of current multimodal agents and guiding the development of more robust multimodal reasoning systems.
arXiv Detail & Related papers (2025-05-30T17:59:55Z)
- Progressive Multimodal Reasoning via Active Retrieval [64.74746997923967]
Multi-step multimodal reasoning tasks pose significant challenges for multimodal large language models (MLLMs). We propose AR-MCTS, a universal framework designed to progressively improve the reasoning capabilities of MLLMs. We show that AR-MCTS can optimize sampling diversity and accuracy, yielding reliable multimodal reasoning.
arXiv Detail & Related papers (2024-12-19T13:25:39Z)
- POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding [76.67608003501479]
We introduce POGEMA, a comprehensive set of tools that includes a fast environment for learning, a problem instance generator, and a visualization toolkit. We also introduce and define an evaluation protocol that specifies a range of domain-related metrics, computed from primary evaluation indicators. The results of this comparison, which involves a variety of state-of-the-art MARL, search-based, and hybrid methods, are presented.
arXiv Detail & Related papers (2024-07-20T16:37:21Z)
- A Survey of Adversarial CAPTCHAs on its History, Classification and Generation [69.36242543069123]
We extend the definition of adversarial CAPTCHAs and propose a classification method for adversarial CAPTCHAs.
Also, we analyze some defense methods that can be used to defend adversarial CAPTCHAs, indicating potential threats to adversarial CAPTCHAs.
arXiv Detail & Related papers (2023-11-22T08:44:58Z)
- EnSolver: Uncertainty-Aware Ensemble CAPTCHA Solvers with Theoretical Guarantees [1.9649272351760065]
We propose EnSolver, a family of solvers that use deep ensemble uncertainty to detect and skip out-of-distribution CAPTCHAs.
We prove novel theoretical bounds on the effectiveness of our solvers and demonstrate their use with state-of-the-art CAPTCHA solvers.
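The ensemble-uncertainty idea above can be sketched as follows; the function name, agreement threshold, and toy solvers are assumptions for illustration, not EnSolver's actual implementation.

```python
# Sketch of ensemble-uncertainty skipping: run several independent solvers
# and refuse to answer (skip) when the ensemble cannot agree, treating
# disagreement as a sign the CAPTCHA is out-of-distribution.
from collections import Counter
from typing import Callable, List, Optional

def ensemble_predict(solvers: List[Callable[[str], str]],
                     captcha: str,
                     min_agreement: float = 0.6) -> Optional[str]:
    """Return the majority answer, or None (skip) when agreement is low."""
    votes = Counter(s(captcha) for s in solvers)
    answer, count = votes.most_common(1)[0]
    if count / len(solvers) < min_agreement:
        return None  # uncertain -> likely out-of-distribution, skip it
    return answer

# Three toy solvers: two agree, one dissents -> 2/3 agreement.
solvers = [lambda c: c.upper(), lambda c: c.upper(), lambda c: c[::-1]]
accepted = ensemble_predict(solvers, "ab3f")                      # "AB3F"
skipped = ensemble_predict(solvers, "ab3f", min_agreement=0.8)    # None
```

Tuning `min_agreement` trades coverage for precision, which is where the paper's theoretical bounds on solver effectiveness come in.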
arXiv Detail & Related papers (2023-07-27T20:19:11Z)
- Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond [57.10914865054868]
We consider vertical logistic regression (VLR) trained with mini-batch gradient descent.
We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks.
arXiv Detail & Related papers (2022-07-19T05:47:30Z)
- Robust Text CAPTCHAs Using Adversarial Examples [129.29523847765952]
We propose a user-friendly text-based CAPTCHA generation method named Robust Text CAPTCHA (RTC).
At the first stage, the foregrounds and backgrounds are constructed with randomly sampled font and background images.
At the second stage, we apply a highly transferable adversarial attack for text CAPTCHAs to better obstruct CAPTCHA solvers.
arXiv Detail & Related papers (2021-01-07T11:03:07Z)
- A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning [122.49765136434353]
We present an effective method, called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM), aiming to generate a sequence of adversarial examples.
We also propose a new generative method called Contrastive Adversarial Training (CAT), which approaches the equilibrium distribution of adversarial examples.
Both quantitative and qualitative analyses on several natural image datasets and practical systems have confirmed the superiority of the proposed algorithm.
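For readers unfamiliar with the machinery, HMC-based samplers build on the standard leapfrog integrator; as a reminder, this is the generic update (not the paper's accumulated-momentum variant), with step size $\epsilon$, position $x$, momentum $p$, and potential $U$:

```latex
\begin{aligned}
p_{t+\frac{\epsilon}{2}} &= p_t - \tfrac{\epsilon}{2}\,\nabla_x U(x_t),\\
x_{t+\epsilon} &= x_t + \epsilon\, p_{t+\frac{\epsilon}{2}},\\
p_{t+\epsilon} &= p_{t+\frac{\epsilon}{2}} - \tfrac{\epsilon}{2}\,\nabla_x U(x_{t+\epsilon}).
\end{aligned}
```

For adversarial example generation, $U$ is typically taken to be a negative attack loss, so the sampled chain concentrates on high-loss (adversarial) regions of input space.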
arXiv Detail & Related papers (2020-10-15T16:07:26Z)
- An End-to-End Attack on Text-based CAPTCHAs Based on Cycle-Consistent Generative Adversarial Network [4.955311532191887]
We propose an efficient and simple end-to-end attack method based on cycle-consistent generative adversarial networks.
It can attack common text-based CAPTCHA schemes by modifying only a few configuration parameters.
Our approach efficiently cracked the CAPTCHA schemes deployed by 10 popular websites.
arXiv Detail & Related papers (2020-08-26T14:57:47Z)
- Deep-CAPTCHA: a deep learning based CAPTCHA solver for vulnerability assessment [1.027974860479791]
This research investigates the weaknesses and vulnerabilities of the CAPTCHA generator systems.
We develop a Convolutional Neural Network called Deep-CAPTCHA to achieve this goal.
Our network achieves high cracking accuracies of 98.94% and 98.31% on the numerical and alphanumerical test datasets, respectively.
arXiv Detail & Related papers (2020-06-15T11:44:43Z)
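As a reminder of what a CNN-based solver like Deep-CAPTCHA computes per character, here is a minimal NumPy forward pass (conv, then ReLU, then pooling, then a dense layer). The layer sizes and random weights are made up for illustration and do not reproduce the paper's architecture.

```python
# Tiny illustrative CNN forward pass for single-character classification.
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(img, kernel):
    """Naive valid-mode 2-D convolution (cross-correlation, no flipping)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2(x):
    """2x2 max pooling with stride 2 (trims odd trailing rows/columns)."""
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).max(axis=(1, 3))

def classify_digit(img, kernel, weights):
    """conv -> ReLU -> pool -> dense -> predicted digit class (0-9)."""
    feat = np.maximum(conv2d_valid(img, kernel), 0.0)  # ReLU activation
    pooled = max_pool2(feat)                           # 26x26 -> 13x13
    logits = pooled.reshape(-1) @ weights              # dense layer, 10 classes
    return int(np.argmax(logits))

img = rng.random((28, 28))                 # one grayscale CAPTCHA digit
kernel = rng.standard_normal((3, 3))       # single 3x3 filter
weights = rng.standard_normal((13 * 13, 10))
pred = classify_digit(img, kernel, weights)
```

A real solver stacks several such conv/pool layers, learns the weights by gradient descent on labeled CAPTCHAs, and decodes multi-character images with one output head per position.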
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.