Related papers: COGNITION: From Evaluation to Defense against Multimodal LLM CAPTCHA Solvers

COGNITION: From Evaluation to Defense against Multimodal LLM CAPTCHA Solvers

URL: http://arxiv.org/abs/2512.02318v2
Date: Wed, 03 Dec 2025 04:01:43 GMT
Title: COGNITION: From Evaluation to Defense against Multimodal LLM CAPTCHA Solvers
Authors: Junyu Wang, Changjia Zhu, Yuanbo Zhou, Lingyao Li, Xu He, Junjie Xiong,
Abstract summary: multimodal large language models (MLLMs) undermine the security guarantees of visual CAPTCHA.<n>We evaluate 7 leading commercial and open-source MLLMs across 18 real-world CAPTCHA task types.<n>We reveal that MLLMs can reliably solve recognition-oriented and low-interaction CAPTCHA tasks at human-like cost and latency.
Score: 17.70082722524941
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper studies how multimodal large language models (MLLMs) undermine the security guarantees of visual CAPTCHA. We identify the attack surface where an adversary can cheaply automate CAPTCHA solving using off-the-shelf models. We evaluate 7 leading commercial and open-source MLLMs across 18 real-world CAPTCHA task types, measuring single-shot accuracy, success under limited retries, end-to-end latency, and per-solve cost. We further analyze the impact of task-specific prompt engineering and few-shot demonstrations on solver effectiveness. We reveal that MLLMs can reliably solve recognition-oriented and low-interaction CAPTCHA tasks at human-like cost and latency, whereas tasks requiring fine-grained localization, multi-step spatial reasoning, or cross-frame consistency remain significantly harder for current models. By examining the reasoning traces of such MLLMs, we investigate the underlying mechanisms of why models succeed/fail on specific CAPTCHA puzzles and use these insights to derive defense-oriented guidelines for selecting and strengthening CAPTCHA tasks. We conclude by discussing implications for platform operators deploying CAPTCHA as part of their abuse-mitigation pipeline.Code Availability (https://anonymous.4open.science/r/Captcha-465E/).

Related papers

MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks [13.493337474908316]
MCA-Bench is a comprehensive and reproducible benchmarking suite.<n>It integrates heterogeneous CAPTCHA types into a single evaluation protocol.<n>Extensive experiments reveal that MCA-Bench effectively maps the vulnerability spectrum of modern CAPTCHA designs.
arXiv Detail & Related papers (2025-06-06T11:02:01Z)
Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents [23.715342148854006]
Open CaptchaWorld is the first web-based benchmark and platform specifically designed to evaluate the visual reasoning and interaction capabilities of MLLM-powered agents.<n>Results show that humans consistently achieve near-perfect scores, state-of-the-art MLLM agents struggle significantly, with success rates at most 40.0% by Browser-Use Openai-o3.<n>This highlights Open CaptchaWorld as a vital benchmark for diagnosing the limits of current multimodal agents and guiding the development of more robust multimodal reasoning systems.
arXiv Detail & Related papers (2025-05-30T17:59:55Z)
Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.<n>However, they still struggle with problems requiring multi-step decision-making and environmental feedback.<n>We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
IllusionCAPTCHA: A CAPTCHA based on Visual Illusion [14.043017273813227]
We present IllusionCAPTCHA, a novel security mechanism employing the "Human-Easy but AI-Hard" paradigm.<n>Results from our user study indicate that 86.95% of participants successfully passed the CAPTCHA on their first attempt, outperforming other CAPTCHA systems.
arXiv Detail & Related papers (2025-02-08T06:03:03Z)
Adversarial Reasoning at Jailbreaking Time [49.70772424278124]
Large language models (LLMs) are becoming more capable and widespread.<n>Recent advances in standardizing, measuring, and scaling test-time compute suggest new methodologies for optimizing models to achieve high performance on hard tasks.<n>In this paper, we apply these advances to the task of model jailbreaking: eliciting harmful responses from aligned LLMs.
arXiv Detail & Related papers (2025-02-03T18:59:01Z)
Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives [54.14429346914995]
Chain-of-Thought (CoT) has become a pivotal method for solving complex problems with large language models (LLMs)<n>This paper introduces the Re-TASK framework, a novel theoretical model that revisits LLM tasks from capability, skill, and knowledge perspectives.<n> Experiments across diverse domains demonstrate the effectiveness of Re-TASK.
arXiv Detail & Related papers (2024-08-13T13:58:23Z)
Oedipus: LLM-enchanced Reasoning CAPTCHA Solver [17.074422329618212]
Oedipus is an innovative end-to-end framework for automated reasoning CAPTCHA solving. Central to this framework is a novel strategy that dissects the complex and human-easy-AI-hard tasks into a sequence of simpler and AI-easy steps. Our evaluation shows that Oedipus effectively resolves the studied CAPTCHAs, achieving an average success rate of 63.5%.
arXiv Detail & Related papers (2024-05-13T06:32:57Z)
On the Vulnerability of LLM/VLM-Controlled Robotics [54.57914943017522]
We highlight vulnerabilities in robotic systems integrating large language models (LLMs) and vision-language models (VLMs) due to input modality sensitivities.<n>Our results show that simple input perturbations reduce task execution success rates by 22.2% and 14.6% in two representative LLM/VLM-controlled robotic systems.
arXiv Detail & Related papers (2024-02-15T22:01:45Z)
A Survey of Adversarial CAPTCHAs on its History, Classification and Generation [69.36242543069123]
We extend the definition of adversarial CAPTCHAs and propose a classification method for adversarial CAPTCHAs. Also, we analyze some defense methods that can be used to defend adversarial CAPTCHAs, indicating potential threats to adversarial CAPTCHAs.
arXiv Detail & Related papers (2023-11-22T08:44:58Z)
EnSolver: Uncertainty-Aware Ensemble CAPTCHA Solvers with Theoretical Guarantees [1.9649272351760065]
We propose Enr, a family of solvers that use deep ensemble uncertainty to detect and skip out-of-distribution CAPTCHAs. We prove novel theoretical bounds on the effectiveness of our solvers and demonstrate their use with state-of-the-art CAPTCHA solvers.
arXiv Detail & Related papers (2023-07-27T20:19:11Z)
Robust Text CAPTCHAs Using Adversarial Examples [129.29523847765952]
We propose a user-friendly text-based CAPTCHA generation method named Robust Text CAPTCHA (RTC) At the first stage, the foregrounds and backgrounds are constructed with randomly sampled font and background images. At the second stage, we apply a highly transferable adversarial attack for text CAPTCHAs to better obstruct CAPTCHA solvers.
arXiv Detail & Related papers (2021-01-07T11:03:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.