Counteracting Dark Web Text-Based CAPTCHA with Generative Adversarial Learning for Proactive Cyber Threat Intelligence
- URL: http://arxiv.org/abs/2201.02799v1
- Date: Sat, 8 Jan 2022 09:53:31 GMT
- Title: Counteracting Dark Web Text-Based CAPTCHA with Generative Adversarial Learning for Proactive Cyber Threat Intelligence
- Authors: Ning Zhang, Mohammadreza Ebrahimi, Weifeng Li, Hsinchun Chen
- Abstract summary: Text-based CAPTCHA serves as the most prevalent and prohibiting type of anti-crawling measures in the dark web.
Existing automated CAPTCHA breaking methods have difficulties in overcoming dark web challenges.
We propose a novel framework for automated breaking of dark web CAPTCHA to facilitate dark web data collection.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated monitoring of dark web (DW) platforms on a large scale is the first
step toward developing proactive Cyber Threat Intelligence (CTI). While there
are efficient methods for collecting data from the surface web, large-scale
dark web data collection is often hindered by anti-crawling measures. In
particular, text-based CAPTCHA serves as the most prevalent and prohibiting
type of these measures in the dark web. Text-based CAPTCHA identifies and
blocks automated crawlers by forcing the user to enter a combination of
hard-to-recognize alphanumeric characters. In the dark web, CAPTCHA images are
meticulously designed with additional background noise and variable character
length to prevent automated CAPTCHA breaking. Existing automated CAPTCHA
breaking methods have difficulties in overcoming these dark web challenges. As
such, solving dark web text-based CAPTCHA has been relying heavily on human
involvement, which is labor-intensive and time-consuming. In this study, we
propose a novel framework for automated breaking of dark web CAPTCHA to
facilitate dark web data collection. This framework encompasses a novel
generative method to recognize dark web text-based CAPTCHA with noisy
background and variable character length. To eliminate the need for human
involvement, the proposed framework utilizes Generative Adversarial Network
(GAN) to counteract dark web background noise and leverages an enhanced
character segmentation algorithm to handle CAPTCHA images with variable
character length. Our proposed framework, DW-GAN, was systematically evaluated
on multiple dark web CAPTCHA testbeds. DW-GAN significantly outperformed the
state-of-the-art benchmark methods on all datasets, achieving over 94.4%
success rate on a carefully collected real-world dark web dataset...
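The abstract credits DW-GAN with an "enhanced character segmentation algorithm" for variable-length CAPTCHAs but does not describe it. For orientation only, the following is a minimal sketch of a common baseline, vertical-projection segmentation. It assumes the image has already been denoised and binarized, which is the step DW-GAN assigns to its GAN. This is not the authors' algorithm. The function names, the toy image, and the `max_char_width` threshold are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical helpers: a generic projection baseline, NOT DW-GAN's enhanced segmentation.
Span = Tuple[int, int]  # (start column, end column), end exclusive


def column_profile(img: Sequence[Sequence[int]]) -> List[int]:
    """Count foreground pixels (value 1) in each column of a binarized image."""
    return [sum(row[c] for row in img) for c in range(len(img[0]))]


def gap_spans(profile: Sequence[int]) -> List[Span]:
    """Split at empty columns: each run of non-empty columns becomes one span."""
    spans: List[Span] = []
    start: Optional[int] = None
    for c, count in enumerate(profile):
        if count > 0 and start is None:
            start = c
        elif count == 0 and start is not None:
            spans.append((start, c))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def split_wide(span: Span, profile: Sequence[int], max_char_width: int) -> List[Span]:
    """Recursively cut an over-wide span (likely touching characters) at the
    interior column with the fewest foreground pixels. Assumes max_char_width >= 1."""
    start, end = span
    if end - start <= max_char_width:
        return [span]
    cut = min(range(start + 1, end), key=lambda c: profile[c])
    return (split_wide((start, cut), profile, max_char_width)
            + split_wide((cut, end), profile, max_char_width))


def segment(img: Sequence[Sequence[int]], max_char_width: int) -> List[Span]:
    """Segment a denoised, binarized CAPTCHA into per-character column spans.
    The number of spans is not fixed in advance, so variable-length text is handled."""
    profile = column_profile(img)
    spans: List[Span] = []
    for span in gap_spans(profile):
        spans.extend(split_wide(span, profile, max_char_width))
    return spans


if __name__ == "__main__":
    # Toy 3x12 image: two isolated glyphs, then two glyphs joined by a one-pixel bridge.
    art = ["11.11.111.11",
           "11.11.111111",
           "11.11.111.11"]
    img = [[1 if ch == "1" else 0 for ch in row] for row in art]
    print(segment(img, max_char_width=4))  # [(0, 2), (3, 5), (6, 9), (9, 12)]
```

Empty-column gaps separate cleanly spaced characters. The width check then catches glyphs that touch. Real dark web CAPTCHAs with overlapping or slanted characters would likely need the kind of enhancements the paper refers to.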
Related papers
- MASKDROID: Robust Android Malware Detection with Masked Graph Representations
We propose MASKDROID, a powerful detector with a strong discriminative ability to identify malware.
We introduce a masking mechanism into the Graph Neural Network based framework, forcing MASKDROID to recover the whole input graph.
This strategy enables the model to understand the malicious semantics and learn more stable representations, enhancing its robustness against adversarial attacks.
(arXiv, 2024-09-29)
- Unveiling Vulnerability of Self-Attention
Pre-trained language models (PLMs) are shown to be vulnerable to minor word changes.
This paper studies the basic structure of transformer-based PLMs, the self-attention (SA) mechanism.
We introduce S-Attend, a novel smoothing technique that effectively makes SA robust via structural perturbations.
(arXiv, 2024-02-26)
- JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models
We propose an unsupervised inference-time approach to authorship obfuscation.
We introduce JAMDEC, a user-controlled, inference-time algorithm for authorship obfuscation.
Our approach builds on small language models such as GPT2-XL to help avoid disclosing the original content to proprietary LLMs' APIs.
(arXiv, 2024-02-13)
- A Survey of Adversarial CAPTCHAs on its History, Classification and Generation
We extend the definition of adversarial CAPTCHAs and propose a classification method for adversarial CAPTCHAs.
We also analyze defense methods that can be used against adversarial CAPTCHAs, which indicate potential threats to adversarial CAPTCHAs.
(arXiv, 2023-11-22)
- Diff-CAPTCHA: An Image-based CAPTCHA with Security Enhanced by Denoising Diffusion Model
Diff-CAPTCHA is an image-click CAPTCHA scheme based on diffusion models.
This paper develops several attack methods, including end-to-end attacks based on Faster R-CNN and two-stage attacks.
Results show that diffusion models can effectively enhance CAPTCHA security while maintaining good usability in human testing.
(arXiv, 2023-08-16)
- Vulnerability analysis of captcha using Deep learning
This research investigates the flaws and vulnerabilities in the CAPTCHA generating systems.
To achieve this, we created CapNet, a Convolutional Neural Network.
The proposed platform can evaluate both numerical and alphanumerical CAPTCHAs.
(arXiv, 2023-02-18)
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evasion of content.
(arXiv, 2022-12-27)
- Robust Text CAPTCHAs Using Adversarial Examples
We propose a user-friendly text-based CAPTCHA generation method named Robust Text CAPTCHA (RTC).
At the first stage, the foregrounds and backgrounds are constructed with randomly sampled font and background images.
At the second stage, we apply a highly transferable adversarial attack for text CAPTCHAs to better obstruct CAPTCHA solvers.
(arXiv, 2021-01-07)
- An End-to-End Attack on Text-based CAPTCHAs Based on Cycle-Consistent Generative Adversarial Network
We propose an efficient and simple end-to-end attack method based on cycle-consistent generative adversarial networks.
It can attack common text-based CAPTCHA schemes only by modifying a few configuration parameters.
Our approach efficiently cracked the CAPTCHA schemes deployed by 10 popular websites.
(arXiv, 2020-08-26)
- Deep-CAPTCHA: a deep learning based CAPTCHA solver for vulnerability assessment
This research investigates the weaknesses and vulnerabilities of the CAPTCHA generator systems.
We develop a Convolutional Neural Network called Deep-CAPTCHA to achieve this goal.
Our network achieves cracking accuracies of 98.94% and 98.31% on the numerical and alphanumerical test datasets, respectively.
(arXiv, 2020-06-15)