CRATOR: a Dark Web Crawler
- URL: http://arxiv.org/abs/2405.06356v1
- Date: Fri, 10 May 2024 09:39:12 GMT
- Title: CRATOR: a Dark Web Crawler
- Authors: Daniel De Pascale, Giuseppe Cascavilla, Damian A. Tamburri, Willem-Jan Van Den Heuvel
- Abstract summary: This study proposes a general dark web crawler designed to extract pages handling security protocols, such as captchas.
Our approach uses a combination of seed URL lists, link analysis, and scanning to discover new content.
- Score: 1.7224362150588657
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Dark web crawling is a complex process that involves specific methodologies and techniques to navigate the Tor network and extract data from hidden services. This study proposes a general dark web crawler designed to extract pages handling security protocols, such as captchas, efficiently. Our approach uses a combination of seed URL lists, link analysis, and scanning to discover new content. We also incorporate methods for user-agent rotation and proxy usage to maintain anonymity and avoid detection. We evaluate the effectiveness of our crawler using metrics such as coverage, performance and robustness. Our results demonstrate that our crawler effectively extracts pages handling security protocols while maintaining anonymity and avoiding detection. Our proposed dark web crawler can be used for various applications, including threat intelligence, cybersecurity, and online investigations.
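To make the pipeline concrete, below is a minimal Python sketch of the kind of crawler loop the abstract describes: a seed frontier, link extraction, user-agent rotation, and routing through Tor's default SOCKS proxy on 127.0.0.1:9050. This is an illustration of the technique, not the authors' implementation; the user-agent list, delay, and captcha hook are placeholders.

```python
import random
import time
from collections import deque
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4 requests[socks]

# Tor's default SOCKS port; "socks5h" resolves .onion names inside Tor.
TOR_PROXY = {"http": "socks5h://127.0.0.1:9050",
             "https": "socks5h://127.0.0.1:9050"}

USER_AGENTS = [  # rotated per request to reduce fingerprinting (illustrative list)
    "Mozilla/5.0 (Windows NT 10.0; rv:115.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0",
]

def crawl(seeds, max_pages=100):
    frontier, seen, pages = deque(seeds), set(seeds), {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        try:
            resp = requests.get(url, proxies=TOR_PROXY, timeout=60,
                                headers={"User-Agent": random.choice(USER_AGENTS)})
        except requests.RequestException:
            continue  # unreachable hidden service; move on
        # A real crawler would detect captcha pages here and hand them to a solver.
        pages[url] = resp.text
        for a in BeautifulSoup(resp.text, "html.parser").find_all("a", href=True):
            link = urljoin(url, a["href"])
            if ".onion" in link and link not in seen:
                seen.add(link)
                frontier.append(link)
        time.sleep(2)  # polite delay, also lowers detection risk
    return pages
```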
Related papers
- Unveiling the Digital Fingerprints: Analysis of Internet attacks based on website fingerprints [0.0]
We show that, using the newest machine learning algorithms, an attacker can deanonymize Tor traffic with such fingerprinting techniques.
We capture network packets over 11 days while users navigate specific web pages, recording the data in .pcapng format with the Wireshark network capture tool.
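As a rough sketch of the capture pipeline described above (not the paper's code), the following converts a Wireshark .pcapng trace into the direction/size/timing sequence that fingerprinting classifiers commonly consume; the file path and client IP are placeholders.

```python
from scapy.layers.inet import IP
from scapy.utils import rdpcap  # pip install scapy; rdpcap also reads .pcapng

def trace_features(path: str, client_ip: str):
    """Turn a capture into a (timestamp, signed packet size) sequence;
    positive size means an outgoing packet, negative means incoming."""
    seq = []
    for pkt in rdpcap(path):  # path to a .pcapng/.pcap capture file
        if IP not in pkt:
            continue
        sign = 1 if pkt[IP].src == client_ip else -1
        seq.append((float(pkt.time), sign * len(pkt)))
    return seq

# features = trace_features("capture.pcapng", client_ip="10.0.0.2")
```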
arXiv Detail & Related papers (2024-09-01T18:44:40Z)
- AutoScraper: A Progressive Understanding Web Agent for Web Scraper Generation [54.17246674188208]
Web scraping is a powerful technique that extracts data from websites, enabling automated data collection, enhancing data analysis capabilities, and minimizing manual data entry efforts.
Existing wrapper-based methods suffer from limited adaptability and scalability when faced with a new website.
We introduce the paradigm of generating web scrapers with large language models (LLMs) and propose AutoScraper, a two-stage framework that can handle diverse and changing web environments more efficiently.
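A minimal sketch of the generate-then-verify idea (not AutoScraper's actual two-stage framework or API): an LLM call, stubbed out here with a canned response, proposes an XPath rule, which is kept only if it actually extracts data from the page.

```python
from lxml import html  # pip install lxml

def llm(prompt: str) -> str:
    # Stand-in for any LLM completion call; returns a canned rule
    # so the sketch runs end to end.
    return "//span[@class='price']/text()"

def generate_scraper(page_html: str, field: str) -> str:
    # Stage 1: ask the model to propose an extraction rule for the field.
    xpath = llm(f"Return one XPath extracting the {field} from:\n{page_html[:4000]}").strip()
    # Stage 2 (progressive verification): keep the rule only if it extracts data.
    if not html.fromstring(page_html).xpath(xpath):
        raise ValueError("rule matched nothing; re-prompt the model with feedback")
    return xpath

page = "<html><body><span class='price'>9.99</span></body></html>"
print(generate_scraper(page, "price"))  # //span[@class='price']/text()
```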
arXiv Detail & Related papers (2024-04-19T09:59:44Z)
- Darknet Traffic Analysis: A Systematic Literature Review [0.0]
The objective of an anonymity tool is to protect the anonymity of its users through the implementation of strong encryption and obfuscation techniques.
The strong anonymity feature also functions as a refuge for those involved in illicit activities who aim to avoid being traced on the network.
This paper presents a comprehensive review of darknet traffic analysis methods that use machine learning techniques to monitor and identify attacks inside the darknet.
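As an illustration of the kind of ML pipeline such studies survey, the sketch below trains a random-forest classifier on per-flow features; the features and labels here are synthetic stand-ins, not a real darknet traffic dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# X: per-flow features (duration, byte counts, inter-arrival stats, ...);
# y: traffic labels (e.g., benign vs. attack). Both are placeholders here.
rng = np.random.default_rng(0)
X = rng.random((1000, 8))
y = rng.integers(0, 2, 1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```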
arXiv Detail & Related papers (2023-11-27T19:27:50Z)
- PURL: Safe and Effective Sanitization of Link Decoration [20.03929841111819]
We present PURL, a machine-learning approach that leverages a cross-layer graph representation of webpage execution to safely and effectively sanitize link decoration.
Our evaluation shows that PURL significantly outperforms existing countermeasures in both accuracy and reduced website breakage.
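PURL itself learns which decorations are trackers from a cross-layer graph of webpage execution; as a simplified illustration of the sanitization step only, the sketch below strips a fixed denylist of tracking parameters from a URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative denylist; PURL infers which decorations to remove per site.
TRACKING_PARAMS = {"fbclid", "gclid", "utm_source", "utm_medium", "utm_campaign"}

def sanitize(url: str) -> str:
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(sanitize("https://example.com/p?id=42&gclid=abc&utm_source=ad"))
# -> https://example.com/p?id=42
```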
arXiv Detail & Related papers (2023-08-07T09:08:39Z)
- ThreatCrawl: A BERT-based Focused Crawler for the Cybersecurity Domain [0.0]
A new focused crawler called ThreatCrawl is proposed.
It uses BERT-based models to classify documents and adapt its crawling path dynamically.
It yields harvest rates of up to 52%, which are, to the best of our knowledge, better than the current state of the art.
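A minimal sketch of the focused-crawling idea (the keyword scorer below is a stand-in for ThreatCrawl's BERT-based classifier): the frontier is prioritized by predicted relevance, which is what drives a focused crawler's harvest rate.

```python
import heapq

CYBER_TERMS = ("exploit", "malware", "vulnerability", "breach")

def relevance(text: str) -> float:
    # Stand-in for a BERT-style classifier's P(relevant): a keyword score in [0, 1].
    return sum(term in text.lower() for term in CYBER_TERMS) / len(CYBER_TERMS)

def focused_order(frontier):
    # Yield URLs most-relevant-first; a focused crawler fetches these before
    # spending its budget on likely off-topic pages.
    heap = [(-relevance(text), url) for url, text in frontier]
    heapq.heapify(heap)
    while heap:
        neg, url = heapq.heappop(heap)
        yield url, -neg

frontier = [("http://a.example/1", "new exploit for old malware"),
            ("http://a.example/2", "cooking recipes")]
for url, score in focused_order(frontier):
    print(url, score)
```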
arXiv Detail & Related papers (2023-04-24T09:53:33Z)
- An anomaly detection approach for backdoored neural networks: face recognition as a case study [77.92020418343022]
We propose a novel backdoored network detection method based on the principle of anomaly detection.
We test our method on a novel dataset of backdoored networks and report detectability results with perfect scores.
arXiv Detail & Related papers (2022-08-22T12:14:13Z)
- Verifying Learning-Based Robotic Navigation Systems [61.01217374879221]
We show how modern verification engines can be used for effective model selection.
Specifically, we use verification to detect and rule out policies that may demonstrate suboptimal behavior.
Our work is the first to demonstrate the use of verification backends for recognizing suboptimal DRL policies in real-world robots.
arXiv Detail & Related papers (2022-05-26T17:56:43Z)
- Classification of URL bitstreams using Bag of Bytes [3.2204506933585026]
In this paper, we apply a mechanical approach to generate feature vectors from URL strings.
Our approach achieved 23% better accuracy compared to the existing DL-based approach.
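One plausible reading of such a mechanical feature generator (not necessarily the paper's exact encoding) is a 256-bin byte histogram of the URL string, which a downstream classifier can consume directly:

```python
import numpy as np

def bag_of_bytes(url: str) -> np.ndarray:
    """256-dim count vector over the bytes of a URL string."""
    vec = np.zeros(256, dtype=np.float32)
    for b in url.encode("utf-8"):
        vec[b] += 1.0
    return vec

x = bag_of_bytes("http://example.onion/login.php?id=1")  # shape (256,)
```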
arXiv Detail & Related papers (2021-11-11T07:43:45Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
- TANTRA: Timing-Based Adversarial Network Traffic Reshaping Attack [46.79557381882643]
We present TANTRA, a novel end-to-end Timing-based Adversarial Network Traffic Reshaping Attack.
Our evasion attack utilizes a long short-term memory (LSTM) deep neural network (DNN) which is trained to learn the time differences between the target network's benign packets.
TANTRA achieves an average success rate of 99.99% in network intrusion detection system evasion.
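A toy version of the timing-learning component (not TANTRA's trained model): an LSTM that predicts the next inter-packet gap from the preceding gaps, which could then drive the reshaping of malicious traffic to look benign. The data here is a random placeholder.

```python
import torch
import torch.nn as nn

class DelayModel(nn.Module):
    """LSTM that predicts the next inter-packet gap from past gaps."""
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, gaps):          # gaps: (batch, seq_len, 1)
        out, _ = self.lstm(gaps)
        return self.head(out[:, -1])  # predicted next gap, (batch, 1)

model = DelayModel()
benign_gaps = torch.rand(8, 50, 1)    # placeholder timing sequences
next_gap = model(benign_gaps)         # would schedule the next packet send
```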
arXiv Detail & Related papers (2021-03-10T19:03:38Z)
- AutoOD: Automated Outlier Detection via Curiosity-guided Search and Self-imitation Learning [72.99415402575886]
Outlier detection is an important data mining task with numerous practical applications.
We propose AutoOD, an automated outlier detection framework, which aims to search for an optimal neural network model.
Experimental results on various real-world benchmark datasets demonstrate that the deep model identified by AutoOD achieves the best performance.
arXiv Detail & Related papers (2020-06-19T18:57:51Z)