Snorkeling in dark waters: A longitudinal surface exploration of unique Tor Hidden Services (Extended Version)
- URL: http://arxiv.org/abs/2504.16836v1
- Date: Wed, 23 Apr 2025 15:59:16 GMT
- Title: Snorkeling in dark waters: A longitudinal surface exploration of unique Tor Hidden Services (Extended Version)
- Authors: Alfonso Rodriguez Barredo-Valenzuela, Sergio Pastrana Portillo, Guillermo Suarez-Tangil,
- Abstract summary: The Onion Router (Tor) is a controversial network whose utility is constantly under scrutiny.<n>In this work, we present a large-scale analysis of the Tor Network.<n>We leverage our crawler, dubbed Mimir, which automatically collects and visits content linked within the pages to collect a dataset of pages from more than 25k sites.
- Score: 2.498836880652668
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Onion Router (Tor) is a controversial network whose utility is constantly under scrutiny. On the one hand, it allows for anonymous interaction and cooperation of users seeking untraceable navigation on the Internet. This freedom also attracts criminals who aim to thwart law enforcement investigations, e.g., trading illegal products or services such as drugs or weapons. Tor allows delivering content without revealing the actual hosting address, by means of .onion (or hidden) services. Different from regular domains, these services can not be resolved by traditional name services, are not indexed by regular search engines, and they frequently change. This generates uncertainty about the extent and size of the Tor network and the type of content offered. In this work, we present a large-scale analysis of the Tor Network. We leverage our crawler, dubbed Mimir, which automatically collects and visits content linked within the pages to collect a dataset of pages from more than 25k sites. We analyze the topology of the Tor Network, including its depth and reachability from the surface web. We define a set of heuristics to detect the presence of replicated content (mirrors) and show that most of the analyzed content in the Dark Web (82% approx.) is a replica of other content. Also, we train a custom Machine Learning classifier to understand the type of content the hidden services offer. Overall, our study provides new insights into the Tor network, highlighting the importance of initial seeding for focus on specific topics, and optimize the crawling process. We show that previous work on large-scale Tor measurements does not consider the presence of mirrors, which biases their understanding of the Dark Web topology and the distribution of content.
Related papers
- Web Privacy based on Contextual Integrity: Measuring the Collapse of Online Contexts [0.0]
We operationalize the theory of Privacy as Contextual Integrity and measure persistent user identification within and between Web contexts.<n>We crawl the top-700 popular websites across the contexts of health, finance, news & media, LGBTQ, eCommerce, adult, and education websites, for 27 days.<n>This is a first modest step in measuring Web privacy as Contextual Integrity, opening new avenues for contextual Web privacy research.
arXiv Detail & Related papers (2024-12-19T23:30:29Z) - Fingerprinting and Tracing Shadows: The Development and Impact of Browser Fingerprinting on Digital Privacy [55.2480439325792]
Browser fingerprinting is a growing technique for identifying and tracking users online without traditional methods like cookies.
This paper gives an overview by examining the various fingerprinting techniques and analyzes the entropy and uniqueness of the collected data.
arXiv Detail & Related papers (2024-11-18T20:32:31Z) - Model Inversion Attacks: A Survey of Approaches and Countermeasures [59.986922963781]
Recently, a new type of privacy attack, the model inversion attacks (MIAs), aims to extract sensitive features of private data for training.
Despite the significance, there is a lack of systematic studies that provide a comprehensive overview and deeper insights into MIAs.
This survey aims to summarize up-to-date MIA methods in both attacks and defenses.
arXiv Detail & Related papers (2024-11-15T08:09:28Z) - Unveiling the Digital Fingerprints: Analysis of Internet attacks based on website fingerprints [0.0]
We show that using the newest machine learning algorithms an attacker can deanonymize Tor traffic by applying such techniques.
We capture network packets across 11 days, while users navigate specific web pages, recording data in.pcapng format through the Wireshark network capture tool.
arXiv Detail & Related papers (2024-09-01T18:44:40Z) - Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena.<n>We find that we can successfully break latest agents that use black-box frontier LMs, including those that perform reflection and tree search.<n>We also use ARE to rigorously evaluate how the robustness changes as new components are added.
arXiv Detail & Related papers (2024-06-18T17:32:48Z) - EmInspector: Combating Backdoor Attacks in Federated Self-Supervised Learning Through Embedding Inspection [53.25863925815954]
Federated self-supervised learning (FSSL) has emerged as a promising paradigm that enables the exploitation of clients' vast amounts of unlabeled data.
While FSSL offers advantages, its susceptibility to backdoor attacks has not been investigated.
We propose the Embedding Inspector (EmInspector) that detects malicious clients by inspecting the embedding space of local models.
arXiv Detail & Related papers (2024-05-21T06:14:49Z) - Distinguishing Tor From Other Encrypted Network Traffic Through Character Analysis [0.0]
The Tor network provides a free and widely used anonymization service for everyone.
There are different approaches to distinguishing Tor from non-Tor encrypted network traffic.
We have examined to what extent the number of encryptions contributes to being able to distinguish Tor from non-Tor encrypted data traffic.
arXiv Detail & Related papers (2024-05-15T15:07:31Z) - CRATOR: a Dark Web Crawler [1.7224362150588657]
This study proposes a general dark web crawler designed to extract pages handling security protocols, such as captchas.
Our approach uses a combination of seed URL lists, link analysis, and scanning to discover new content.
arXiv Detail & Related papers (2024-05-10T09:39:12Z) - A Geometrical Approach to Evaluate the Adversarial Robustness of Deep
Neural Networks [52.09243852066406]
Adversarial Converging Time Score (ACTS) measures the converging time as an adversarial robustness metric.
We validate the effectiveness and generalization of the proposed ACTS metric against different adversarial attacks on the large-scale ImageNet dataset.
arXiv Detail & Related papers (2023-10-10T09:39:38Z) - Analyzing Trends in Tor [0.5461938536945721]
Tor was originally started in the Naval Research Laboratory for anonymous Internet browsing and Internet-based communication.
From being used for anonymous communications, it has now segmented into various other use-cases like censorship circumvention, performing illegal activities, etc.
arXiv Detail & Related papers (2022-08-23T18:31:30Z) - Measurement-driven Security Analysis of Imperceptible Impersonation
Attacks [54.727945432381716]
We study the exploitability of Deep Neural Network-based Face Recognition systems.
We show that factors such as skin color, gender, and age, impact the ability to carry out an attack on a specific target victim.
We also study the feasibility of constructing universal attacks that are robust to different poses or views of the attacker's face.
arXiv Detail & Related papers (2020-08-26T19:27:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.