Dark Web Activity Classification Using Deep Learning
- URL: http://arxiv.org/abs/2306.07980v3
- Date: Sat, 1 Jul 2023 16:49:17 GMT
- Title: Dark Web Activity Classification Using Deep Learning
- Authors: Ali Fayzi, Mohammad Fayzi, Kourosh Dadashtabar Ahmadi
- Abstract summary: We propose a search engine that employs deep learning to detect the titles of activities on the dark web.
We focus on five categories of activities, including drug trading, weapon trading, selling stolen bank cards, selling fake IDs, and selling illegal currencies.
Our aim is to extract relevant images from websites with a ".onion" extension and identify the titles of websites without images by extracting keywords from the text of the pages.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In contemporary times, people rely heavily on the internet and search engines
to obtain information, either directly or indirectly. However, the information
that search engines can reach, commonly known as the surface web, constitutes
merely 4% of the overall information present on the internet. The remaining
information that eludes search engines is called the deep web. The deep web
encompasses deliberately hidden information, such as personal email accounts,
social media accounts, online banking accounts, and other confidential data.
The deep web contains several critical applications, including databases of
universities, banks, and civil records, which are off-limits and illegal to
access. The dark web is a subset of the deep web that provides an ideal
platform for criminals and smugglers to engage in illicit activities, such as
drug trafficking, weapon smuggling, selling stolen bank cards, and money
laundering. In this article, we propose a search engine that employs deep
learning to detect the titles of activities on the dark web. We focus on five
categories of activities, including drug trading, weapon trading, selling
stolen bank cards, selling fake IDs, and selling illegal currencies. Our aim is
to extract relevant images from websites with a ".onion" extension and identify
the titles of websites without images by extracting keywords from the text of
the pages. Furthermore, we introduce a dataset of images called Darkoob, which
we have gathered and used to evaluate our proposed method. Our experimental
results demonstrate that the proposed method achieves an accuracy rate of 94%
on the test dataset.
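
The text-based fallback described in the abstract (assigning a category to pages without images by extracting keywords from the page text) could be sketched roughly as follows. This is a minimal illustration, not the authors' method: the keyword lists, the category labels, the `classify_text` function, and the `min_hits` threshold are all hypothetical, since the abstract does not describe the vocabulary or the model used.

```python
import re
from collections import Counter

# Hypothetical keyword lists for the five categories named in the abstract.
# These words are illustrative assumptions, not taken from the paper.
CATEGORY_KEYWORDS = {
    "drugs": {"cannabis", "cocaine", "mdma", "pills"},
    "weapons": {"pistol", "rifle", "ammo", "firearm"},
    "stolen_cards": {"cvv", "dumps", "fullz", "carding"},
    "fake_ids": {"passport", "license", "identity"},
    "illegal_currency": {"counterfeit", "banknotes", "bills"},
}


def classify_text(page_text, min_hits=1):
    """Return the best-matching category label, or None if nothing matches.

    The score of a category is the number of times any of its keywords
    occurs in the lowercased page text.
    """
    tokens = re.findall(r"[a-z]+", page_text.lower())
    counts = Counter(tokens)
    scores = {
        category: sum(counts[word] for word in words)
        for category, words in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Reject the page when even the best category has too few keyword hits.
    return best if scores[best] >= min_hits else None


if __name__ == "__main__":
    print(classify_text("Fresh CVV and dumps, fullz available"))
    print(classify_text("hello world"))
```

A plain keyword count is used here only because it is easy to inspect; a learned text classifier would replace `classify_text` in any realistic setting.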
Related papers
- Snorkeling in dark waters: A longitudinal surface exploration of unique Tor Hidden Services (Extended Version) [2.498836880652668]
The Onion Router (Tor) is a controversial network whose utility is constantly under scrutiny.
In this work, we present a large-scale analysis of the Tor Network.
Our crawler, dubbed Mimir, automatically visits content linked within pages, collecting a dataset of pages from more than 25k sites.
arXiv Detail & Related papers (2025-04-23T15:59:16Z)
- Deepfake Media Generation and Detection in the Generative AI Era: A Survey and Outlook [101.30779332427217]
We survey deepfake generation and detection techniques, including the most recent developments in the field.
We identify various kinds of deepfakes, according to the procedure used to alter or generate the fake content.
We develop a novel multimodal benchmark to evaluate deepfake detectors on out-of-distribution content.
arXiv Detail & Related papers (2024-11-29T08:29:25Z)
- How Unique is Whose Web Browser? The role of demographics in browser fingerprinting among US users [50.699390248359265]
Browser fingerprinting can be used to identify and track users across the Web, even without cookies.
This technique and resulting privacy risks have been studied for over a decade.
We provide a first-of-its-kind dataset to enable further research.
arXiv Detail & Related papers (2024-10-09T14:51:58Z)
- Phishing Website Detection Using a Combined Model of ANN and LSTM [0.9208007322096533]
Phishing is a type of cybercrime that aims to steal the personal information of computer users.
Attackers use personal information such as account IDs, passwords, and usernames to commit fraud against those users.
To address this problem, researchers have turned to machine learning and deep learning approaches.
arXiv Detail & Related papers (2024-03-24T14:46:02Z)
- The Devil Behind the Mirror: Tracking the Campaigns of Cryptocurrency Abuses on the Dark Web [39.96427593096699]
We identify 2,564 illicit sites with 1,189 illicit blockchain addresses, which account for 90.8 BTC in revenue.
Our exploration suggests that illicit activities on the dark web have strong correlations, which can guide us to identify new illicit blockchain addresses and onions.
arXiv Detail & Related papers (2024-01-09T16:35:25Z)
- When the Few Outweigh the Many: Illicit Content Recognition with Few-Shot Learning [0.0]
This paper investigates an alternative technique for recognizing illegal activities from images.
Siamese neural networks reach 90.9% on 20-Shot experiments over a 10-class dataset.
arXiv Detail & Related papers (2023-11-28T18:28:03Z)
- Identifying key players in dark web marketplaces [58.720142291102135]
This paper aims to identify the key players in Bitcoin transaction networks linked to dark markets.
We show that a large fraction of the traded volume is concentrated in a small group of elite market participants.
Our findings suggest that understanding the behavior of key players in dark web marketplaces is critical to effectively disrupting illegal activities.
arXiv Detail & Related papers (2023-06-15T20:30:43Z)
- Protecting User Privacy in Online Settings via Supervised Learning [69.38374877559423]
We design an intelligent approach to online privacy protection that leverages supervised learning.
By detecting and blocking data collection that might infringe on a user's privacy, we can restore a degree of digital privacy to the user.
arXiv Detail & Related papers (2023-04-06T05:20:16Z)
- Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection [115.83992775004043]
Recent advances in deep learning, particularly deep generative models, open the doors for producing perceptually convincing images and videos at a low cost.
This paper provides a comprehensive review of the current media tampering detection approaches, and discusses the challenges and trends in this field for future research.
arXiv Detail & Related papers (2022-12-12T02:54:08Z)
- VeriDark: A Large-Scale Benchmark for Authorship Verification on the Dark Web [25.00969884543201]
We release VeriDark: a benchmark comprising three large-scale authorship verification datasets and one authorship identification dataset.
We evaluate competitive NLP baselines on the three datasets and perform an analysis of the predictions to better understand the limitations of such approaches.
arXiv Detail & Related papers (2022-07-07T17:57:11Z)
- A Crawler Architecture for Harvesting the Clear, Social, and Dark Web for IoT-Related Cyber-Threat Intelligence [1.1661238776379117]
The clear, social, and dark web have lately been identified as rich sources of valuable cyber-security information.
We present a novel crawling architecture for transparently harvesting data from security websites in the clear web, security forums in the social web, and hacker forums/marketplaces in the dark web.
arXiv Detail & Related papers (2021-09-14T19:26:08Z)
- Lighting the Darkness in the Deep Learning Era [118.35081853500411]
Low-light image enhancement (LLIE) aims at improving the perception or interpretability of an image captured in an environment with poor illumination.
Recent advances in this area are dominated by deep learning-based solutions.
We provide a comprehensive survey to cover various aspects ranging from algorithm taxonomy to unsolved open issues.
arXiv Detail & Related papers (2021-04-21T19:12:19Z)
- Improving Object Detection with Selective Self-supervised Self-training [62.792445237541145]
We study how to leverage Web images to augment human-curated object detection datasets.
We retrieve Web images by image-to-image search, which incurs less domain shift from the curated data than other search methods.
We propose a novel learning method motivated by two parallel lines of work that explore unlabeled data for image classification.
arXiv Detail & Related papers (2020-07-17T18:05:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.