Finding Phish in a Haystack: A Pipeline for Phishing Classification on
Certificate Transparency Logs
- URL: http://arxiv.org/abs/2106.12343v1
- Date: Wed, 23 Jun 2021 12:24:19 GMT
- Title: Finding Phish in a Haystack: A Pipeline for Phishing Classification on
Certificate Transparency Logs
- Authors: Arthur Drichel, Vincent Drury, Justus von Brandt, Ulrike Meyer
- Abstract summary: Current popular phishing prevention techniques mainly utilize reactive blocklists, which leave a "window of opportunity" for attackers during which victims are unprotected.
One possible approach to shorten this window aims to detect phishing attacks earlier, during website preparation, by monitoring Certificate Transparency (CT) logs.
Previous attempts to work with CT log data for phishing classification exist; however, they lack evaluations on actual CT log data.
We present a pipeline that facilitates such evaluations by addressing a number of problems when working with CT log data.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current popular phishing prevention techniques mainly utilize reactive
blocklists, which leave a "window of opportunity" for attackers during which
victims are unprotected. One possible approach to shorten this window aims to
detect phishing attacks earlier, during website preparation, by monitoring
Certificate Transparency (CT) logs. Previous attempts to work with CT log data
for phishing classification exist; however, they lack evaluations on actual CT
log data. In this paper, we present a pipeline that facilitates such
evaluations by addressing a number of problems when working with CT log data.
The pipeline includes dataset creation, training, and past or live
classification of CT logs. Its modular structure makes it possible to easily
exchange classifiers or verification sources to support ground truth labeling
efforts and classifier comparisons. We test the pipeline on a number of new and
existing classifiers, and find a general potential to improve classifiers for
this scenario in the future. We publish the source code of the pipeline and the
used datasets along with this paper
(https://gitlab.com/rwth-itsec/ctl-pipeline), thus making future research in
this direction more accessible.
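As a rough illustration of the modular structure described in the abstract, the following minimal sketch shows how a classifier could be exchanged without touching the classification loop. It is not the authors' implementation and does not reflect the API of the published pipeline: `CertEntry`, `keyword_classifier`, `classify_entries`, the keyword list, and the threshold are all hypothetical names and values chosen for illustration, and real CT log entries carry full certificates rather than a bare list of domain names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical interface: any function mapping a domain name to a score in [0, 1].
Classifier = Callable[[str], float]


@dataclass
class CertEntry:
    """Simplified stand-in for a CT log entry (assumption: only domain names matter here)."""
    domains: List[str]


# Illustrative keywords only; not taken from the paper.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account")


def keyword_classifier(domain: str) -> float:
    """Toy scorer: fraction of the illustrative keywords that occur in the domain."""
    name = domain.lower()
    hits = sum(1 for kw in SUSPICIOUS_KEYWORDS if kw in name)
    return hits / len(SUSPICIOUS_KEYWORDS)


def classify_entries(
    entries: Iterable[CertEntry],
    classifier: Classifier,
    threshold: float = 0.5,
) -> Iterator[Tuple[str, float]]:
    """Yield (domain, score) for every domain whose score reaches the threshold."""
    for entry in entries:
        for domain in entry.domains:
            score = classifier(domain)
            if score >= threshold:
                yield domain, score


entries = [
    CertEntry(["example.org", "www.example.org"]),
    CertEntry(["secure-login-verify.example.net"]),
]

# Swapping the classifier only requires passing a different callable.
flagged = list(classify_entries(entries, keyword_classifier))
flag_everything = list(classify_entries(entries, lambda domain: 1.0))
```

Because the classifier is passed in as a plain callable, a trained model could be wrapped in a function with the same signature and compared against the toy scorer on identical input.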
Related papers
- Evasion-Resilient Detection of DNS-over-HTTPS Data Exfiltration: A Practical Evaluation and Toolkit [0.0]
This project aims to assess how well defenders can detect DNS-over-HTTPS (DoH) file exfiltration, and which evasion strategies can be used by attackers.
The originality of this project is the introduction of an end-to-end, containerized pipeline that generates file exfiltration over DoH.
The pipeline contains a prediction side, which allows the training of machine learning models based on public labelled datasets.
arXiv Detail & Related papers (2025-12-23T15:07:17Z) - OCR-APT: Reconstructing APT Stories from Audit Logs using Subgraph Anomaly Detection and LLMs [4.663916214040153]
Advanced Persistent Threats (APTs) are stealthy cyberattacks that often evade detection in system-level audit logs.
Existing systems apply anomaly detection to these graphs but often suffer from high false positive rates and coarse-grained alerts.
We introduce OCR-APT, a system for APT detection and reconstruction of human-like attack stories.
arXiv Detail & Related papers (2025-10-16T23:14:03Z) - VIVID: A Novel Approach to Remediation Prioritization in Static Application Security Testing (SAST) [0.0]
VIVID - Vulnerability Information Via Data flow - is a novel method to extract and consume SAST insights.
We present simulations that find out-degree, betweenness centrality, in-eigenvector centrality, and cross-clique connectivity to be associated with files exhibiting high vulnerability traffic.
This is a novel method to automatically provide development teams with an evidence-based prioritized list of files to embed security controls into.
arXiv Detail & Related papers (2025-05-22T04:16:56Z) - A Pipeline of Augmentation and Sequence Embedding for Classification of Imbalanced Network Traffic [0.0]
We propose a pipeline to balance the dataset and classify it using a robust and accurate embedding technique.
We demonstrate that the proposed augmentation pipeline, combined with FS-Embedding, increases convergence speed and leads to a significant reduction in the number of model parameters.
arXiv Detail & Related papers (2025-02-26T07:55:24Z) - An investigation into the performances of the Current state-of-the-art Naive Bayes, Non-Bayesian and Deep Learning Based Classifier for Phishing Detection: A Survey [0.9567504785687562]
Phishing is one of the most effective ways in which cybercriminals get sensitive details from potential victims.
In this research, we did a comprehensive review of current state-of-the-art machine learning and deep learning phishing detection techniques.
arXiv Detail & Related papers (2024-11-24T05:20:09Z) - FakeFormer: Efficient Vulnerability-Driven Transformers for Generalisable Deepfake Detection [12.594436202557446]
This paper investigates why Vision Transformers (ViTs) exhibit a suboptimal performance when dealing with the detection of facial forgeries.
We propose a deepfake detection framework called FakeFormer, which extends ViTs to enforce the extraction of subtle inconsistency-prone information.
Experiments are conducted on diverse well-known datasets, including FF++, Celeb-DF, WildDeepfake, DFD, DFDCP, and DFDC.
arXiv Detail & Related papers (2024-10-29T11:36:49Z) - SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch token they extract.
We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z) - Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z) - Post-Training Detection of Backdoor Attacks for Two-Class and
Multi-Attack Scenarios [22.22337220509128]
Backdoor attacks (BAs) are an emerging threat to deep neural network classifiers.
We propose a detection framework based on BP reverse-engineering and a novel expected transferability (ET) statistic.
arXiv Detail & Related papers (2022-01-20T22:21:38Z) - Transformers Can Do Bayesian Inference [56.99390658880008]
We present Prior-Data Fitted Networks (PFNs).
PFNs leverage in-context learning in large-scale machine learning techniques to approximate a large set of posteriors.
We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems.
arXiv Detail & Related papers (2021-12-20T13:07:39Z) - MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake
Detection [80.83725644958633]
Current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos.
We present a novel approach, termed as MD-CSDNetwork, for combining the features in the spatial and frequency domains to mine a shared discriminative representation.
arXiv Detail & Related papers (2021-09-15T14:11:53Z) - Robust and Verifiable Information Embedding Attacks to Deep Neural
Networks via Error-Correcting Codes [81.85509264573948]
In the era of deep learning, a user often leverages a third-party machine learning tool to train a deep neural network (DNN) classifier.
In an information embedding attack, an attacker is the provider of a malicious third-party machine learning tool.
In this work, we aim to design information embedding attacks that are verifiable and robust against popular post-processing methods.
arXiv Detail & Related papers (2020-10-26T17:42:42Z) - End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
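The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.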
arXiv Detail & Related papers (2020-05-26T17:06:38Z)