PyraTrans: Attention-Enriched Pyramid Transformer for Malicious URL Detection
- URL: http://arxiv.org/abs/2312.00508v2
- Date: Wed, 6 Dec 2023 16:46:54 GMT
- Title: PyraTrans: Attention-Enriched Pyramid Transformer for Malicious URL Detection
- Authors: Ruitong Liu, Yanbin Wang, Zhenhao Guo, Haitao Xu, Zhan Qin, Wenrui Ma, Fan Zhang
- Abstract summary: PyraTrans is a novel method that integrates pretrained Transformers with pyramid feature learning to detect malicious URLs.
In several challenging experimental scenarios, the proposed method has shown significant improvements in accuracy, generalization, and robustness.
- Score: 9.873643699502853
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although advancements in machine learning have driven the development of malicious URL detection technology, current techniques still face significant challenges in their capacity to generalize and their resilience against evolving threats. In this paper, we propose PyraTrans, a novel method that integrates pretrained Transformers with pyramid feature learning to detect malicious URLs. PyraTrans utilizes a pretrained CharBERT as its foundation and is augmented with three interconnected feature modules: 1) Encoder Feature Extraction, which extracts multi-order feature matrices from each CharBERT encoder layer; 2) Multi-Scale Feature Learning, which captures local contextual insights at various scales and aggregates information across encoder layers; and 3) Spatial Pyramid Attention, which applies regional-level attention to emphasize areas rich in expressive information. The proposed approach addresses the Transformer's limitations in local feature learning and regional relational awareness, which are vital for capturing URL-specific word patterns, character combinations, and structural anomalies. In several challenging experimental scenarios, the proposed method shows significant improvements in accuracy, generalization, and robustness in malicious URL detection. For instance, it achieves a peak F1-score improvement of 40% in class-imbalanced scenarios and exceeds the best baseline's accuracy by 14.13% under adversarial attack. Additionally, we conduct a case study in which our method correctly identifies all 30 active malicious web pages, whereas two prior SOTA methods miss 4 and 7 of them, respectively. Code and data are available at: https://github.com/Alixyvtte/PyraTrans.
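No reference implementation is reproduced on this page, so the following PyTorch sketch only illustrates how the three modules described above could be wired together. The module names follow the abstract, but the kernel sizes, pooling scales, layer weighting, and classification head are illustrative assumptions, not the authors' code (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFeatureLearning(nn.Module):
    """Local context at several receptive fields (kernel sizes are assumptions)."""
    def __init__(self, dim, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, dim, k, padding=k // 2) for k in kernel_sizes
        )
        self.proj = nn.Linear(dim * len(kernel_sizes), dim)

    def forward(self, x):                       # x: [B, T, D]
        h = x.transpose(1, 2)                   # [B, D, T] for Conv1d
        h = torch.cat([F.relu(c(h)) for c in self.convs], dim=1)
        return self.proj(h.transpose(1, 2))     # back to [B, T, D]

class SpatialPyramidAttention(nn.Module):
    """Region-level attention via pooling at multiple scales (scales are assumptions)."""
    def __init__(self, dim, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                       # x: [B, T, D]
        B, T, _ = x.shape
        attn = torch.zeros(B, T, 1, device=x.device)
        for s in self.scales:
            pooled = F.adaptive_avg_pool1d(x.transpose(1, 2), s)  # [B, D, s] regions
            scores = self.score(pooled.transpose(1, 2))           # [B, s, 1]
            # broadcast each region's score back to its tokens
            attn = attn + F.interpolate(scores.transpose(1, 2), size=T).transpose(1, 2)
        return x * torch.softmax(attn, dim=1)   # emphasize expressive regions

class PyraTransHead(nn.Module):
    """Fuse per-layer encoder features (e.g. CharBERT hidden states) into logits."""
    def __init__(self, dim=768, num_layers=12, num_classes=2):
        super().__init__()
        self.msfl = MultiScaleFeatureLearning(dim)
        self.spa = SpatialPyramidAttention(dim)
        self.layer_weights = nn.Parameter(torch.ones(num_layers))
        self.cls = nn.Linear(dim, num_classes)

    def forward(self, hidden_states):           # list of [B, T, D], one per layer
        w = torch.softmax(self.layer_weights, dim=0)
        fused = sum(wi * self.spa(self.msfl(h)) for wi, h in zip(w, hidden_states))
        return self.cls(fused.mean(dim=1))      # mean-pool tokens, then classify
```

The key idea the sketch preserves is that every encoder layer contributes features, with multi-scale convolutions supplying local context and pyramid attention reweighting token regions before classification.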
Related papers
- Efficient Phishing URL Detection Using Graph-based Machine Learning and Loopy Belief Propagation [12.89058029173131]
We propose a graph-based machine learning model for phishing URL detection.
We integrate URL structure and network-level features such as IP addresses and authoritative name servers.
Experiments on real-world datasets demonstrate our model's effectiveness, achieving an F1 score of up to 98.77%.
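As a rough illustration of the loopy belief propagation such a graph model runs, here is a minimal NumPy sketch over a toy URL/infrastructure graph. The graph, priors, and edge potential are invented for the example and are not the paper's construction.

```python
import numpy as np

# Toy graph: edges connect URLs/hosts sharing an IP or authoritative name server.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
priors = np.array([[0.9, 0.1],        # per-node P(benign, phishing) from features
                   [0.5, 0.5],
                   [0.2, 0.8],
                   [0.5, 0.5]])
psi = np.array([[0.8, 0.2],           # edge potential: neighbors tend to agree
                [0.2, 0.8]])

nbrs = {i: [] for i in range(len(priors))}
for u, v in edges:
    nbrs[u].append(v)
    nbrs[v].append(u)
msgs = {(u, v): np.ones(2) for u, v in edges}
msgs.update({(v, u): np.ones(2) for u, v in edges})

for _ in range(10):                   # iterate messages until (approximately) stable
    new = {}
    for (u, v) in msgs:
        belief = priors[u].copy()     # prior times messages from all neighbors but v
        for w in nbrs[u]:
            if w != v:
                belief *= msgs[(w, u)]
        m = psi.T @ belief            # marginalize out u's label
        new[(u, v)] = m / m.sum()
    msgs = new

beliefs = priors.copy()
for (u, v), m in msgs.items():
    beliefs[v] *= m
beliefs /= beliefs.sum(axis=1, keepdims=True)
print(beliefs)                        # posterior P(benign, phishing) per node
```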
arXiv Detail & Related papers (2025-01-12T19:49:00Z) - Dual-Path Adversarial Lifting for Domain Shift Correction in Online Test-time Adaptation [59.18151483767509]
We introduce a dual-path token lifting for domain shift correction in test time adaptation.
We then perform dual-path lifting with interleaved token prediction and update between the path of domain shift tokens and the path of class tokens.
Experimental results on the benchmark datasets demonstrate that our proposed method significantly improves the online fully test-time domain adaptation performance.
arXiv Detail & Related papers (2024-08-26T02:33:47Z) - Few-Shot API Attack Detection: Overcoming Data Scarcity with GAN-Inspired Learning [9.035212370386846]
This paper proposes a novel few-shot detection approach motivated by Natural Language Processing (NLP) and advanced Generative Adversarial Network (GAN)-inspired techniques.
Our method enhances the contextual understanding of API requests, leading to improved anomaly detection compared to traditional methods.
arXiv Detail & Related papers (2024-05-18T11:10:45Z) - AntiPhishStack: LSTM-based Stacked Generalization Model for Optimized Phishing URL Detection [0.32141666878560626]
This paper introduces a two-phase stacked generalization model named AntiPhishStack, designed to detect phishing sites.
The model leverages the learning of URLs and character-level TF-IDF features symmetrically, enhancing its ability to combat emerging phishing threats.
Experimental validation on two benchmark datasets, comprising benign and phishing or malicious URLs, demonstrates the model's exceptional performance, achieving a notable 96.04% accuracy compared to existing studies.
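The character-level TF-IDF branch is standard enough to sketch. The snippet below uses scikit-learn with an illustrative n-gram range, and a logistic regression stands in for the paper's LSTM-based stacking phases.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Character-level TF-IDF over raw URLs; the n-gram range and the logistic
# regression are illustrative stand-ins for the paper's LSTM stacking phases.
urls = ["http://paypa1-login.example.com/verify", "https://www.wikipedia.org/"]
labels = [1, 0]                                   # 1 = phishing, 0 = benign

model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(urls, labels)
print(model.predict_proba(["http://secure-paypa1.example.net/login"]))
```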
arXiv Detail & Related papers (2024-01-17T03:44:27Z) - Malicious URL Detection via Pretrained Language Model Guided Multi-Level Feature Attention Network [15.888763097896339]
We present an efficient framework for malicious URL detection based on a pretrained language model.
We develop three key modules: hierarchical feature extraction, layer-aware attention, and spatial pyramid pooling.
The proposed method has been extensively validated on multiple public datasets.
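Of the three modules, spatial pyramid pooling is the most self-contained; a minimal PyTorch sketch with assumed bin counts is:

```python
import torch
import torch.nn.functional as F

def spatial_pyramid_pool(x, levels=(1, 2, 4)):
    """Pool features [B, D, T] at several bin counts and concatenate, yielding a
    fixed-length vector for any sequence length T (bin counts are assumptions)."""
    return torch.cat(
        [F.adaptive_max_pool1d(x, bins).flatten(1) for bins in levels], dim=1
    )

feats = torch.randn(8, 256, 100)        # e.g. encoder features for 100 URL tokens
vec = spatial_pyramid_pool(feats)       # [8, 256 * (1 + 2 + 4)] = [8, 1792]
print(vec.shape)
```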
arXiv Detail & Related papers (2023-11-21T06:23:08Z) - In-Context Convergence of Transformers [63.04956160537308]
We study the learning dynamics of a one-layer transformer with softmax attention trained via gradient descent.
For data with imbalanced features, we show that the learning dynamics take a stage-wise convergence process.
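The studied setup is small enough to write down directly. The toy sketch below trains a single softmax-attention weight matrix by plain gradient descent; the data, sizes, and scalar readout are invented for illustration.

```python
import torch

# One softmax-attention weight matrix trained by vanilla gradient descent on a
# toy task; data, sizes, and the scalar readout are illustrative assumptions.
torch.manual_seed(0)
d, T, n = 8, 16, 256
W = torch.randn(d, d, requires_grad=True)         # attention parameters
X = torch.randn(n, T, d)                          # token sequences
y = X.mean(dim=1) @ torch.randn(d)                # toy target

for step in range(200):
    attn = torch.softmax(X @ W @ X.transpose(1, 2) / d**0.5, dim=-1)
    pred = (attn @ X)[:, 0, :].sum(dim=1)         # read out the first token
    loss = ((pred - y) ** 2).mean()
    loss.backward()
    with torch.no_grad():                         # plain gradient descent step
        W -= 0.01 * W.grad
        W.grad.zero_()
```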
arXiv Detail & Related papers (2023-10-08T17:55:33Z) - Meta-Transformer: A Unified Framework for Multimodal Learning [105.77219833997962]
Multimodal learning aims to build models that process and relate information from multiple modalities.
Despite years of development in this field, it remains challenging to design a unified network for processing various modalities.
We propose a framework, named Meta-Transformer, that leverages a frozen encoder to perform multimodal perception.
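The frozen-encoder recipe is easy to illustrate: freeze the shared encoder's parameters and optimize only a small task head. The encoder, sizes, and head below are illustrative stand-ins, not Meta-Transformer's actual components.

```python
import torch
import torch.nn as nn

# Frozen shared encoder, trainable task head. Sizes and modules are stand-ins.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True), num_layers=4
)
for p in encoder.parameters():
    p.requires_grad = False                   # the encoder is never updated

head = nn.Linear(256, 10)                     # only this head is optimized
opt = torch.optim.AdamW(head.parameters(), lr=1e-3)

x = torch.randn(2, 32, 256)                   # [batch, tokens, dim] from any modality
logits = head(encoder(x).mean(dim=1))         # pooled frozen features -> task logits
```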
arXiv Detail & Related papers (2023-07-20T12:10:29Z) - Transformers for End-to-End InfoSec Tasks: A Feasibility Study [6.847381178288385]
We implement transformer models for two distinct InfoSec data formats - specifically URLs and PE files.
We show that our URL transformer model requires a different training approach to reach high performance levels.
We demonstrate that this approach performs comparably to well-established malware detection models on benchmark PE file datasets.
arXiv Detail & Related papers (2022-12-05T23:50:46Z) - Backdoor Attacks for Remote Sensing Data with Wavelet Transform [14.50261153230204]
In this paper, we provide a systematic analysis of backdoor attacks for remote sensing data.
We propose a novel wavelet transform-based attack (WABA) method, which can achieve invisible attacks by injecting the trigger image into the poisoned image.
Despite its simplicity, the proposed method can fool current state-of-the-art deep learning models with a high attack success rate.
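The trigger-injection step can be sketched with PyWavelets: blend the trigger's low-frequency wavelet coefficients into the image's while leaving the image's detail coefficients untouched. The blending weight and wavelet choice are illustrative assumptions.

```python
import numpy as np
import pywt

def waba_poison(image, trigger, alpha=0.1, wavelet="haar"):
    """Blend the trigger's low-frequency wavelet band into the image's; alpha
    and the wavelet are illustrative assumptions."""
    cA_img, details_img = pywt.dwt2(image, wavelet)
    cA_trg, _ = pywt.dwt2(trigger, wavelet)
    cA_mix = (1 - alpha) * cA_img + alpha * cA_trg
    return pywt.idwt2((cA_mix, details_img), wavelet)  # detail bands untouched

image = np.random.rand(64, 64)          # stand-ins for a remote sensing patch
trigger = np.random.rand(64, 64)        # and a trigger image
poisoned = waba_poison(image, trigger)
print(np.abs(poisoned - image).max())   # perturbation shrinks with alpha
```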
arXiv Detail & Related papers (2022-11-15T10:49:49Z) - The Devil is in the Details: On Models and Training Regimes for Few-Shot Intent Classification [81.60168035505039]
Few-shot Intent Classification (FSIC) is one of the key challenges in modular task-oriented dialog systems.
We show that cross-encoder architecture and episodic meta-learning consistently yields the best FSIC performance.
Our findings pave the way for conducting state-of-the-art research in FSIC.
arXiv Detail & Related papers (2022-10-12T17:37:54Z) - Focused Decoding Enables 3D Anatomical Detection by Transformers [64.36530874341666]
We propose a novel Detection Transformer for 3D anatomical structure detection, dubbed Focused Decoder.
Focused Decoder leverages information from an anatomical region atlas to simultaneously deploy query anchors and restrict the cross-attention's field of view.
We evaluate our proposed approach on two publicly available CT datasets and demonstrate that Focused Decoder not only provides strong detection results and thus alleviates the need for a vast amount of annotated data but also exhibits exceptional and highly intuitive explainability of results via attention weights.
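Restricting a decoder query's field of view amounts to masking cross-attention scores outside its region. The sketch below shows this with an invented boolean mask standing in for the per-query regions an anatomical atlas would supply.

```python
import torch

# Cross-attention whose field of view is restricted by a boolean mask; the mask
# stands in for per-query regions an anatomical atlas would supply.
B, Q, K, D = 2, 4, 100, 64
queries = torch.randn(B, Q, D)              # query anchors (atlas-placed in the paper)
keys = values = torch.randn(B, K, D)        # encoder feature tokens

allowed = torch.zeros(B, Q, K, dtype=torch.bool)
allowed[:, :, 30:60] = True                 # each query sees only its region's tokens

scores = queries @ keys.transpose(-2, -1) / D**0.5
scores = scores.masked_fill(~allowed, float("-inf"))
out = torch.softmax(scores, dim=-1) @ values    # [B, Q, D] focused cross-attention
```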
arXiv Detail & Related papers (2022-07-21T22:17:21Z) - Defect Transformer: An Efficient Hybrid Transformer Architecture for Surface Defect Detection [2.0999222360659604]
We propose an efficient hybrid transformer architecture, termed Defect Transformer (DefT), for surface defect detection.
DefT incorporates CNN and transformer into a unified model to capture local and non-local relationships collaboratively.
Experiments on three datasets demonstrate the superiority and efficiency of our method compared with other CNN- and transformer-based networks.
arXiv Detail & Related papers (2022-07-17T23:37:48Z) - Dual Vision Transformer [114.1062057736447]
We propose a novel Transformer architecture, named Dual Vision Transformer (Dual-ViT), that aims to mitigate the cost issue.
The new architecture incorporates a critical semantic pathway that can more efficiently compress token vectors into global semantics with reduced order of complexity.
We empirically demonstrate that Dual-ViT achieves higher accuracy than SOTA Transformer architectures with reduced training complexity.
arXiv Detail & Related papers (2022-07-11T16:03:44Z) - Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference [74.80730361332711]
Few-shot learning is an important and topical problem in computer vision.
We show that a simple transformer-based pipeline yields surprisingly good performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-15T02:55:58Z) - MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization [65.09758931804478]
Three different data sources are combined: weakly-supervised videos, crowd-labeled text-image pairs and text-video pairs.
A careful analysis of available pretrained networks helps select the best ones to use as prior knowledge.
arXiv Detail & Related papers (2022-03-14T13:15:09Z) - Visual Transformer for Task-aware Active Learning [49.903358393660724]
We present a novel pipeline for pool-based Active Learning.
Our method exploits accessible unlabelled examples during training to estimate their correlation with the labelled examples.
Visual Transformer models non-local visual concept dependency between labelled and unlabelled examples.
arXiv Detail & Related papers (2021-06-07T17:13:59Z) - DoS and DDoS Mitigation Using Variational Autoencoders [15.23225419183423]
We explore the potential of Variational Autoencoders to serve as a component within an intelligent security solution.
Two methods based on the ability of Variational Autoencoders to learn latent representations from network traffic flows are proposed.
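A common way to use a VAE in this role is to score each flow by reconstruction error and flag outliers. The sketch below does exactly that, with an invented feature dimension, architecture, and threshold rather than the paper's two specific methods.

```python
import torch
import torch.nn as nn

class FlowVAE(nn.Module):
    """Tiny VAE over per-flow feature vectors (dimensions are assumptions)."""
    def __init__(self, d_in=20, d_lat=4):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 32), nn.ReLU(), nn.Linear(32, 2 * d_lat))
        self.dec = nn.Sequential(nn.Linear(d_lat, 32), nn.ReLU(), nn.Linear(32, d_in))

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        return self.dec(z), mu, logvar

vae = FlowVAE()                              # assume it was trained on benign flows
flows = torch.randn(16, 20)                  # stand-in for normalized flow features
recon, mu, logvar = vae(flows)
score = ((flows - recon) ** 2).mean(dim=1)   # reconstruction error per flow
flagged = score > score.mean() + 2 * score.std()   # crude anomaly threshold
```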
arXiv Detail & Related papers (2021-05-14T15:38:40Z) - Spatiotemporal Transformer for Video-based Person Re-identification [102.58619642363958]
We show that, despite the strong learning ability, the vanilla Transformer suffers from an increased risk of over-fitting.
We propose a novel pipeline where the model is pre-trained on a set of synthesized video data and then transferred to the downstream domains.
The derived algorithm achieves significant accuracy gain on three popular video-based person re-identification benchmarks.
arXiv Detail & Related papers (2021-03-30T16:19:27Z) - Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z) - Training Transformers for Information Security Tasks: A Case Study on Malicious URL Prediction [3.660098145214466]
We implement a malicious/benign URL predictor based on a transformer architecture that is trained from scratch.
We show that in contrast to conventional natural language processing (NLP) transformers, this model requires a different training approach to work well.
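For orientation, a from-scratch character-level URL transformer can be as small as the sketch below. The vocabulary, sizes, and classification head are illustrative assumptions; the paper's point is precisely that the training regime for such a model needs care.

```python
import torch
import torch.nn as nn

class URLTransformer(nn.Module):
    """Character-level transformer classifier trained from scratch; vocabulary,
    sizes, and depth are illustrative assumptions."""
    def __init__(self, vocab=128, dim=128, max_len=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.pos = nn.Parameter(torch.zeros(1, max_len, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)
        self.cls = nn.Linear(dim, 2)                 # benign vs malicious

    def forward(self, ids):                          # ids: [B, T] character codes
        h = self.enc(self.emb(ids) + self.pos[:, : ids.size(1)])
        return self.cls(h.mean(dim=1))

def encode(url, max_len=256):
    ids = [min(b, 127) for b in url.encode("ascii", "replace")][:max_len]
    return torch.tensor(ids).unsqueeze(0)

model = URLTransformer()
logits = model(encode("http://phish.example.com/login"))
```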
arXiv Detail & Related papers (2020-11-05T18:58:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.