Malicious URL Detection via Pretrained Language Model Guided Multi-Level Feature Attention Network
- URL: http://arxiv.org/abs/2311.12372v1
- Date: Tue, 21 Nov 2023 06:23:08 GMT
- Title: Malicious URL Detection via Pretrained Language Model Guided Multi-Level Feature Attention Network
- Authors: Ruitong Liu, Yanbin Wang, Haitao Xu, Zhan Qin, Yiwei Liu, Zheng Cao
- Abstract summary: We present an efficient pre-training model-based framework for malicious URL detection.
We develop three key modules: hierarchical feature extraction, layer-aware attention, and spatial pyramid pooling.
The proposed method has been extensively validated on multiple public datasets.
- Score: 15.888763097896339
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The widespread use of the Internet has revolutionized information retrieval methods. However, this transformation has also given rise to a significant cybersecurity challenge: the rapid proliferation of malicious URLs, which serve as entry points for a wide range of cyber threats. In this study, we present an efficient pre-training model-based framework for malicious URL detection. Leveraging the subword and character-aware pre-trained model, CharBERT, as our foundation, we further develop three key modules: hierarchical feature extraction, layer-aware attention, and spatial pyramid pooling. The hierarchical feature extraction module follows the pyramid feature learning principle, extracting multi-level URL embeddings from the different Transformer layers of CharBERT. Subsequently, the layer-aware attention module autonomously learns connections among features at various hierarchical levels and allocates varying weight coefficients to each level of features. Finally, the spatial pyramid pooling module performs multiscale downsampling on the weighted multi-level feature pyramid, achieving the capture of local features as well as the aggregation of global features. The proposed method has been extensively validated on multiple public datasets, demonstrating a significant improvement over prior works, with the maximum accuracy gap reaching 8.43% compared to the previous state-of-the-art method. Additionally, we have assessed the model's generalization and robustness in scenarios such as cross-dataset evaluation and adversarial attacks. Finally, we conducted real-world case studies on the active phishing URLs.
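The three modules described in the abstract can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' implementation: it assumes per-layer hidden states from CharBERT are already available as an array of shape (layers, tokens, dim), and the layer scores, bin sizes, and dimensions are hypothetical.

```python
import numpy as np

def layer_aware_attention(layer_feats, layer_scores):
    """Weight multi-level URL embeddings with a softmax over layers.

    layer_feats:  (L, T, D) hidden states from L Transformer layers.
    layer_scores: (L,) per-layer scores (learnable in a real model).
    """
    a = np.exp(layer_scores - layer_scores.max())
    a /= a.sum()                                  # attention weight per layer
    return a[:, None, None] * layer_feats         # weighted feature pyramid

def spatial_pyramid_pool(feats, bins=(1, 2, 4)):
    """Multiscale max-pooling of a token sequence into fixed-size bins.

    feats: (T, D); returns a (sum(bins) * D,) vector mixing local
    features (fine bins) with a global summary (the single coarse bin).
    """
    T, _ = feats.shape
    pooled = []
    for b in bins:
        edges = np.linspace(0, T, b + 1).astype(int)
        for i in range(b):
            seg = feats[edges[i]:max(edges[i] + 1, edges[i + 1])]
            pooled.append(seg.max(axis=0))
    return np.concatenate(pooled)

# Toy forward pass: 4 layers, 16 URL tokens, 8-dim embeddings.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(4, 16, 8))       # stand-in for CharBERT output
weighted = layer_aware_attention(hidden_states, np.zeros(4))
url_vector = spatial_pyramid_pool(weighted.sum(axis=0))
print(url_vector.shape)                           # (56,) = (1 + 2 + 4) * 8
```

In the actual framework the layer scores would be learned end-to-end and `url_vector` fed to a classification head; the sketch only shows how the pyramid weighting and multiscale pooling compose.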
Related papers
- Training Large Language Models for Advanced Typosquatting Detection [0.0]
Typosquatting is a cyber threat that exploits human error in typing URLs to deceive users, distribute malware, and conduct phishing attacks.
This study introduces a novel approach leveraging large language models (LLMs) to enhance typosquatting detection.
Experimental results indicate that the Phi-4 14B model outperformed the other tested models when properly fine-tuned, achieving 98% accuracy with only a few thousand training samples.
arXiv Detail & Related papers (2025-03-28T13:16:27Z) - One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection [71.78795573911512]
We propose OneDet3D, a universal one-for-all model that addresses 3D detection across different domains.
We propose domain-aware mechanisms in scatter and context, guided by a routing mechanism, to address the data interference issue.
The fully sparse structure and anchor-free head further accommodate point clouds with significant scale disparities.
arXiv Detail & Related papers (2024-11-03T14:21:56Z) - An Advanced Deep Learning Based Three-Stream Hybrid Model for Dynamic Hand Gesture Recognition [1.7985212575295124]
We propose a novel three-stream hybrid model that combines RGB pixel and skeleton-based features to recognize hand gestures.
In the procedure, we preprocessed the dataset, including augmentation, to make the system invariant to rotation, translation, and scaling.
We produce a powerful feature vector by combining pixel-based deep learning features with pose-estimation-based stacked deep learning features.
arXiv Detail & Related papers (2024-08-15T09:05:00Z) - Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract relational priors from transformers well-trained on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - StyDeSty: Min-Max Stylization and Destylization for Single Domain Generalization [85.18995948334592]
Single domain generalization (single DG) aims at learning a robust model generalizable to unseen domains from only one training domain.
State-of-the-art approaches have mostly relied on data augmentations, such as adversarial perturbation and style enhancement, to synthesize new data.
We propose StyDeSty, which explicitly accounts for the alignment of the source and pseudo domains in the process of data augmentation.
arXiv Detail & Related papers (2024-06-01T02:41:34Z) - URLBERT: A Contrastive and Adversarial Pre-trained Model for URL Classification [10.562100395816595]
URLs play a crucial role in understanding and categorizing web content.
This paper introduces URLBERT, the first pre-trained representation learning model applied to a variety of URL classification or detection tasks.
arXiv Detail & Related papers (2024-02-18T07:51:20Z) - PyraTrans: Attention-Enriched Pyramid Transformer for Malicious URL Detection [9.873643699502853]
PyraTrans is a novel method that integrates pretrained Transformers with pyramid feature learning to detect malicious URLs.
In several challenging experimental scenarios, the proposed method has shown significant improvements in accuracy, generalization, and robustness.
arXiv Detail & Related papers (2023-12-01T11:27:00Z) - Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z) - Genetic Algorithm-Based Dynamic Backdoor Attack on Federated Learning-Based Network Traffic Classification [1.1887808102491482]
We propose GABAttack, a novel genetic algorithm-based backdoor attack against federated learning for network traffic classification.
This research serves as an alarming call for network security experts and practitioners to develop robust defense measures against such attacks.
arXiv Detail & Related papers (2023-09-27T14:02:02Z) - M$^3$Net: Multilevel, Mixed and Multistage Attention Network for Salient Object Detection [22.60675416709486]
M$^3$Net is an attention network for Salient Object Detection.
It uses a cross-attention approach to achieve interaction between multilevel features.
A Mixed Attention Block models context at both global and local levels.
A multilevel supervision strategy optimizes the aggregated features stage by stage.
arXiv Detail & Related papers (2023-09-15T12:46:14Z) - Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents [111.15288256221764]
The Grounded Decoding project aims to solve complex, long-horizon tasks in a robotic setting by leveraging the knowledge of both language and grounded models.
We frame this as a problem similar to probabilistic filtering: decode a sequence that both has high probability under the language model and high probability under a set of grounded model objectives.
We demonstrate how such grounded models can be obtained across three simulation and real-world domains, and that the proposed decoding strategy is able to solve complex, long-horizon tasks in a robotic setting by leveraging the knowledge of both models.
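The probabilistic-filtering view above can be made concrete with a toy decoding step. This is an illustrative sketch, not the paper's code: the token candidates, probabilities, and the `affordance` scorer are all hypothetical.

```python
import math

def grounded_decode_step(lm_logprobs, grounded_logprob_fns):
    """Pick the candidate maximizing LM log-prob plus grounded log-probs.

    lm_logprobs: dict mapping candidate -> log p_LM(candidate | prefix).
    grounded_logprob_fns: callables candidate -> log p_grounded(candidate),
    e.g. affordance or safety models (hypothetical here).
    """
    def joint_score(tok):
        return lm_logprobs[tok] + sum(f(tok) for f in grounded_logprob_fns)
    return max(lm_logprobs, key=joint_score)

# Toy example: the LM prefers "open drawer", but a grounded affordance
# model rates the drawer as hard to reach, so "pick cup" wins jointly.
lm = {"open drawer": math.log(0.6), "pick cup": math.log(0.4)}
affordance = lambda tok: math.log({"open drawer": 0.1, "pick cup": 0.9}[tok])
print(grounded_decode_step(lm, [affordance]))     # pick cup
```

The design point is that neither model decodes alone: the language model supplies task knowledge while the grounded objectives filter out actions that are infeasible in the current scene.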
arXiv Detail & Related papers (2023-03-01T22:58:50Z) - Federated Zero-Shot Learning for Visual Recognition [55.65879596326147]
We propose a novel Federated Zero-Shot Learning (FedZSL) framework.
FedZSL learns a central model from the decentralized data residing on edge devices.
The effectiveness and robustness of FedZSL are demonstrated by extensive experiments conducted on three zero-shot benchmark datasets.
arXiv Detail & Related papers (2022-09-05T14:49:34Z) - DFC: Deep Feature Consistency for Robust Point Cloud Registration [0.4724825031148411]
We present a novel learning-based alignment network for complex alignment scenes.
We validate our approach on the 3DMatch dataset and the KITTI odometry dataset.
arXiv Detail & Related papers (2021-11-15T08:27:21Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z) - InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z) - Learning One Class Representations for Face Presentation Attack Detection using Multi-channel Convolutional Neural Networks [7.665392786787577]
Presentation attack detection (PAD) methods often fail to generalize to unseen attacks.
We propose a new framework for PAD using a one-class classifier, where the representation is learned with a Multi-Channel Convolutional Neural Network (MCCNN).
A novel loss function is introduced, which forces the network to learn a compact embedding for the bonafide class while staying far from the representations of attacks.
The proposed framework introduces a novel approach to learn a robust PAD system from bonafide and available (known) attack classes.
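The compact-embedding idea behind that loss can be sketched numerically. This is a hedged illustration of a one-class compactness loss with a hinge on known attacks, not the paper's actual loss; the `margin` hyperparameter and the toy embeddings are assumptions.

```python
import numpy as np

def one_class_compactness_loss(embeddings, labels, margin=1.0):
    """Pull bonafide embeddings toward their centroid; push known
    attacks at least `margin` away from it via a hinge penalty.

    embeddings: (N, D) array; labels: (N,) with 1 = bonafide, 0 = attack.
    """
    bonafide = embeddings[labels == 1]
    center = bonafide.mean(axis=0)                # bonafide class centroid
    d_bona = np.linalg.norm(bonafide - center, axis=1)
    d_attack = np.linalg.norm(embeddings[labels == 0] - center, axis=1)
    return d_bona.mean() + np.maximum(0.0, margin - d_attack).mean()

# Attacks far from the bonafide cluster incur little penalty; attacks
# landing inside the cluster drive the loss up.
labels = np.array([1, 1, 0])
far_attack = one_class_compactness_loss(
    np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0]]), labels)
near_attack = one_class_compactness_loss(
    np.array([[0.0, 0.0], [0.2, 0.0], [0.1, 0.0]]), labels)
print(far_attack < near_attack)                   # True
```

Minimizing such a loss yields exactly the geometry the blurb describes: a tight bonafide region with attack representations pushed outside it.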
arXiv Detail & Related papers (2020-07-22T14:19:33Z) - Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z) - Bifurcated backbone strategy for RGB-D salient object detection [168.19708737906618]
We leverage the inherent multi-modal and multi-level nature of RGB-D salient object detection to devise a novel cascaded refinement network.
Our architecture, named Bifurcated Backbone Strategy Network (BBS-Net), is simple, efficient, and backbone-independent.
arXiv Detail & Related papers (2020-07-06T13:01:30Z) - Crowd Counting via Hierarchical Scale Recalibration Network [61.09833400167511]
We propose a novel Hierarchical Scale Recalibration Network (HSRNet) to tackle the task of crowd counting.
HSRNet models rich contextual dependencies and recalibrates multiple scale-associated information.
Our approach can ignore various noises selectively and focus on appropriate crowd scales automatically.
arXiv Detail & Related papers (2020-03-07T10:06:47Z) - Cross-layer Feature Pyramid Network for Salient Object Detection [102.20031050972429]
We propose a novel Cross-layer Feature Pyramid Network to improve the progressive fusion in salient object detection.
The distributed features of each layer carry both semantics and salient details from all other layers simultaneously, with reduced loss of important information.
arXiv Detail & Related papers (2020-02-25T14:06:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.