WebGuard++: Interpretable Malicious URL Detection via Bidirectional Fusion of HTML Subgraphs and Multi-Scale Convolutional BERT
- URL: http://arxiv.org/abs/2506.19356v1
- Date: Tue, 24 Jun 2025 06:36:51 GMT
- Title: WebGuard++: Interpretable Malicious URL Detection via Bidirectional Fusion of HTML Subgraphs and Multi-Scale Convolutional BERT
- Authors: Ye Tian, Zhang Yumin, Yifan Jia, Jianguo Sun, Yanbin Wang
- Abstract summary: URL+HTML feature fusion shows promise for robust malicious URL detection, since attacker artifacts persist in DOM structures. We present WebGuard++, a detection framework with four novel components. Experiments show WebGuard++ achieves significant improvements over state-of-the-art baselines.
- Score: 3.6220178465092503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: URL+HTML feature fusion shows promise for robust malicious URL detection, since attacker artifacts persist in DOM structures. However, prior work suffers from four critical shortcomings: (1) incomplete URL modeling, failing to jointly capture lexical patterns and semantic context; (2) HTML graph sparsity, where threat-indicative nodes (e.g., obfuscated scripts) are isolated amid benign content, causing signal dilution during graph aggregation; (3) unidirectional analysis, ignoring bidirectional interaction between URL and HTML features; and (4) opaque decisions, lacking attribution to malicious DOM components. To address these challenges, we present WebGuard++, a detection framework with four novel components: (1) Cross-scale URL Encoder: hierarchically learns local-to-global and coarse-to-fine URL features using a Transformer network with dynamic convolution. (2) Subgraph-aware HTML Encoder: decomposes DOM graphs into interpretable substructures, amplifying sparse threat signals via hierarchical feature fusion. (3) Bidirectional Coupling Module: aligns URL and HTML embeddings through cross-modal contrastive learning, optimizing inter-modal consistency and intra-modal specificity. (4) Voting Module: localizes malicious regions through consensus voting on malicious subgraph predictions. Experiments show that WebGuard++ significantly outperforms state-of-the-art baselines, achieving 1.1x-7.9x higher TPR at fixed FPRs of 0.001 and 0.0001 across both datasets.
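The headline metric in the abstract, TPR at a fixed FPR, can be illustrated with a short sketch. This is not the authors' evaluation code: the function name `tpr_at_fpr`, the convention that higher scores mean "more likely malicious", and the toy scores are all assumptions made for illustration. The idea is to sweep the decision threshold from the highest score downward and keep the best true-positive rate reached before the false-positive rate exceeds the target.

```python
from typing import Sequence


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[int], target_fpr: float) -> float:
    """Return the highest TPR reachable while keeping FPR <= target_fpr.

    Assumptions of this sketch: a higher score means "more likely malicious",
    and labels use 1 for malicious and 0 for benign.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one malicious and one benign sample")

    # Rank samples so that lowering the threshold admits them one group at a time.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    tp = fp = 0
    best_tpr = 0.0
    i = 0
    while i < len(ranked):
        # Consume all samples tied at this score together, because a single
        # threshold cannot separate equal scores.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        if fp / n_neg > target_fpr:
            break  # FPR can only grow as the threshold drops further
        best_tpr = tp / n_pos
        i = j
    return best_tpr


if __name__ == "__main__":
    # Toy data (hypothetical): three malicious and three benign samples.
    demo_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    demo_labels = [1, 1, 0, 1, 0, 0]
    print(tpr_at_fpr(demo_scores, demo_labels, target_fpr=0.0))
```

With targets as strict as 0.001 or 0.0001, the benign test set must contain at least thousands to tens of thousands of samples before a single false positive fits under the budget, which is why this metric is usually reported on large datasets.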
Related papers
- Breaking Obfuscation: Cluster-Aware Graph with LLM-Aided Recovery for Malicious JavaScript Detection [9.83040332336481]
Malicious JavaScript code poses significant threats to user privacy, system integrity, and enterprise security. We propose DeCoda, a hybrid defense framework that combines large language model (LLM)-based deobfuscation with code graph learning.
arXiv Detail & Related papers (2025-07-30T07:46:49Z) - SG-Reg: Generalizable and Efficient Scene Graph Registration [23.3853919684438]
We design a scene graph network to encode multiple modalities of semantic nodes. In the back-end, we employ a robust pose estimator to decide transformation according to the correspondences. Our method achieves a slightly higher registration recall while requiring only 52 KB of communication bandwidth for each query frame.
arXiv Detail & Related papers (2025-04-20T01:22:40Z) - SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding [56.079013202051094]
We present SegVG, a novel method that transfers box-level annotations into signals that provide additional pixel-level supervision for Visual Grounding.
This approach allows us to iteratively exploit the annotation as signals for both box-level regression and pixel-level segmentation.
arXiv Detail & Related papers (2024-07-03T15:30:45Z) - Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition [57.97930719585095]
We introduce Part-aware Unified Representation between Language and Skeleton (PURLS) to explore visual-semantic alignment at both local and global scales.
Our approach is evaluated on various skeleton/language backbones and three large-scale datasets.
The results showcase the universality and superior performance of PURLS, surpassing prior skeleton-based solutions and standard baselines from other domains.
arXiv Detail & Related papers (2024-06-19T08:22:32Z) - DGMamba: Domain Generalization via Generalized State Space Model [80.82253601531164]
Domain generalization (DG) aims to solve distribution shift problems in various scenes.
Mamba, as an emerging state space model (SSM), possesses superior linear complexity and global receptive fields.
We propose a novel framework for DG, named DGMamba, that excels in strong generalizability toward unseen domains.
arXiv Detail & Related papers (2024-04-11T14:35:59Z) - RAGFormer: Learning Semantic Attributes and Topological Structure for Fraud Detection [8.050935113945428]
We present a novel framework called Relation-Aware GNN with transFormer (RAGFormer). RAGFormer embeds both semantic and topological features into a target node. The simple yet effective network consists of a semantic encoder, a topology encoder, and an attention fusion module.
arXiv Detail & Related papers (2024-02-27T12:53:15Z) - TransURL: Improving malicious URL detection with multi-layer Transformer encoding and multi-scale pyramid features [9.873643699502853]
We propose a novel approach for malicious URL detection, named TransURL. This method is implemented by co-training the character-aware Transformer with three feature modules. Experimental results demonstrate a significant improvement compared to previous methods.
arXiv Detail & Related papers (2023-12-01T11:27:00Z) - PMANet: Malicious URL detection via post-trained language model guided multi-level feature attention network [16.73322002436809]
We propose PMANet, a pre-trained Language Model-Guided multi-level feature attention network. PMANet employs a post-training process with three self-supervised objectives: masked language modeling, noisy language modeling, and domain discrimination. Experiments on diverse scenarios, including small-scale data, class imbalance, and adversarial attacks, demonstrate PMANet's superiority over state-of-the-art models.
arXiv Detail & Related papers (2023-11-21T06:23:08Z) - Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z) - Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z) - Attentive WaveBlock: Complementarity-enhanced Mutual Networks for Unsupervised Domain Adaptation in Person Re-identification and Beyond [97.25179345878443]
This paper proposes a novel light-weight module, the Attentive WaveBlock (AWB).
AWB can be integrated into the dual networks of mutual learning to enhance the complementarity and further depress noise in the pseudo-labels.
Experiments demonstrate that the proposed method achieves state-of-the-art performance with significant improvements on multiple UDA person re-identification tasks.
arXiv Detail & Related papers (2020-06-11T15:40:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.