Fugu-MT Paper Translation (Summary): URL2Graph++: Unified Semantic-Structural-Character Learning for Malicious URL Detection

Paper summary: URL2Graph++: Unified Semantic-Structural-Character Learning for Malicious URL Detection

arxiv url: http://arxiv.org/abs/2509.10287v1
Date: Fri, 12 Sep 2025 14:27:04 GMT
Status: Translation complete
Last updated in system: 2025-09-15 16:03:08.123435
Title: URL2Graph++: Unified Semantic-Structural-Character Learning for Malicious URL Detection
Title (reference translation): URL2Graph++: Unified Semantic, Structural, and Character Learning for Malicious URL Detection
Authors: Ye Tian, Yifan Jia, Yanbin Wang, Jianguo Sun, Zhiquan Liu, Xiaowen Ling
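Abstract summary: Malicious URL detection remains a major challenge in cybersecurity. The paper proposes a novel malicious URL detection method that combines multi-granularity graph learning with semantic embedding. Results show that the proposed method exceeds SOTA performance, including against large language models.
Reference score (independently computed attention score): 11.415725075802344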
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Malicious URL detection remains a major challenge in cybersecurity, primarily due to two factors: (1) the exponential growth of the Internet has led to an immense diversity of URLs, making generalized detection increasingly difficult; and (2) attackers are increasingly employing sophisticated obfuscation techniques to evade detection. We advocate that addressing these challenges fundamentally requires: (1) obtaining semantic understanding to improve generalization across vast and diverse URL sets, and (2) accurately modeling contextual relationships within the structural composition of URLs. In this paper, we propose a novel malicious URL detection method combining multi-granularity graph learning with semantic embedding to jointly capture semantic, character-level, and structural features for robust URL analysis. To model internal dependencies within URLs, we first construct dual-granularity URL graphs at both subword and character levels, where nodes represent URL tokens/characters and edges encode co-occurrence relationships. To obtain fine-grained embeddings, we initialize node representations using a character-level convolutional network. The two graphs are then processed through jointly trained Graph Convolutional Networks to learn consistent graph-level representations, enabling the model to capture complementary structural features that reflect co-occurrence patterns and character-level dependencies. Furthermore, we employ BERT to derive semantic representations of URLs for semantically aware understanding. Finally, we introduce a gated dynamic fusion network to combine the semantically enriched BERT representations with the jointly optimized graph vectors, further enhancing detection performance. We extensively evaluate our method across multiple challenging dimensions. Results show our method exceeds SOTA performance, including against large language models.
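Abstract (reference translation): Malicious URL detection remains a major challenge in cybersecurity, primarily for two reasons: (1) the exponential growth of the Internet has greatly increased the diversity of URLs, making generalized detection increasingly difficult; and (2) attackers use sophisticated obfuscation techniques to evade detection. We argue that addressing these challenges requires (1) gaining semantic understanding to improve generalization across large and diverse URL sets, and (2) accurately modeling contextual relationships within the structural composition of URLs. In this paper, we propose a novel malicious URL detection method that combines multi-granularity graph learning with semantic embedding. To model internal dependencies within URLs, we first construct dual-granularity URL graphs at both the subword and character levels, where nodes represent URL tokens/characters and edges encode co-occurrence relationships. To obtain fine-grained embeddings, we initialize node representations using a character-level convolutional network. The two graphs are then processed through jointly trained Graph Convolutional Networks to learn consistent graph-level representations, enabling the model to capture complementary structural features that reflect co-occurrence patterns and character-level dependencies. Furthermore, we use BERT to derive semantic representations of URLs for semantically aware understanding. Finally, we introduce a gated dynamic fusion network that combines the semantically enriched BERT representations with the jointly optimized graph vectors to further improve detection performance. We evaluate our method extensively across multiple challenging dimensions. Results show that our method exceeds SOTA performance, including against large language models.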
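
Illustrative sketch (not from the paper): the abstract describes two building blocks that are easy to picture in code, namely URL graphs whose edges encode co-occurrence, and a gate that blends a semantic vector with a graph vector. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. The sliding-window size, the delimiter-based tokenizer (a stand-in for a real subword tokenizer), the sigmoid form of the gate, and all function names (`cooccurrence_graph`, `url_graphs`, `gated_fusion`) are hypothetical choices made for illustration. The character-level CNN, the GCN layers, and BERT are omitted.

```python
import math
import re
from collections import Counter


def cooccurrence_graph(units, window=2):
    """Return (nodes, edges): unique units and weighted undirected edges.

    Two distinct units are linked when they appear within `window`
    positions of each other; the weight counts how often that happens.
    The window size is an assumption, not a value from the paper.
    """
    nodes = sorted(set(units))
    edges = Counter()
    for i, u in enumerate(units):
        for v in units[i + 1 : i + 1 + window]:
            if u != v:  # skip self-loops in this sketch
                edges[tuple(sorted((u, v)))] += 1
    return nodes, dict(edges)


def url_graphs(url, window=2):
    """Build a character-level graph and a coarse token-level graph."""
    chars = list(url)
    # Assumption: splitting on URL delimiters stands in for a subword tokenizer.
    tokens = [t for t in re.split(r"[:/?&=.\-_]+", url) if t]
    return cooccurrence_graph(chars, window), cooccurrence_graph(tokens, window)


def gated_fusion(semantic, graph, weights, bias):
    """Blend two equal-length vectors with an element-wise sigmoid gate.

    gate_i  = sigmoid(weights[i] . [semantic; graph] + bias[i])
    fused_i = gate_i * semantic_i + (1 - gate_i) * graph_i
    This gate form is an assumed, common design, not the paper's definition.
    """
    joint = list(semantic) + list(graph)
    fused = []
    for i, (s, g) in enumerate(zip(semantic, graph)):
        z = sum(w * x for w, x in zip(weights[i], joint)) + bias[i]
        gate = 1.0 / (1.0 + math.exp(-z))
        fused.append(gate * s + (1.0 - gate) * g)
    return fused


if __name__ == "__main__":
    char_graph, token_graph = url_graphs("http://example.com/login")
    print(len(char_graph[0]), "character nodes; token nodes:", token_graph[0])
```

In a full model the gate weights would be learned jointly with the rest of the network; here they are plain arguments so that the blending rule can be inspected in isolation.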

Related papers