Fugu-MT Paper Translation (Abstract): Paladin: Defending LLM-enabled Phishing Emails with a New Trigger-Tag Paradigm

Paper overview: Paladin: Defending LLM-enabled Phishing Emails with a New Trigger-Tag Paradigm

arxiv url: http://arxiv.org/abs/2509.07287v1
Date: Mon, 08 Sep 2025 23:44:00 GMT
Status: Translation complete
Last updated in system: 2025-09-10 14:38:27.147281
Title: Paladin: Defending LLM-enabled Phishing Emails with a New Trigger-Tag Paradigm
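Title (reference translation): Paladin: Defending Against LLM-enabled Phishing Emails with a New Trigger-Tag Paradigm
Authors: Yan Pang, Wenlong Meng, Xiaojing Liao, Tianhao Wang
Abstract summary: Malicious users can synthesize phishing emails that lack easily detectable features such as spelling mistakes. Such models can generate topic-specific phishing messages and tailor the content to the target domain. Most existing semantic-level detection approaches struggle to identify them reliably. This paper proposes Paladin, which embeds trigger-tag associations into vanilla LLMs using various insertion strategies. When an instrumented LLM generates phishing-related content, it automatically includes detectable tags, making identification easier.
Reference score (independently computed attention measure): 26.399199616508596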
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the rapid development of large language models, the potential threat of their malicious use, particularly in generating phishing content, is becoming increasingly prevalent. Leveraging the capabilities of LLMs, malicious users can synthesize phishing emails that are free from spelling mistakes and other easily detectable features. Furthermore, such models can generate topic-specific phishing messages, tailoring content to the target domain and increasing the likelihood of success. Detecting such content remains a significant challenge, as LLM-generated phishing emails often lack clear or distinguishable linguistic features. As a result, most existing semantic-level detection approaches struggle to identify them reliably. While certain LLM-based detection methods have shown promise, they suffer from high computational costs and are constrained by the performance of the underlying language model, making them impractical for large-scale deployment. In this work, we aim to address this issue. We propose Paladin, which embeds trigger-tag associations into vanilla LLMs using various insertion strategies, turning them into instrumented LLMs. When an instrumented LLM generates content related to phishing, it will automatically include detectable tags, enabling easier identification. Based on the design of implicit and explicit triggers and tags, we consider four distinct scenarios in our work. We evaluate our method from three key perspectives: stealthiness, effectiveness, and robustness, and compare it with existing baseline methods. Experimental results show that our method outperforms the baselines, achieving over 90% detection accuracy across all scenarios.
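Abstract (reference translation): With the rapid development of large language models, the potential threat of their malicious use, particularly for generating phishing content, is becoming increasingly prevalent. By leveraging the capabilities of LLMs, malicious users can synthesize phishing emails free of misspellings and other easily detectable features. Such models can also generate topic-specific phishing messages, tailoring content to the target domain and raising the likelihood of success. Because LLM-generated phishing emails often lack clear or distinguishable linguistic features, detecting such content remains a significant challenge. As a result, most existing semantic-level detection approaches struggle to identify them reliably. Certain LLM-based detection methods have shown promise, but they suffer from high computational cost and are constrained by the performance of the underlying language model, which makes them impractical for large-scale deployment. This work aims to address that problem. It proposes Paladin, which embeds trigger-tag associations into vanilla LLMs using various insertion strategies, turning them into instrumented LLMs. When an instrumented LLM generates phishing-related content, it automatically includes detectable tags, making identification easier. Based on the design of implicit and explicit triggers and tags, the work considers four distinct scenarios. The method is evaluated from three key perspectives (stealthiness, effectiveness, and robustness) and compared with existing baseline methods. Experimental results show that it outperforms the baselines, achieving over 90% detection accuracy across all scenarios.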
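
The detection step the abstract describes (an instrumented LLM adds a tag to phishing-related output, and a downstream check looks for that tag) can be illustrated with a minimal sketch. This is not Paladin's implementation: the abstract does not say what the tags look like, so the tag string `[[PALADIN-TAG]]` and the function names `contains_tag` and `flag_emails` are hypothetical. The sketch also covers only an explicit, literal tag; the implicit tags mentioned in the abstract would presumably need a different detector.

```python
# Hypothetical explicit tag. The abstract does not specify Paladin's actual
# tag format, so this string is an assumption made for illustration only.
HYPOTHETICAL_TAG = "[[PALADIN-TAG]]"


def contains_tag(text: str, tag: str = HYPOTHETICAL_TAG) -> bool:
    """Return True if the assumed explicit tag occurs in the text.

    The comparison is case-insensitive so that simple re-casing of the
    email body does not hide the tag.
    """
    return tag.casefold() in text.casefold()


def flag_emails(emails: list[str], tag: str = HYPOTHETICAL_TAG) -> list[int]:
    """Return the indices of the emails whose body carries the tag."""
    return [i for i, body in enumerate(emails) if contains_tag(body, tag)]


if __name__ == "__main__":
    samples = [
        "Your invoice is attached.",
        "Verify your account now [[PALADIN-TAG]]",
        "Lunch at noon?",
    ]
    print(flag_emails(samples))  # prints [1]
```

A substring check like this is cheap compared with running an LLM-based detector over every message, which appears to be the practical motivation the abstract gives for tagging at generation time.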

Related papers