Fugu-MT Paper Translation (Abstract): Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability

Paper overview: Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability

arxiv url: http://arxiv.org/abs/2506.13746v1
Date: Mon, 16 Jun 2025 17:54:28 GMT
Status: Translation complete
Last updated in system: 2025-06-17 17:28:49.197058
Title: Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability
Title (reference translation): Evaluation of large language models for phishing detection, self-consistency, faithfulness, and explainability
Authors: Shova Kuikel, Aritran Piplai, Palvi Aggarwal
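Abstract summary: Large Language Models (LLMs) show a promising direction and potential for improving domain-specific phishing classification tasks. Can LLMs not only classify phishing emails accurately but also generate explanations that are reliably aligned with their predictions and internally self-consistent? The authors fine-tuned transformer-based models, including BERT, Llama models, and Wizard, to improve domain relevance and tailor them to phishing-specific distinctions.
Reference score (independently computed attention score): 0.0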
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Phishing attacks remain one of the most prevalent and persistent cybersecurity threats, with attackers continuously evolving and intensifying their tactics to evade general detection systems. Despite significant advances in artificial intelligence and machine learning, faithfully reproducing the interpretable reasoning with classification and explainability that underpins phishing judgments remains challenging. Owing to recent advances in Natural Language Processing, Large Language Models (LLMs) show a promising direction and potential for improving domain-specific phishing classification tasks. However, enhancing the reliability and robustness of classification models requires not only accurate predictions from LLMs but also consistent and trustworthy explanations that align with those predictions. Therefore, a key question remains: can LLMs not only classify phishing emails accurately but also generate explanations that are reliably aligned with their predictions and internally self-consistent? To answer this question, we fine-tuned transformer-based models, including BERT, Llama models, and Wizard, to improve domain relevance and tailor them to phishing-specific distinctions, using Binary Sequence Classification, Contrastive Learning (CL), and Direct Preference Optimization (DPO). We then examined their performance in phishing classification and explainability by applying the ConsistenCy measure based on SHAPley values (CC SHAP), which measures prediction-explanation token alignment, to test each model's internal faithfulness and consistency and to uncover the rationale behind its predictions and reasoning. Overall, our findings show that Llama models exhibit stronger prediction-explanation token alignment, with higher CC SHAP scores, despite lacking reliable decision-making accuracy, whereas Wizard achieves better prediction accuracy but lower CC SHAP scores.
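Abstract (reference translation): Phishing attacks are among the most common and persistent cybersecurity threats, as attackers continuously evolve and intensify their tactics to evade general detection systems. Despite major advances in artificial intelligence and machine learning, it remains difficult to faithfully reproduce the interpretable reasoning, together with classification and explainability, on which phishing judgments rest. With recent progress in natural language processing, Large Language Models (LLMs) show a promising direction and potential for improving domain-specific phishing classification tasks. However, making classification models more reliable and robust requires not only accurate predictions from LLMs but also consistent, trustworthy explanations that match those predictions. A key question therefore remains: can LLMs not only classify phishing emails accurately but also generate explanations that reliably match their predictions and are internally consistent? To answer this question, the authors fine-tuned transformer-based models, including BERT, Llama models, and Wizard, to improve domain relevance and tailor them to phishing-specific distinctions, using binary sequence classification, contrastive learning (CL), and Direct Preference Optimization (DPO). They then evaluated phishing classification and explainability by applying the ConsistenCy measure based on SHAPley values (CC SHAP) to test each model's internal faithfulness and consistency and to reveal the rationale behind its predictions and reasoning. The results show that Llama models have stronger prediction-explanation token alignment, with higher CC SHAP scores, despite lacking reliable decision-making accuracy, whereas Wizard achieves better prediction accuracy but lower CC SHAP scores.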
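
The abstract describes CC SHAP only as a measure of prediction-explanation token alignment and does not give a formula. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes that per-token attribution values (for example, SHAP values) have already been computed once for the model's prediction and once for its explanation, and it assumes cosine similarity as the alignment measure. The function names and the sample numbers are hypothetical.

```python
import math


def normalize_contributions(values):
    """Scale attributions so their absolute values sum to 1, keeping signs."""
    total = sum(abs(v) for v in values)
    if total == 0:
        raise ValueError("all contributions are zero")
    return [v / total for v in values]


def alignment_score(pred_contribs, expl_contribs):
    """Cosine similarity between two per-token contribution vectors.

    Assumption: both vectors cover the same input tokens in the same order.
    The result lies in [-1, 1]; higher means the same tokens drive both the
    prediction and the explanation.
    """
    if len(pred_contribs) != len(expl_contribs):
        raise ValueError("contribution vectors must have the same length")
    p = normalize_contributions(pred_contribs)
    e = normalize_contributions(expl_contribs)
    dot = sum(a * b for a, b in zip(p, e))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in e))
    return dot / norm


if __name__ == "__main__":
    # Hypothetical attributions for the tokens of one email, e.g.
    # ["verify", "your", "account", "now"].
    prediction_attr = [0.6, 0.05, 0.3, 0.05]
    explanation_attr = [0.5, 0.10, 0.3, 0.10]
    print(round(alignment_score(prediction_attr, explanation_attr), 3))
```

Under these assumptions, a score near 1 would suggest the explanation relies on the same tokens as the classification, while a score near 0 or below would suggest the two draw on different parts of the input.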

Related papers