Related papers: Large Multimodal Agents for Accurate Phishing Detection with Enhanced Token Optimization and Cost Reduction

Large Multimodal Agents for Accurate Phishing Detection with Enhanced Token Optimization and Cost Reduction

URL: http://arxiv.org/abs/2412.02301v1
Date: Tue, 03 Dec 2024 09:13:52 GMT
Title: Large Multimodal Agents for Accurate Phishing Detection with Enhanced Token Optimization and Cost Reduction
Authors: Fouad Trad, Ali Chehab,
Abstract summary: This paper explores the use of large multimodal agents, specifically Gemini 1.5 Flash and GPT-4o mini, to analyze both URLs and webpage screenshots via APIs.<n>We propose a two-tiered agentic approach: one agent assesses the URL, and if inconclusive, a second agent evaluates both the URL and the screenshot.<n>Cost analysis shows that with the agentic approach, GPT-4o mini can process about 4.2 times as many websites per $100 compared to the multimodal approach.
Score: 2.8161155726745237
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the rise of sophisticated phishing attacks, there is a growing need for effective and economical detection solutions. This paper explores the use of large multimodal agents, specifically Gemini 1.5 Flash and GPT-4o mini, to analyze both URLs and webpage screenshots via APIs, thus avoiding the complexities of training and maintaining AI systems. Our findings indicate that integrating these two data types substantially enhances detection performance over using either type alone. However, API usage incurs costs per query that depend on the number of input and output tokens. To address this, we propose a two-tiered agentic approach: initially, one agent assesses the URL, and if inconclusive, a second agent evaluates both the URL and the screenshot. This method not only maintains robust detection performance but also significantly reduces API costs by minimizing unnecessary multi-input queries. Cost analysis shows that with the agentic approach, GPT-4o mini can process about 4.2 times as many websites per $100 compared to the multimodal approach (107,440 vs. 25,626), and Gemini 1.5 Flash can process about 2.6 times more websites (2,232,142 vs. 862,068). These findings underscore the significant economic benefits of the agentic approach over the multimodal method, providing a viable solution for organizations aiming to leverage advanced AI for phishing detection while controlling expenses.

Related papers

Runaway is Ashamed, But Helpful: On the Early-Exit Behavior of Large Language Model-based Agents in Embodied Environments [55.044159987218436]
Large language models (LLMs) have demonstrated strong planning and decision-making capabilities in complex embodied environments.<n>We take a first step toward exploring the early-exit behavior for LLM-based agents.
arXiv Detail & Related papers (2025-05-23T08:23:36Z)
Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty [15.97218000282262]
Agentic Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) by enabling dynamic, multi-step reasoning and information retrieval.<n>These systems often exhibit sub-optimal search behaviors like over-search (retrieving redundant information) and under-search (failing to retrieve necessary information)<n>This work formally defines and quantifies these behaviors, revealing their prevalence across multiple QA datasets and agentic RAG systems.
arXiv Detail & Related papers (2025-05-22T20:57:56Z)
Phishing URL Detection using Bi-LSTM [0.0]
This paper proposes a deep learning-based approach to classify URLs into four categories: benign, phishing, defacement, and malware. Experimental results on a dataset comprising over 650,000 URLs demonstrate the model's effectiveness, achieving 97% accuracy and significant improvements over traditional techniques.
arXiv Detail & Related papers (2025-04-29T00:55:01Z)
Multi-stage Large Language Model Pipelines Can Outperform GPT-4o in Relevance Assessment [6.947361774195549]
We propose a modular classification pipeline that divides the relevance assessment task into multiple stages. One of our approaches showed an 18.4% Krippendorff's $alpha$ accuracy increase over OpenAI's GPT-4o mini.
arXiv Detail & Related papers (2025-01-24T07:33:39Z)
Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems [42.137278756052595]
$texttAgentPrune$ can seamlessly integrate into mainstream multi-agent systems. textbf(I) integrates seamlessly into existing multi-agent frameworks with $28.1%sim72.8%downarrow$ token reduction. textbf(III) successfully defend against two types of agent-based adversarial attacks with $3.5%sim10.8%uparrow$ performance boost.
arXiv Detail & Related papers (2024-10-03T14:14:31Z)
On the Resilience of Multi-Agent Systems with Malicious Agents [58.79302663733702]
This paper investigates what is the resilience of multi-agent system structures under malicious agents. We devise two methods, AutoTransform and AutoInject, to transform any agent into a malicious one. We show that two defense methods, introducing a mechanism for each agent to challenge others' outputs, or an additional agent to review and correct messages, can enhance system resilience.
arXiv Detail & Related papers (2024-08-02T03:25:20Z)
PhishGuard: A Convolutional Neural Network Based Model for Detecting Phishing URLs with Explainability Analysis [1.102674168371806]
Phishing URL identification is the best way to address the problem. Various machine learning and deep learning methods have been proposed to automate the detection of phishing URLs. We propose a 1D Convolutional Neural Network (CNN) and trained the model with extensive features and a substantial amount of data.
arXiv Detail & Related papers (2024-04-27T17:13:49Z)
Securing Transactions: A Hybrid Dependable Ensemble Machine Learning Model using IHT-LR and Grid Search [2.4374097382908477]
We introduce a state-of-the-art hybrid ensemble (ENS) Machine learning (ML) model that intelligently combines multiple algorithms to enhance fraud identification. Our experiments are conducted on a publicly available credit card dataset comprising 284,807 transactions. The proposed model achieves impressive accuracy rates of 99.66%, 99.73%, 98.56%, and 99.79%, and a perfect 100% for the DT, RF, KNN, and ENS models, respectively.
arXiv Detail & Related papers (2024-02-22T09:01:42Z)
Efficient Verification-Based Face Identification [50.616875565173274]
We study the problem of performing face verification with an efficient neural model $f$. Our model leads to a substantially small $f$ requiring only 23k parameters and 5M floating point operations (FLOPS) We use six face verification datasets to demonstrate that our method is on par or better than state-of-the-art models.
arXiv Detail & Related papers (2023-12-20T18:08:02Z)
Adversarially Robust Deep Learning with Optimal-Transport-Regularized Divergences [12.1942837946862]
We introduce the $ARMOR_D$ methods as novel approaches to enhancing the adversarial robustness of deep learning models. We demonstrate the effectiveness of our method on malware detection and image recognition applications.
arXiv Detail & Related papers (2023-09-07T15:41:45Z)
Efficient Decoder-free Object Detection with Transformers [75.00499377197475]
Vision transformers (ViTs) are changing the landscape of object detection approaches. We propose a decoder-free fully transformer-based (DFFT) object detector. DFFT_SMALL achieves high efficiency in both training and inference stages.
arXiv Detail & Related papers (2022-06-14T13:22:19Z)
Generative Adversarial Network-Driven Detection of Adversarial Tasks in Mobile Crowdsensing [5.675436513661266]
Crowdsensing systems are vulnerable to various attacks as they build on non-dedicated and ubiquitous properties. Previous works suggest that GAN-based attacks exhibit more crucial devastation than empirically designed attack samples. This paper aims to detect intelligently designed illegitimate sensing service requests by integrating a GAN-based model.
arXiv Detail & Related papers (2022-02-16T00:23:25Z)
Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network. There are two major challenges in the current one-step approaches. We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z)
Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles. Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center. We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes. A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
SADet: Learning An Efficient and Accurate Pedestrian Detector [68.66857832440897]
This paper proposes a series of systematic optimization strategies for the detection pipeline of one-stage detector. It forms a single shot anchor-based detector (SADet) for efficient and accurate pedestrian detection. Though structurally simple, it presents state-of-the-art result and real-time speed of $20$ FPS for VGA-resolution images.
arXiv Detail & Related papers (2020-07-26T12:32:38Z)
A Provably Efficient Sample Collection Strategy for Reinforcement Learning [123.69175280309226]
One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior. We propose to tackle the exploration-exploitation problem following a decoupled approach composed of: 1) An "objective-specific" algorithm that prescribes how many samples to collect at which states, as if it has access to a generative model (i.e., sparse simulator of the environment); 2) An "objective-agnostic" sample collection responsible for generating the prescribed samples as fast as possible.
arXiv Detail & Related papers (2020-07-13T15:17:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.