To Ensemble or Not: Assessing Majority Voting Strategies for Phishing Detection with Large Language Models
- URL: http://arxiv.org/abs/2412.00166v1
- Date: Fri, 29 Nov 2024 14:42:23 GMT
- Title: To Ensemble or Not: Assessing Majority Voting Strategies for Phishing Detection with Large Language Models
- Authors: Fouad Trad, Ali Chehab,
- Abstract summary: This study investigates three majority voting strategies for text classification, focusing on phishing URL detection.
The strategies are a prompt-based ensemble, a model-based ensemble, and a hybrid ensemble.
Our analysis shows that ensemble strategies are most suited in cases where individual components exhibit equivalent performance levels.
- Score: 2.8161155726745237
- License:
- Abstract: The effectiveness of Large Language Models (LLMs) significantly relies on the quality of the prompts they receive. However, even when processing identical prompts, LLMs can yield varying outcomes due to differences in their training processes. To leverage the collective intelligence of multiple LLMs and enhance their performance, this study investigates three majority voting strategies for text classification, focusing on phishing URL detection. The strategies are: (1) a prompt-based ensemble, which utilizes majority voting across the responses generated by a single LLM to various prompts; (2) a model-based ensemble, which entails aggregating responses from multiple LLMs to a single prompt; and (3) a hybrid ensemble, which combines the two methods by sending different prompts to multiple LLMs and then aggregating their responses. Our analysis shows that ensemble strategies are most suited in cases where individual components exhibit equivalent performance levels. However, when there is a significant discrepancy in individual performance, the effectiveness of the ensemble method may not exceed that of the highest-performing single LLM or prompt. In such instances, opting for ensemble techniques is not recommended.
Related papers
- Don't Just Demo, Teach Me the Principles: A Principle-Based Multi-Agent Prompting Strategy for Text Classification [4.811763060654019]
We present PRINCIPLE-BASED PROMPTING, a simple but effective multi-agent prompting strategy for text classification.
Our approach achieves substantial performance gains (1.55% - 19.37%) over zero-shot prompting on macro-F1 score.
Our multi-agent PRINCIPLE-BASED PROMPTING approach also shows on-par or better performance compared to demonstration-based few-shot prompting approaches.
arXiv Detail & Related papers (2025-02-11T01:10:13Z) - DFPE: A Diverse Fingerprint Ensemble for Enhancing LLM Performance [11.753349115726952]
We propose a novel ensemble method - Diverse Fingerprint Ensemble (DFPE)
Our approach involves: (1) clustering models based on response "fingerprints" patterns, (2) applying a quantile-based filtering mechanism, and (3) assigning adaptive weights to remaining models.
In experiments on the Massive Multitask Language Understanding (MMLU) benchmark, DFPE outperforms the best single model by 3% overall accuracy and 5% in discipline-level accuracy.
arXiv Detail & Related papers (2025-01-29T08:44:45Z) - Multi-LLM Text Summarization [58.74987409988719]
We propose a Multi-LLM summarization framework, and investigate two different multi-LLM strategies including centralized and decentralized.
Our framework has two fundamentally important steps at each round of conversation: generation and evaluation.
We find that our multi-LLM summarization approaches significantly outperform the baselines that leverage only a single LLM by up to 3x.
arXiv Detail & Related papers (2024-12-20T01:55:26Z) - QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tune a small pretrained language model to generate optimal prompts tailored to the input queries.
We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.
Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z) - SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models [8.558834738072363]
Large language models (LLMs) have seen widespread adoption due to their remarkable performance across various applications.
These individual LLMs show limitations in generalization and performance on complex tasks due to inherent training biases, model size constraints, and the quality or diversity of pre-training datasets.
We introduce SelectLLM, which efficiently directs input queries to the most suitable subset of LLMs from a large pool.
arXiv Detail & Related papers (2024-08-16T06:11:21Z) - Large Language Models Know What Makes Exemplary Contexts [42.90814615222177]
In-context learning (ICL) has proven to be a significant capability with the advancement of Large Language models (LLMs)
This paper presents a unified framework for LLMs that allows them to self-select influential in-context examples to compose their contexts.
arXiv Detail & Related papers (2024-08-14T12:32:41Z) - MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization [73.7779735046424]
We show that different prompts should be adapted to different Large Language Models (LLM) to enhance their capabilities across various downstream tasks in NLP.
We then propose a model-adaptive prompt (MAPO) method that optimize the original prompts for each specific LLM in downstream tasks.
arXiv Detail & Related papers (2024-07-04T18:39:59Z) - Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems [76.69936664916061]
We study how the number of LM calls affects the performance of Vote and Filter-Vote.
We find, surprisingly, that across multiple language tasks, the performance of both Vote and Filter-Vote can first increase but then decrease as a function of the number of LM calls.
arXiv Detail & Related papers (2024-03-04T19:12:48Z) - Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [57.974103113675795]
We present a method named Rephrase and Respond' (RaR) which allows Large Language Models to rephrase and expand questions posed by humans.
RaR serves as a simple yet effective prompting method for improving performance.
We show that RaR is complementary to the popular Chain-of-Thought (CoT) methods, both theoretically and empirically.
arXiv Detail & Related papers (2023-11-07T18:43:34Z) - ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for
Document Information Extraction [56.790794611002106]
Large language models (LLMs) have demonstrated remarkable results in various natural language processing (NLP) tasks with in-context learning.
We propose a simple but effective in-context learning framework called ICL-D3IE.
Specifically, we extract the most difficult and distinct segments from hard training documents as hard demonstrations.
arXiv Detail & Related papers (2023-03-09T06:24:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.