Spam-T5: Benchmarking Large Language Models for Few-Shot Email Spam
Detection
- URL: http://arxiv.org/abs/2304.01238v3
- Date: Sun, 7 May 2023 10:57:51 GMT
- Title: Spam-T5: Benchmarking Large Language Models for Few-Shot Email Spam
Detection
- Authors: Maxime Labonne and Sean Moran
- Abstract summary: This paper investigates the effectiveness of large language models (LLMs) in email spam detection.
We compare prominent models from three distinct families: BERT-like, Sentence Transformers, and Seq2Seq.
We assess the performance of these models across four public datasets.
- Score: 3.3504365823045044
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper investigates the effectiveness of large language models (LLMs) in
email spam detection by comparing prominent models from three distinct
families: BERT-like, Sentence Transformers, and Seq2Seq. Additionally, we
examine well-established machine learning techniques for spam detection, such
as Naïve Bayes and LightGBM, as baseline methods. We assess the performance
of these models across four public datasets, utilizing different numbers of
training samples (full training set and few-shot settings). Our findings reveal
that, in the majority of cases, LLMs surpass the performance of the popular
baseline techniques, particularly in few-shot scenarios. This adaptability
renders LLMs uniquely suited to spam detection tasks, where labeled samples are
limited in number and models require frequent updates. Additionally, we
introduce Spam-T5, a Flan-T5 model that has been specifically adapted and
fine-tuned for the purpose of detecting email spam. Our results demonstrate
that Spam-T5 surpasses baseline models and other LLMs in the majority of
scenarios, particularly when there are a limited number of training samples
available. Our code is publicly available at
https://github.com/jpmorganchase/emailspamdetection.
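To make the baseline side of the comparison concrete, here is a minimal multinomial Naive Bayes spam classifier with Laplace smoothing, written from scratch. This is an illustrative sketch only, not the paper's pipeline: the authors use standard library implementations, and the toy emails and tokenizer here are invented for demonstration.

```python
# Minimal multinomial Naive Bayes spam baseline (illustrative sketch).
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train_nb(emails, labels):
    """Fit per-class token counts; labels are 1 = spam, 0 = ham."""
    counts = {0: Counter(), 1: Counter()}
    priors = Counter(labels)
    for text, y in zip(emails, labels):
        counts[y].update(tokenize(text))
    vocab = set(counts[0]) | set(counts[1])
    return counts, priors, vocab

def predict_nb(model, text):
    counts, priors, vocab = model
    total = sum(priors.values())
    scores = {}
    for y in (0, 1):
        n_tokens = sum(counts[y].values())
        # Log prior plus Laplace-smoothed log likelihood of each token.
        score = math.log(priors[y] / total)
        for tok in tokenize(text):
            score += math.log((counts[y][tok] + 1) / (n_tokens + len(vocab)))
        scores[y] = score
    return max(scores, key=scores.get)

emails = ["win free cash now", "meeting agenda attached",
          "free prize claim now", "lunch tomorrow?"]
labels = [1, 0, 1, 0]
model = train_nb(emails, labels)
print(predict_nb(model, "claim your free cash"))  # → 1 (spam)
```

Baselines like this need many labeled samples to estimate reliable token statistics, which is exactly where the paper finds LLMs pull ahead in few-shot settings.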
Related papers
- Improving Phishing Email Detection Performance of Small Large Language Models [5.209583971923267]
Large language models (LLMs) have demonstrated remarkable performance on many natural language processing (NLP) tasks.
However, well-performing LLMs typically contain billions or even tens of billions of parameters, requiring enormous computational resources.
arXiv Detail & Related papers (2025-04-29T14:07:06Z)
- An Investigation of Large Language Models and Their Vulnerabilities in Spam Detection [7.550686419077825]
This project studies new spam detection systems that leverage Large Language Models (LLMs) fine-tuned on spam datasets.
The experiments employ two LLMs, GPT2 and BERT, and three spam datasets: Enron, LingSpam, and SMSspamCollection.
The results show that, while they can function as effective spam filters, the LLMs are susceptible to adversarial and data poisoning attacks.
arXiv Detail & Related papers (2025-04-14T00:30:27Z)
- Idiosyncrasies in Large Language Models [54.26923012617675]
We unveil and study idiosyncrasies in Large Language Models (LLMs).
We find that fine-tuning existing text embedding models on LLM-generated texts yields excellent classification accuracy.
We leverage LLM as judges to generate detailed, open-ended descriptions of each model's idiosyncrasies.
arXiv Detail & Related papers (2025-02-17T18:59:02Z)
- Training on the Benchmark Is Not All You Need [52.01920740114261]
We propose a simple and effective data leakage detection method based on the contents of multiple-choice options.
Our method is able to work under black-box conditions without access to model training data or weights.
We evaluate the degree of data leakage of 31 mainstream open-source LLMs on four benchmark datasets.
arXiv Detail & Related papers (2024-09-03T11:09:44Z)
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling [81.34900892130929]
We explore inference compute as another axis for scaling by increasing the number of generated samples.
In domains like coding and formal proofs, where all answers can be automatically verified, these increases in coverage directly translate into improved performance.
We find that identifying correct samples out of many generations remains an important direction for future research in domains without automatic verifiers.
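The coverage metric behind repeated sampling is commonly estimated with the standard unbiased pass@k estimator (Chen et al.): given n generated samples of which c are correct, the chance that at least one of k drawn samples is correct. A short sketch, shown here only as context for the scaling claim above:

```python
# Unbiased pass@k estimator: P(at least one of k samples is correct),
# given n total samples with c correct ones.
from math import comb

def pass_at_k(n, c, k):
    if n - c < k:
        return 1.0  # too few incorrect samples to fill all k draws
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(100, 10, 1))   # ≈ 0.10
print(pass_at_k(100, 10, 10))  # ≈ 0.67 — coverage grows with more samples
```

The second line illustrates the paper's point: with a fixed 10% per-sample success rate, drawing more samples sharply increases the chance of covering a correct answer, provided a verifier can identify it.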
arXiv Detail & Related papers (2024-07-31T17:57:25Z)
- Spectra: Surprising Effectiveness of Pretraining Ternary Language Models at Scale [16.865532646589987]
This paper investigates the pretraining of low-bitwidth models, specifically Ternary Language Models (TriLMs), as an alternative to traditional floating-point models (FloatLMs) and their post-training quantized versions (QuantLMs).
We present the Spectra LLM suite, the first open suite of LLMs spanning multiple bit-widths, including FloatLMs, QuantLMs, and TriLMs, ranging from 99M to 3.9B parameters trained on 300B tokens.
arXiv Detail & Related papers (2024-07-17T05:53:20Z)
- Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [83.90988015005934]
Uncertainty quantification (UQ) is a key element of machine learning applications.
We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines.
We conduct a large-scale empirical investigation of UQ and normalization techniques across eleven tasks, identifying the most effective approaches.
arXiv Detail & Related papers (2024-06-21T20:06:31Z)
- Aligning Language Models with Demonstrated Feedback [58.834937450242975]
Demonstration ITerated Task Optimization (DITTO) directly aligns language model outputs to a user's demonstrated behaviors.
We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts.
arXiv Detail & Related papers (2024-06-02T23:13:56Z)
- Zero-Shot Spam Email Classification Using Pre-trained Large Language Models [0.0]
This paper investigates the application of pre-trained large language models (LLMs) for spam email classification using zero-shot prompting.
We evaluate the performance of both open-source (Flan-T5) and proprietary LLMs (ChatGPT, GPT-4) on the well-known SpamAssassin dataset.
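Zero-shot prompting of this kind reduces to building a classification prompt and mapping the model's free-form reply to a label. The template and parsing below are assumptions for illustration, not the authors' actual prompt, and the model call is left as a hypothetical:

```python
# Illustrative zero-shot spam-classification prompt and label parsing.
# The template and the llm.generate() call are hypothetical.
def build_prompt(email_text):
    return (
        "Classify the following email as 'spam' or 'ham'.\n"
        "Answer with a single word.\n\n"
        f"Email:\n{email_text}\n\nAnswer:"
    )

def parse_label(model_output):
    # Map free-form model output to a binary label (1 = spam, 0 = ham).
    first = model_output.strip().lower().split()[0].strip(".,!")
    return 1 if first == "spam" else 0

prompt = build_prompt("You have WON a $1000 gift card! Click here.")
# response = llm.generate(prompt)  # hypothetical model call
print(parse_label("Spam."))  # → 1
```

Because no labeled examples appear in the prompt, this setup needs zero training samples, which is why the paper evaluates it as the extreme end of the few-shot spectrum.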
arXiv Detail & Related papers (2024-05-24T20:55:49Z)
- TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data [73.29220562541204]
We consider harnessing the power of large language models (LLMs) to solve our task.
We develop a TAT-LLM language model by fine-tuning LLaMA 2 with the training data generated automatically from existing expert-annotated datasets.
arXiv Detail & Related papers (2024-01-24T04:28:50Z)
- Self-supervised learning of multi-omics embeddings in the low-label, high-data regime [0.0]
Contrastive, self-supervised learning (SSL) is used to train a model that predicts cancer type from unimodal, mRNA or RPPA expression data.
A late-fusion model is proposed, where each omics is passed through its own sub-network, the outputs of which are averaged and passed to the pretraining or downstream objective function.
Multi-modal pretraining is shown to improve predictions from a single omics, and we argue that this is useful for datasets with many unlabelled multi-modal samples, but few labelled samples.
arXiv Detail & Related papers (2023-11-16T15:32:22Z)
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages [116.74407069443895]
We unify encoder and decoder-based models into a single prefix-LM.
For learning methods, we explore the claim of a "free lunch" hypothesis.
For data distributions, the effect of a mixture distribution and multi-epoch training of programming and natural languages on model performance is explored.
arXiv Detail & Related papers (2023-05-03T17:55:25Z)
- Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes [91.58845026796149]
We introduce Distilling step-by-step, a new mechanism that trains small models that outperform large language models.
We present three findings across 4 NLP benchmarks.
arXiv Detail & Related papers (2023-05-03T17:50:56Z)
- AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z)
- Few-shot learning approaches for classifying low resource domain specific software requirements [1.1470070927586016]
Few-shot learning is a deep learning approach that uses only a few annotated samples.
Our experiments focus on classifying BOSCH automotive domain textual software requirements into 3 categories.
While SciBERT and DeBERTa based models tend to be the most accurate at 15 training samples, their performance improvement scales minimally as the number of annotated samples is increased to 50 in comparison to Siamese and T5 based models.
arXiv Detail & Related papers (2023-02-14T10:19:23Z)
- Frustratingly Simple Pretraining Alternatives to Masked Language Modeling [10.732163031244651]
Masked language modeling (MLM) is widely used in natural language processing for learning text representations.
In this paper, we explore five simple pretraining objectives based on token-level classification tasks as replacements for MLM.
arXiv Detail & Related papers (2021-09-04T08:52:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.