Related papers: Exposing LLM Vulnerabilities: Adversarial Scam Detection and Performance

Exposing LLM Vulnerabilities: Adversarial Scam Detection and Performance

URL: http://arxiv.org/abs/2412.00621v1
Date: Sun, 01 Dec 2024 00:13:28 GMT
Title: Exposing LLM Vulnerabilities: Adversarial Scam Detection and Performance
Authors: Chen-Wei Chang, Shailik Sarkar, Shutonu Mitra, Qi Zhang, Hossein Salemi, Hemant Purohit, Fengxiu Zhang, Michin Hong, Jin-Hee Cho, Chang-Tien Lu,
Abstract summary: This paper investigates the vulnerabilities of Large Language Models (LLMs) when facing adversarial scam messages for the task of scam detection.<n>We created a comprehensive dataset with fine-grained labels of scam messages, including both original and adversarial scam messages.<n>Our analysis showed how adversarial examples took advantage of vulnerabilities of a LLM, leading to high misclassification rate.
Score: 16.9071617169937
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Can we trust Large Language Models (LLMs) to accurately predict scam? This paper investigates the vulnerabilities of LLMs when facing adversarial scam messages for the task of scam detection. We addressed this issue by creating a comprehensive dataset with fine-grained labels of scam messages, including both original and adversarial scam messages. The dataset extended traditional binary classes for the scam detection task into more nuanced scam types. Our analysis showed how adversarial examples took advantage of vulnerabilities of a LLM, leading to high misclassification rate. We evaluated the performance of LLMs on these adversarial scam messages and proposed strategies to improve their robustness.

Related papers

ScamFerret: Detecting Scam Websites Autonomously with Large Language Models [2.6217304977339473]
ScamFerret is an innovative agent system employing a large language model (LLM) to autonomously collect and analyze data from a given URL to determine whether it is a scam. Our evaluation demonstrated that ScamFerret achieves 0.972 accuracy in classifying four scam types in English and 0.993 accuracy in classifying online shopping websites across three different languages.
arXiv Detail & Related papers (2025-02-14T12:16:38Z)
Adversarial Reasoning at Jailbreaking Time [49.70772424278124]
We develop an adversarial reasoning approach to automatic jailbreaking via test-time computation. Our approach introduces a new paradigm in understanding LLM vulnerabilities, laying the foundation for the development of more robust and trustworthy AI systems.
arXiv Detail & Related papers (2025-02-03T18:59:01Z)
Distinguishing Scams and Fraud with Ensemble Learning [0.8192907805418583]
The Consumer Financial Protection Bureau's complaints database is a rich data source for evaluating LLM performance on user scam queries. We developed an ensemble approach to distinguishing scam and fraud CFPB complaints.
arXiv Detail & Related papers (2024-12-11T18:07:18Z)
Can LLMs be Scammed? A Baseline Measurement Study [0.0873811641236639]
Large Language Models' (LLMs') vulnerability to a variety of scam tactics is systematically assessed. First, we incorporate 37 well-defined base scam scenarios reflecting the diverse scam categories identified by FINRA taxonomy. Second, we utilize representative proprietary (GPT-3.5, GPT-4) and open-source (Llama) models to analyze their performance in scam detection. Third, our research provides critical insights into which scam tactics are most effective against LLMs and how varying persona traits and persuasive techniques influence these vulnerabilities.
arXiv Detail & Related papers (2024-10-14T05:22:27Z)
Combating Phone Scams with LLM-based Detection: Where Do We Stand? [1.8979188847659796]
This research explores the potential of large language models (LLMs) to provide detection of fraudulent phone calls. LLMs-based detectors can identify potential scams as they occur, offering immediate protection to users.
arXiv Detail & Related papers (2024-09-18T02:14:30Z)
Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses. Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives. The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z)
CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models [60.59638232596912]
We introduce CLAMBER, a benchmark for evaluating large language models (LLMs) Building upon the taxonomy, we construct 12K high-quality data to assess the strengths, weaknesses, and potential risks of various off-the-shelf LLMs. Our findings indicate the limited practical utility of current LLMs in identifying and clarifying ambiguous user queries.
arXiv Detail & Related papers (2024-05-20T14:34:01Z)
Understanding Privacy Risks of Embeddings Induced by Large Language Models [75.96257812857554]
Large language models show early signs of artificial general intelligence but struggle with hallucinations. One promising solution is to store external knowledge as embeddings, aiding LLMs in retrieval-augmented generation. Recent studies experimentally showed that the original text can be partially reconstructed from text embeddings by pre-trained language models.
arXiv Detail & Related papers (2024-04-25T13:10:48Z)
Detecting Scams Using Large Language Models [19.7220607313348]
Large Language Models (LLMs) have gained prominence in various applications, including security. This paper explores the utility of LLMs in scam detection, a critical aspect of cybersecurity. We propose a novel use case for LLMs to identify scams, such as phishing, advance fee fraud, and romance scams.
arXiv Detail & Related papers (2024-02-05T16:13:54Z)
An Improved Transformer-based Model for Detecting Phishing, Spam, and Ham: A Large Language Model Approach [0.0]
We present IPSDM, our model based on fine-tuning the BERT family of models to specifically detect phishing and spam email. We demonstrate our fine-tuned version, IPSDM, is able to better classify emails in both unbalanced and balanced datasets.
arXiv Detail & Related papers (2023-11-01T18:41:50Z)
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks [99.23352758320945]
We propose SmoothLLM, the first algorithm designed to mitigate jailbreaking attacks on large language models (LLMs) Based on our finding that adversarially-generated prompts are brittle to character-level changes, our defense first randomly perturbs multiple copies of a given input prompt, and then aggregates the corresponding predictions to detect adversarial inputs.
arXiv Detail & Related papers (2023-10-05T17:01:53Z)
Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools. Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions. Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
Red Teaming Language Model Detectors with Language Models [114.36392560711022]
Large language models (LLMs) present significant safety and ethical risks if exploited by malicious users. Recent works have proposed algorithms to detect LLM-generated text and protect LLMs. We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation.
arXiv Detail & Related papers (2023-05-31T10:08:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.