A Multi-Turn Framework for Evaluating AI Misuse in Fraud and Cybercrime Scenarios
- URL: http://arxiv.org/abs/2602.21831v2
- Date: Tue, 03 Mar 2026 17:14:45 GMT
- Title: A Multi-Turn Framework for Evaluating AI Misuse in Fraud and Cybercrime Scenarios
- Authors: Kimberly T. Mai, Anna Gausen, Magda Dubois, Mona Murad, Bessie O'Dell, Nadine Staes-Polet, Christopher Summerfield, Andrew Strait
- Abstract summary: It is unclear to what extent current large language models can provide useful information for complex criminal activity. We evaluate whether models provide actionable assistance beyond information typically available on the web, as assessed by domain experts. We found that current large language models provide minimal actionable information for fraud and cybercrime without the use of advanced jailbreaking techniques.
- Score: 1.1864532555108382
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI is increasingly being used to assist fraud and cybercrime. However, it is unclear to what extent current large language models can provide useful information for complex criminal activity. Working with law enforcement and policy experts, we developed multi-turn evaluations for three fraud and cybercrime scenarios (romance scams, CEO impersonation, and identity theft). Our evaluations focus on text-to-text interactions. In each scenario, we evaluate whether models provide actionable assistance beyond information typically available on the web, as assessed by domain experts. We do so in ways designed to resemble real-world misuse, such as breaking down requests for fraud into a sequence of seemingly benign queries. We found that (1) current large language models provide minimal actionable information for fraud and cybercrime without the use of advanced jailbreaking techniques, (2) model safeguards have a significant impact on the provision of information, with the two open-weight large language models fine-tuned to remove safety guardrails providing the most actionable and useful responses, and (3) decomposing requests into benign-seeming queries elicited more assistance than explicitly malicious framing or basic system-level jailbreaks. Overall, the results suggest that current text-generation models provide relatively minimal uplift for fraud and cybercrime through information provision, absent extensive effort to circumvent safeguards. This work contributes a reproducible, expert-grounded framework for tracking how these risks may evolve as models grow more capable and adversaries adapt.
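As a rough illustration of the decomposition tactic described in the abstract, the sketch below drives one task as a sequence of benign-looking turns whose transcript is then handed to graders. The `query_model` wrapper and the sub-query structure are hypothetical stand-ins; the paper's actual prompts and grading rubric are not reproduced here.

```python
# Minimal sketch of a multi-turn "decomposed request" probe, assuming a
# hypothetical query_model(messages) -> str chat-completion wrapper.
# The benign-seeming sub-queries and the grading handoff are illustrative
# placeholders only, not the paper's evaluation materials.

def query_model(messages: list[dict]) -> str:
    """Hypothetical chat-completion wrapper; swap in any LLM client."""
    raise NotImplementedError

def run_decomposed_probe(sub_queries: list[str]) -> list[dict]:
    """Issue a task as a sequence of individually benign turns, carrying
    the full conversation history forward on each turn."""
    history: list[dict] = []
    transcript = []
    for query in sub_queries:
        history.append({"role": "user", "content": query})
        reply = query_model(history)
        history.append({"role": "assistant", "content": reply})
        transcript.append({"query": query, "reply": reply})
    return transcript  # handed to domain experts for uplift grading
```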
Related papers
- Agents of Chaos [50.53354213047402]
We report an exploratory red-teaming study of autonomous language-model-powered agents deployed in a live laboratory environment. Twenty AI researchers interacted with the agents under benign and adversarial conditions. Our findings establish the existence of security-, privacy-, and governance-relevant vulnerabilities in realistic deployment settings.
arXiv Detail & Related papers (2026-02-23T16:28:48Z)
- VirtualCrime: Evaluating Criminal Potential of Large Language Models via Sandbox Simulation [10.613890248478189]
Large language models (LLMs) have shown strong capabilities in multi-step decision-making, planning, and action. This raises the concern that their strong problem-solving abilities may be misused for crime. We propose VirtualCrime, a sandbox simulation framework based on a three-agent system to evaluate the criminal capabilities of models.
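A minimal sketch of how a three-agent sandbox loop might be wired; the actor/environment/judge role split and the callable interface are assumptions for illustration, since the abstract does not name the three agents.

```python
# Sketch of a three-agent sandbox turn loop. Each agent is just a callable
# from text to text here; the actual agent design in the paper may differ.

from typing import Callable

Agent = Callable[[str], str]

def simulate(actor: Agent, environment: Agent, judge: Agent,
             task: str, max_turns: int = 10) -> list[str]:
    """Alternate actor actions and environment observations, letting the
    judge agent decide when the simulated attempt has resolved."""
    log, observation = [], task
    for _ in range(max_turns):
        action = actor(observation)
        observation = environment(action)
        log.extend([f"ACTION: {action}", f"OBS: {observation}"])
        verdict = judge("\n".join(log))
        if verdict in ("success", "blocked"):
            log.append(f"VERDICT: {verdict}")
            break
    return log
```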
arXiv Detail & Related papers (2026-01-20T13:59:53Z)
- PACEbench: A Framework for Evaluating Practical AI Cyber-Exploitation Capabilities [42.61805002268063]
We introduce PACEbench, a practical AI cyber-exploitation benchmark. PACEbench comprises four scenarios spanning single, blended, chained, and defense vulnerability exploitations. We also propose PACEagent, a novel agent that emulates human penetration testers by supporting multi-phase reconnaissance, analysis, and exploitation.
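One way to picture that multi-phase operation is as a simple state machine over reconnaissance, analysis, and exploitation; the planner interface below is an assumption, not the paper's implementation.

```python
# Sketch of a multi-phase penetration-testing agent loop in the spirit of
# PACEagent's recon -> analysis -> exploitation phases. The phase
# transition rule and planner callable are illustrative assumptions.

from enum import Enum, auto

class Phase(Enum):
    RECON = auto()
    ANALYSIS = auto()
    EXPLOITATION = auto()

def run_agent(plan_step, max_steps: int = 20) -> list[str]:
    """plan_step(phase, notes) -> (result, phase_done) is a hypothetical
    planner callable, typically backed by an LLM plus tools."""
    notes: list[str] = []
    for phase in Phase:  # advance through the phases in order
        for _ in range(max_steps):
            result, done = plan_step(phase, notes)
            notes.append(f"[{phase.name}] {result}")
            if done:
                break
    return notes
```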
arXiv Detail & Related papers (2025-10-13T17:50:25Z)
- An Unsupervised Learning Approach For A Reliable Profiling Of Cyber Threat Actors Reported Globally Based On Complete Contextual Information Of Cyber Attacks [0.0]
It is critical to promptly recognize cyberattacks and establish strong defense mechanisms against them. Creating a profile of cyber threat actors based on their traits or patterns of behavior can help to build effective defenses against cyberattacks in advance. In this paper, an efficient unsupervised agglomerative hierarchical clustering technique is proposed for profiling cybercriminal groups.
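For readers unfamiliar with the technique, here is a generic scikit-learn sketch of agglomerative hierarchical clustering over threat-actor feature vectors; the features and distance threshold are illustrative assumptions, not the paper's contextual attack data.

```python
# Generic agglomerative hierarchical clustering over threat-actor feature
# vectors (e.g., encoded TTPs, targeted sectors, malware families). The toy
# feature matrix and threshold below are placeholders.

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Rows = reported attacks, columns = hypothetical contextual features.
X = np.array([
    [1, 0, 3, 2],   # attack report A
    [1, 0, 3, 1],   # attack report B (behavior similar to A)
    [0, 5, 0, 9],   # attack report C (distinct pattern)
])
X = StandardScaler().fit_transform(X)

# n_clusters=None plus a distance_threshold cuts the dendrogram by distance,
# so the number of actor profiles is discovered rather than fixed upfront.
model = AgglomerativeClustering(
    n_clusters=None, distance_threshold=2.0, linkage="average")
labels = model.fit_predict(X)
print(labels)  # e.g., [0, 0, 1]: A and B merge into one actor profile
```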
arXiv Detail & Related papers (2025-09-15T08:32:59Z)
- A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives [65.3369988566853]
Recent studies have demonstrated that adversaries can replicate a target model's functionality. Model extraction attacks (MEAs) pose threats to intellectual property, privacy, and system security. We propose a novel taxonomy that classifies MEAs according to attack mechanisms, defense approaches, and computing environments.
arXiv Detail & Related papers (2025-08-20T19:49:59Z)
- The Application of Transformer-Based Models for Predicting Consequences of Cyber Attacks [0.4604003661048266]
Threat modeling can provide critical support to cybersecurity professionals, enabling them to take timely action and allocate resources that could be used elsewhere. Recently, there has been a pressing need for automated methods to assess attack descriptions and forecast the future consequences of cyberattacks. This study examines how natural language processing (NLP) and deep learning can be applied to analyze the potential impact of cyberattacks.
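A lightweight stand-in for this kind of consequence forecasting is zero-shot classification with an off-the-shelf NLI model; the label set below is an assumption, and the paper itself trains transformer models rather than using this pipeline.

```python
# Sketch: mapping a free-text attack description to a likely consequence
# category with a zero-shot NLI classifier. The candidate labels are
# illustrative; a fine-tuned model, as in the paper, would do better.

from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

description = ("Attackers exfiltrated customer records after exploiting "
               "an unpatched VPN appliance.")
labels = ["data breach", "service disruption", "financial fraud",
          "ransomware extortion"]

result = classifier(description, candidate_labels=labels)
print(result["labels"][0], round(result["scores"][0], 3))
```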
arXiv Detail & Related papers (2025-08-18T15:46:36Z)
- A Proposal for Evaluating the Operational Risk for ChatBots based on Large Language Models [39.58317527488534]
We propose a novel, instrumented risk-assessment metric that simultaneously evaluates potential threats to three key stakeholders. To validate our metric, we leverage Garak, an open-source framework for vulnerability testing. Results underscore the importance of multi-dimensional risk assessments in operationalizing secure, reliable AI-driven conversational systems.
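The abstract does not spell out the metric, so the snippet below only sketches the general shape of a per-stakeholder risk aggregation; the probes, weights, and formula are all illustrative assumptions, and no Garak API calls are reproduced.

```python
# Sketch of aggregating per-probe failure rates into stakeholder-specific
# risk scores. Probe names, weights, and the weighted-mean formula are
# placeholder assumptions, not the paper's instrumented metric.

from statistics import mean

def stakeholder_risk(rates: dict[str, float],
                     weights: dict[str, float]) -> float:
    """Weighted mean of per-probe failure rates (each in 0..1)."""
    total = sum(weights.values())
    return sum(rates[p] * weights[p] for p in rates) / total

rates = {"prompt_injection": 0.30, "toxicity": 0.10, "data_leak": 0.05}
stakeholder_weights = {
    "provider": {"prompt_injection": 1, "toxicity": 3, "data_leak": 2},
    "operator": {"prompt_injection": 3, "toxicity": 1, "data_leak": 2},
    "end_user": {"prompt_injection": 2, "toxicity": 2, "data_leak": 5},
}
scores = {s: stakeholder_risk(rates, w)
          for s, w in stakeholder_weights.items()}
print(scores, "overall:", round(mean(scores.values()), 3))
```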
arXiv Detail & Related papers (2025-05-07T20:26:45Z)
- T2VShield: Model-Agnostic Jailbreak Defense for Text-to-Video Models [88.63040835652902]
Text-to-video models are vulnerable to jailbreak attacks, where specially crafted prompts bypass safety mechanisms and lead to the generation of harmful or unsafe content. We propose T2VShield, a comprehensive, model-agnostic defense framework designed to protect text-to-video models from jailbreak threats. Our method systematically analyzes the input, model, and output stages to identify the limitations of existing defenses.
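A schematic of the three-stage idea, with placeholder checks standing in for T2VShield's actual input, model, and output components.

```python
# Model-agnostic three-stage guard: check the prompt, run the model, check
# the output. Every check here is a toy placeholder heuristic.

def input_check(prompt: str) -> bool:
    """Stage 1: flag obviously unsafe prompts (placeholder heuristic)."""
    return not any(term in prompt.lower() for term in ("gore", "explicit"))

def generate(prompt: str) -> str:
    """Stage 2: call the underlying text-to-video model (stub)."""
    return f"<video for: {prompt}>"

def output_check(video: str) -> bool:
    """Stage 3: safety-classify generated frames (placeholder)."""
    return True

def guarded_generate(prompt: str) -> str | None:
    if not input_check(prompt):
        return None  # refuse before spending generation compute
    video = generate(prompt)
    return video if output_check(video) else None
```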
arXiv Detail & Related papers (2025-04-22T01:18:42Z)
- OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities [0.0]
We demonstrate a new approach to assessing AI's progress towards enabling and scaling real-world offensive cyber operations. We detail OCCULT, a lightweight operational evaluation framework that allows cybersecurity experts to contribute to rigorous and repeatable measurement. We find significant recent advancement in the risk of AI being used to scale realistic cyber threats.
arXiv Detail & Related papers (2025-02-18T19:33:14Z)
- Reformulation is All You Need: Addressing Malicious Text Features in DNNs [53.45564571192014]
We propose a unified and adaptive defense framework that is effective against both adversarial and backdoor attacks. Our framework outperforms existing sample-oriented defense baselines across a diverse range of malicious textual features.
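A plausible reading of a reformulation defense: rewrite the input so that token-level perturbations and backdoor trigger phrases are unlikely to survive, then classify the rewrite. Both helpers below are hypothetical stubs, not the paper's learned operator.

```python
# Reformulation-style defense sketch. paraphrase() and classify() are
# hypothetical stubs; in practice the paraphraser might be a seq2seq
# model and the classifier is the protected downstream model.

from collections import Counter

def paraphrase(text: str) -> str:
    """Hypothetical semantic rewriter (stub)."""
    raise NotImplementedError

def classify(text: str) -> str:
    """The protected downstream classifier (stub)."""
    raise NotImplementedError

def robust_classify(text: str, k: int = 3) -> str:
    """Majority vote over k independent paraphrases: a trigger or
    perturbation rarely survives all k semantic rewrites."""
    votes = Counter(classify(paraphrase(text)) for _ in range(k))
    return votes.most_common(1)[0][0]
```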
arXiv Detail & Related papers (2025-02-02T03:39:43Z)
- In-Context Experience Replay Facilitates Safety Red-Teaming of Text-to-Image Diffusion Models [104.94706600050557]
Text-to-image (T2I) models have shown remarkable progress, but their potential to generate harmful content remains a critical concern in the ML community. We propose ICER, a novel red-teaming framework that generates interpretable and semantically meaningful problematic prompts. Our work provides crucial insights for developing more robust safety mechanisms in T2I systems.
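In spirit, in-context experience replay keeps a buffer of past probes and replays the strongest ones into the next request; the scoring scheme and prompt template below are assumptions, not ICER's actual design.

```python
# Experience-replay buffer for red-teaming: store (score, prompt) pairs,
# keep the top performers, and feed them back as in-context examples.

import heapq

class ExperienceBuffer:
    def __init__(self, capacity: int = 50):
        self.items: list[tuple[float, str]] = []
        self.capacity = capacity

    def add(self, prompt: str, score: float) -> None:
        heapq.heappush(self.items, (score, prompt))
        if len(self.items) > self.capacity:
            heapq.heappop(self.items)  # drop the weakest experience

    def top_k(self, k: int = 5) -> list[str]:
        return [p for _, p in heapq.nlargest(k, self.items)]

def build_redteam_prompt(buffer: ExperienceBuffer, target: str) -> str:
    """Hypothetical template replaying the strongest past probes."""
    examples = "\n".join(f"- {p}" for p in buffer.top_k())
    return (f"Previously effective probes:\n{examples}\n"
            f"Propose a new interpretable probe about: {target}")
```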
arXiv Detail & Related papers (2024-11-25T04:17:24Z)
- Fortifying Ethical Boundaries in AI: Advanced Strategies for Enhancing Security in Large Language Models [3.9490749767170636]
Large language models (LLMs) have revolutionized text generation, translation, and question-answering tasks.
Despite their widespread use, LLMs present challenges such as ethical dilemmas when models are compelled to respond inappropriately.
This paper addresses these challenges by introducing a multi-pronged approach that includes: 1) filtering sensitive vocabulary from user input to prevent unethical responses; 2) detecting role-playing to halt interactions that could lead to jailbreak scenarios; and 3) extending these methodologies to various LLM derivatives, such as multi-modal large language models (MLLMs).
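A minimal sketch of the first two safeguards listed above; the word list and regex patterns are toy placeholders, far simpler than a production filter.

```python
# Toy input-screening layer: a sensitive-vocabulary filter plus a crude
# role-play / jailbreak detector. Term lists are illustrative only.

import re

SENSITIVE_TERMS = {"bomb", "malware", "credit card dump"}  # placeholder
ROLEPLAY_PATTERNS = [
    re.compile(r"\bpretend (you are|to be)\b", re.IGNORECASE),
    re.compile(r"\bignore (all|your) (previous|prior) instructions\b",
               re.IGNORECASE),
]

def screen_input(user_input: str) -> str | None:
    """Return a refusal reason if the input should be blocked, else None."""
    lowered = user_input.lower()
    if any(term in lowered for term in SENSITIVE_TERMS):
        return "sensitive vocabulary"
    if any(p.search(user_input) for p in ROLEPLAY_PATTERNS):
        return "role-play / jailbreak attempt"
    return None
```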
arXiv Detail & Related papers (2024-01-27T08:09:33Z)
- Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents [111.15288256221764]
The Grounded Decoding project aims to solve complex, long-horizon tasks in a robotic setting by combining the knowledge of a language model with that of grounded models of the environment.
We frame this as a problem similar to probabilistic filtering: decode a sequence that has high probability both under the language model and under a set of grounded model objectives.
We demonstrate how such grounded models can be obtained across three simulation and real-world domains, and show that the proposed decoding strategy solves complex, long-horizon robotic tasks.
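The probabilistic-filtering view admits a compact sketch: choose the next token that maximizes the language model's log-probability plus the grounded objectives' log-probabilities. The toy distributions below are made up for illustration; real grounded models score affordances, safety, and similar objectives.

```python
# One step of grounded decoding as log-probability fusion:
#   argmax_w [ log p_LM(w | context) + sum_i log p_i(w) ]

import numpy as np

def grounded_decode_step(lm_logprobs: np.ndarray,
                         grounded_logprobs: list[np.ndarray]) -> int:
    total = lm_logprobs + np.sum(grounded_logprobs, axis=0)
    return int(np.argmax(total))

vocab = ["pick up the sponge", "pick up the knife", "wipe the table"]
lm = np.log([0.5, 0.3, 0.2])          # language-model preferences
affordance = np.log([0.6, 0.1, 0.3])  # what the robot can actually do here
print(vocab[grounded_decode_step(lm, [affordance])])  # "pick up the sponge"
```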
arXiv Detail & Related papers (2023-03-01T22:58:50Z)
- Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News [57.9843300852526]
We introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions.
To identify the possible weaknesses that adversaries can exploit, we create the NeuralNews dataset, composed of four different types of generated articles.
In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies.
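One generic cross-modal consistency signal, not necessarily the paper's exact detector, is cosine similarity between CLIP embeddings of the article image and its caption: a low score hints that the caption may not match the image.

```python
# Image-caption consistency check via CLIP embeddings. The input file and
# caption are hypothetical; the similarity threshold is left to the user.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("article_photo.jpg")  # hypothetical input image
caption = "Officials announce new cybersecurity rules at a press event."

inputs = processor(text=[caption], images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)
    sim = torch.cosine_similarity(out.image_embeds, out.text_embeds).item()

print(f"image-caption similarity: {sim:.3f}")  # flag if below a threshold
```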
arXiv Detail & Related papers (2020-09-16T14:13:15Z)