Related papers: Exploiting Jailbreaking Vulnerabilities in Generative AI to Bypass Ethical Safeguards for Facilitating Phishing Attacks

Exploiting Jailbreaking Vulnerabilities in Generative AI to Bypass Ethical Safeguards for Facilitating Phishing Attacks

URL: http://arxiv.org/abs/2507.12185v1
Date: Wed, 16 Jul 2025 12:32:46 GMT
Title: Exploiting Jailbreaking Vulnerabilities in Generative AI to Bypass Ethical Safeguards for Facilitating Phishing Attacks
Authors: Rina Mishra, Gaurav Varshney,
Abstract summary: This study investigates how GenAI powered services can be exploited via jailbreaking techniques to bypass ethical safeguards.<n>We used ChatGPT 4o Mini selected for its accessibility and status as the latest publicly available model as a representative GenAI system.<n>Our findings reveal that the model could successfully guide novice users in executing phishing attacks across various vectors, including web, email, SMS (smishing), and voice (vishing)
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The advent of advanced Generative AI (GenAI) models such as DeepSeek and ChatGPT has significantly reshaped the cybersecurity landscape, introducing both promising opportunities and critical risks. This study investigates how GenAI powered chatbot services can be exploited via jailbreaking techniques to bypass ethical safeguards, enabling the generation of phishing content, recommendation of hacking tools, and orchestration of phishing campaigns. In ethically controlled experiments, we used ChatGPT 4o Mini selected for its accessibility and status as the latest publicly available model at the time of experimentation, as a representative GenAI system. Our findings reveal that the model could successfully guide novice users in executing phishing attacks across various vectors, including web, email, SMS (smishing), and voice (vishing). Unlike automated phishing campaigns that typically follow detectable patterns, these human-guided, AI assisted attacks are capable of evading traditional anti phishing mechanisms, thereby posing a growing security threat. We focused on DeepSeek and ChatGPT due to their widespread adoption and technical relevance in 2025. The study further examines common jailbreaking techniques and the specific vulnerabilities exploited in these models. Finally, we evaluate a range of mitigation strategies such as user education, advanced authentication mechanisms, and regulatory policy measures and discuss emerging trends in GenAI facilitated phishing, outlining future research directions to strengthen cybersecurity defenses in the age of artificial intelligence.

Related papers

Trojans in Artificial Intelligence (TrojAI) Final Report [52.6138928911574]
TrojAI was launched to confront an emerging vulnerability in modern artificial intelligence: the threat of AI Trojans.<n>TrojAI helped to map out the complex nature of the threat and pioneered foundational detection methods.<n>Report concludes with lessons learned and recommendations for advancing AI security research.
arXiv Detail & Related papers (2026-02-06T19:52:14Z)
Techniques of Modern Attacks [51.56484100374058]
Advanced Persistent Threats (APTs) represent a complex method of attack aimed at specific targets.<n>I will investigate both the attack life cycle and cutting-edge detection and defense strategies proposed in recent academic research.<n>I aim to highlight the strengths and limitations of each approach and propose more adaptive APT mitigation strategies.
arXiv Detail & Related papers (2026-01-19T22:15:25Z)
Generative AI for Biosciences: Emerging Threats and Roadmap to Biosecurity [56.331312963880215]
generative artificial intelligence (GenAI) in the biosciences is transforming biotechnology, medicine, and synthetic biology.<n>This Perspective outlines the current state of GenAI in the biosciences and emerging threat vectors ranging from jailbreak attacks and privacy risks to the dual-use challenges posed by autonomous AI agents.<n>We advocate a multi-layered approach to GenAI safety, including rigorous data filtering, alignment with ethical principles during development, and real-time monitoring to block harmful requests.
arXiv Detail & Related papers (2025-10-13T00:24:41Z)
A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives [65.3369988566853]
Recent studies have demonstrated that adversaries can replicate a target model's functionality.<n>Model Extraction Attacks pose threats to intellectual property, privacy, and system security.<n>We propose a novel taxonomy that classifies MEAs according to attack mechanisms, defense approaches, and computing environments.
arXiv Detail & Related papers (2025-08-20T19:49:59Z)
Jailbreaking Generative AI: Empowering Novices to Conduct Phishing Attacks [0.40964539027092917]
This paper investigates the misuse of the latest AI model, ChatGPT-4o Mini, in facilitating social engineering attacks.<n>Our findings underscore the alarming ease with which even inexperienced users can execute sophisticated phishing campaigns.
arXiv Detail & Related papers (2025-03-03T10:51:10Z)
Computational Safety for Generative AI: A Signal Processing Perspective [65.268245109828]
computational safety is a mathematical framework that enables the quantitative assessment, formulation, and study of safety challenges in GenAI.<n>We show how sensitivity analysis and loss landscape analysis can be used to detect malicious prompts with jailbreak attempts.<n>We discuss key open research challenges, opportunities, and the essential role of signal processing in computational AI safety.
arXiv Detail & Related papers (2025-02-18T02:26:50Z)
SoK: Watermarking for AI-Generated Content [112.9218881276487]
Watermarking schemes embed hidden signals within AI-generated content to enable reliable detection.<n>Watermarks can play a crucial role in enhancing AI safety and trustworthiness by combating misinformation and deception.<n>This work aims to guide researchers in advancing watermarking methods and applications, and support policymakers in addressing the broader implications of GenAI.
arXiv Detail & Related papers (2024-11-27T16:22:33Z)
Review of Generative AI Methods in Cybersecurity [0.6990493129893112]
This paper provides a comprehensive overview of the current state-of-the-art deployments of Generative AI (GenAI) It covers assaults, jailbreaking, and applications of prompt injection and reverse psychology. It also provides the various applications of GenAI in cybercrimes, such as automated hacking, phishing emails, social engineering, reverse cryptography, creating attack payloads, and creating malware.
arXiv Detail & Related papers (2024-03-13T17:05:05Z)
Decoding the Threat Landscape : ChatGPT, FraudGPT, and WormGPT in Social Engineering Attacks [0.0]
Generative AI models have revolutionized the field of cyberattacks, empowering malicious actors to craft convincing and personalized phishing lures. These models, ChatGPT, FraudGPT, and WormGPT, have augmented existing threats and ushered in new dimensions of risk. To counter these threats, we outline a range of strategies, including traditional security measures, AI-powered security solutions, and collaborative approaches in cybersecurity.
arXiv Detail & Related papers (2023-10-09T10:31:04Z)
BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models [54.19289900203071]
The rise in popularity of text-to-image generative artificial intelligence has attracted widespread public interest. We demonstrate that this technology can be attacked to generate content that subtly manipulates its users. We propose a Backdoor Attack on text-to-image Generative Models (BAGM) Our attack is the first to target three popular text-to-image generative models across three stages of the generative process.
arXiv Detail & Related papers (2023-07-31T08:34:24Z)
From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy [0.0]
This research paper highlights the limitations, challenges, potential risks, and opportunities of GenAI in the domain of cybersecurity and privacy. The paper investigates how cyber offenders can use the GenAI tools in developing cyber attacks. We will also discuss the social, legal, and ethical implications of ChatGPT.
arXiv Detail & Related papers (2023-07-03T00:36:57Z)
Impacts and Risk of Generative AI Technology on Cyber Defense [0.0]
We propose leveraging the Cyber Kill Chain (CKC) to understand the lifecycle of cyberattacks. This paper aims to provide a comprehensive analysis of the risk areas introduced by the offensive use of GenAI techniques. We also analyze the strategies employed by threat actors, highlighting the implications for cyber defense.
arXiv Detail & Related papers (2023-06-22T16:51:41Z)
When Authentication Is Not Enough: On the Security of Behavioral-Based Driver Authentication Systems [53.2306792009435]
We develop two lightweight driver authentication systems based on Random Forest and Recurrent Neural Network architectures. We are the first to propose attacks against these systems by developing two novel evasion attacks, SMARTCAN and GANCAN. Through our contributions, we aid practitioners in safely adopting these systems, help reduce car thefts, and enhance driver security.
arXiv Detail & Related papers (2023-06-09T14:33:26Z)
Towards Automated Classification of Attackers' TTPs by combining NLP with ML Techniques [77.34726150561087]
We evaluate and compare different Natural Language Processing (NLP) and machine learning techniques used for security information extraction in research. Based on our investigations we propose a data processing pipeline that automatically classifies unstructured text according to attackers' tactics and techniques.
arXiv Detail & Related papers (2022-07-18T09:59:21Z)
Adversarial Machine Learning Attacks and Defense Methods in the Cyber Security Domain [58.30296637276011]
This paper summarizes the latest research on adversarial attacks against security solutions based on machine learning techniques. It is the first to discuss the unique challenges of implementing end-to-end adversarial attacks in the cyber security domain.
arXiv Detail & Related papers (2020-07-05T18:22:40Z)
Adversarial vs behavioural-based defensive AI with joint, continual and active learning: automated evaluation of robustness to deception, poisoning and concept drift [62.997667081978825]
Recent advancements in Artificial Intelligence (AI) have brought new capabilities to behavioural analysis (UEBA) for cyber-security. In this paper, we present a solution to effectively mitigate this attack by improving the detection process and efficiently leveraging human expertise.
arXiv Detail & Related papers (2020-01-13T13:54:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.