Exploiting Jailbreaking Vulnerabilities in Generative AI to Bypass Ethical Safeguards for Facilitating Phishing Attacks
- URL: http://arxiv.org/abs/2507.12185v1
- Date: Wed, 16 Jul 2025 12:32:46 GMT
- Title: Exploiting Jailbreaking Vulnerabilities in Generative AI to Bypass Ethical Safeguards for Facilitating Phishing Attacks
- Authors: Rina Mishra, Gaurav Varshney,
- Abstract summary: This study investigates how GenAI powered services can be exploited via jailbreaking techniques to bypass ethical safeguards.<n>We used ChatGPT 4o Mini selected for its accessibility and status as the latest publicly available model as a representative GenAI system.<n>Our findings reveal that the model could successfully guide novice users in executing phishing attacks across various vectors, including web, email, SMS (smishing), and voice (vishing)
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advent of advanced Generative AI (GenAI) models such as DeepSeek and ChatGPT has significantly reshaped the cybersecurity landscape, introducing both promising opportunities and critical risks. This study investigates how GenAI powered chatbot services can be exploited via jailbreaking techniques to bypass ethical safeguards, enabling the generation of phishing content, recommendation of hacking tools, and orchestration of phishing campaigns. In ethically controlled experiments, we used ChatGPT 4o Mini selected for its accessibility and status as the latest publicly available model at the time of experimentation, as a representative GenAI system. Our findings reveal that the model could successfully guide novice users in executing phishing attacks across various vectors, including web, email, SMS (smishing), and voice (vishing). Unlike automated phishing campaigns that typically follow detectable patterns, these human-guided, AI assisted attacks are capable of evading traditional anti phishing mechanisms, thereby posing a growing security threat. We focused on DeepSeek and ChatGPT due to their widespread adoption and technical relevance in 2025. The study further examines common jailbreaking techniques and the specific vulnerabilities exploited in these models. Finally, we evaluate a range of mitigation strategies such as user education, advanced authentication mechanisms, and regulatory policy measures and discuss emerging trends in GenAI facilitated phishing, outlining future research directions to strengthen cybersecurity defenses in the age of artificial intelligence.
Related papers
- Jailbreaking Generative AI: Empowering Novices to Conduct Phishing Attacks [0.40964539027092917]
This paper investigates the misuse of the latest AI model, ChatGPT-4o Mini, in facilitating social engineering attacks.<n>Our findings underscore the alarming ease with which even inexperienced users can execute sophisticated phishing campaigns.
arXiv Detail & Related papers (2025-03-03T10:51:10Z) - Computational Safety for Generative AI: A Signal Processing Perspective [65.268245109828]
computational safety is a mathematical framework that enables the quantitative assessment, formulation, and study of safety challenges in GenAI.<n>We show how sensitivity analysis and loss landscape analysis can be used to detect malicious prompts with jailbreak attempts.<n>We discuss key open research challenges, opportunities, and the essential role of signal processing in computational AI safety.
arXiv Detail & Related papers (2025-02-18T02:26:50Z) - SoK: Watermarking for AI-Generated Content [112.9218881276487]
Watermarking schemes embed hidden signals within AI-generated content to enable reliable detection.<n>Watermarks can play a crucial role in enhancing AI safety and trustworthiness by combating misinformation and deception.<n>This work aims to guide researchers in advancing watermarking methods and applications, and support policymakers in addressing the broader implications of GenAI.
arXiv Detail & Related papers (2024-11-27T16:22:33Z) - Review of Generative AI Methods in Cybersecurity [0.6990493129893112]
This paper provides a comprehensive overview of the current state-of-the-art deployments of Generative AI (GenAI)
It covers assaults, jailbreaking, and applications of prompt injection and reverse psychology.
It also provides the various applications of GenAI in cybercrimes, such as automated hacking, phishing emails, social engineering, reverse cryptography, creating attack payloads, and creating malware.
arXiv Detail & Related papers (2024-03-13T17:05:05Z) - Decoding the Threat Landscape : ChatGPT, FraudGPT, and WormGPT in Social Engineering Attacks [0.0]
Generative AI models have revolutionized the field of cyberattacks, empowering malicious actors to craft convincing and personalized phishing lures.
These models, ChatGPT, FraudGPT, and WormGPT, have augmented existing threats and ushered in new dimensions of risk.
To counter these threats, we outline a range of strategies, including traditional security measures, AI-powered security solutions, and collaborative approaches in cybersecurity.
arXiv Detail & Related papers (2023-10-09T10:31:04Z) - BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models [54.19289900203071]
The rise in popularity of text-to-image generative artificial intelligence has attracted widespread public interest.
We demonstrate that this technology can be attacked to generate content that subtly manipulates its users.
We propose a Backdoor Attack on text-to-image Generative Models (BAGM)
Our attack is the first to target three popular text-to-image generative models across three stages of the generative process.
arXiv Detail & Related papers (2023-07-31T08:34:24Z) - From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and
Privacy [0.0]
This research paper highlights the limitations, challenges, potential risks, and opportunities of GenAI in the domain of cybersecurity and privacy.
The paper investigates how cyber offenders can use the GenAI tools in developing cyber attacks.
We will also discuss the social, legal, and ethical implications of ChatGPT.
arXiv Detail & Related papers (2023-07-03T00:36:57Z) - Impacts and Risk of Generative AI Technology on Cyber Defense [0.0]
We propose leveraging the Cyber Kill Chain (CKC) to understand the lifecycle of cyberattacks.
This paper aims to provide a comprehensive analysis of the risk areas introduced by the offensive use of GenAI techniques.
We also analyze the strategies employed by threat actors, highlighting the implications for cyber defense.
arXiv Detail & Related papers (2023-06-22T16:51:41Z) - When Authentication Is Not Enough: On the Security of Behavioral-Based Driver Authentication Systems [53.2306792009435]
We develop two lightweight driver authentication systems based on Random Forest and Recurrent Neural Network architectures.
We are the first to propose attacks against these systems by developing two novel evasion attacks, SMARTCAN and GANCAN.
Through our contributions, we aid practitioners in safely adopting these systems, help reduce car thefts, and enhance driver security.
arXiv Detail & Related papers (2023-06-09T14:33:26Z) - Towards Automated Classification of Attackers' TTPs by combining NLP
with ML Techniques [77.34726150561087]
We evaluate and compare different Natural Language Processing (NLP) and machine learning techniques used for security information extraction in research.
Based on our investigations we propose a data processing pipeline that automatically classifies unstructured text according to attackers' tactics and techniques.
arXiv Detail & Related papers (2022-07-18T09:59:21Z) - Adversarial Machine Learning Attacks and Defense Methods in the Cyber
Security Domain [58.30296637276011]
This paper summarizes the latest research on adversarial attacks against security solutions based on machine learning techniques.
It is the first to discuss the unique challenges of implementing end-to-end adversarial attacks in the cyber security domain.
arXiv Detail & Related papers (2020-07-05T18:22:40Z) - Adversarial vs behavioural-based defensive AI with joint, continual and
active learning: automated evaluation of robustness to deception, poisoning
and concept drift [62.997667081978825]
Recent advancements in Artificial Intelligence (AI) have brought new capabilities to behavioural analysis (UEBA) for cyber-security.
In this paper, we present a solution to effectively mitigate this attack by improving the detection process and efficiently leveraging human expertise.
arXiv Detail & Related papers (2020-01-13T13:54:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.