Related papers: SMILES-Prompting: A Novel Approach to LLM Jailbreak Attacks in Chemical Synthesis

SMILES-Prompting: A Novel Approach to LLM Jailbreak Attacks in Chemical Synthesis

URL: http://arxiv.org/abs/2410.15641v1
Date: Mon, 21 Oct 2024 05:01:50 GMT
Title: SMILES-Prompting: A Novel Approach to LLM Jailbreak Attacks in Chemical Synthesis
Authors: Aidan Wong, He Cao, Zijing Liu, Yu Li,
Abstract summary: This paper explores the security vulnerabilities of large language models (LLMs) within the field of chemistry. We evaluate the effectiveness of several prompt injection attack methods, including red-teaming, explicit prompting, and implicit prompting. We introduce a novel attack technique named SMILES-prompting, which uses the simplified Molecular-Input Line-Entry System (SMILES) to reference chemical substances.
Score: 12.577741068644077
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The increasing integration of large language models (LLMs) across various fields has heightened concerns about their potential to propagate dangerous information. This paper specifically explores the security vulnerabilities of LLMs within the field of chemistry, particularly their capacity to provide instructions for synthesizing hazardous substances. We evaluate the effectiveness of several prompt injection attack methods, including red-teaming, explicit prompting, and implicit prompting. Additionally, we introduce a novel attack technique named SMILES-prompting, which uses the Simplified Molecular-Input Line-Entry System (SMILES) to reference chemical substances. Our findings reveal that SMILES-prompting can effectively bypass current safety mechanisms. These findings highlight the urgent need for enhanced domain-specific safeguards in LLMs to prevent misuse and improve their potential for positive social impact.

Related papers

ROSE: Toward Reality-Oriented Safety Evaluation of Large Language Models [60.28667314609623]
Large Language Models (LLMs) are increasingly deployed as black-box components in real-world applications.<n>We propose Reality-Oriented Safety Evaluation (ROSE), a novel framework that uses multi-objective reinforcement learning to fine-tune an adversarial LLM.
arXiv Detail & Related papers (2025-06-17T10:55:17Z)
A Systematic Review of Poisoning Attacks Against Large Language Models [4.390276781480338]
We propose a comprehensive poisoning threat model applicable to categorize a wide range of LLM poisoning attacks.<n>The poisoning threat model includes four poisoning attack specifications that define the logistics and manipulation strategies of an attack as well as six poisoning metrics used to measure key characteristics of an attack.
arXiv Detail & Related papers (2025-06-06T20:32:43Z)
ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain [28.205744043861756]
ChemSafetyBench is a benchmark designed to evaluate the accuracy and safety of large language models (LLMs) ChemSafetyBench encompasses three key tasks: querying chemical properties, assessing the legality of chemical uses, and describing synthesis methods. Our dataset has more than 30K samples across various chemical materials.
arXiv Detail & Related papers (2024-11-23T12:50:33Z)
Attention Tracker: Detecting Prompt Injection Attacks in LLMs [62.247841717696765]
Large Language Models (LLMs) have revolutionized various domains but remain vulnerable to prompt injection attacks. We introduce the concept of the distraction effect, where specific attention heads shift focus from the original instruction to the injected instruction. We propose Attention Tracker, a training-free detection method that tracks attention patterns on instruction to detect prompt injection attacks.
arXiv Detail & Related papers (2024-11-01T04:05:59Z)
LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts [88.96201324719205]
Safety concerns in large language models (LLMs) have gained significant attention due to their exposure to potentially harmful data during pre-training.<n>We identify a new safety vulnerability in LLMs, where seemingly benign prompts, semantically related to harmful content, can bypass safety mechanisms.<n>We introduce a novel attack method, textitActorBreaker, which identifies actors related to toxic prompts within pre-training distribution.
arXiv Detail & Related papers (2024-10-14T16:41:49Z)
Securing Large Language Models: Addressing Bias, Misinformation, and Prompt Attacks [12.893445918647842]
Large Language Models (LLMs) demonstrate impressive capabilities across various fields, yet their increasing use raises critical security concerns. This article reviews recent literature addressing key issues in LLM security, with a focus on accuracy, bias, content detection, and vulnerability to attacks.
arXiv Detail & Related papers (2024-09-12T14:42:08Z)
Counterfactual Explainable Incremental Prompt Attack Analysis on Large Language Models [32.03992137755351]
This study sheds light on the imperative need to bolster safety and privacy measures in large language models (LLMs) We propose Counterfactual Explainable Incremental Prompt Attack (CEIPA), a novel technique where we guide prompts in a specific manner to quantitatively measure attack effectiveness.
arXiv Detail & Related papers (2024-07-12T14:26:14Z)
ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings [58.82536530615557]
We propose an Adversarial Suffix Embedding Translation Framework (ASETF) to transform continuous adversarial suffix embeddings into coherent and understandable text. Our method significantly reduces the computation time of adversarial suffixes and achieves a much better attack success rate to existing techniques.
arXiv Detail & Related papers (2024-02-25T06:46:27Z)
The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative [55.08395463562242]
Multimodal Large Language Models (MLLMs) are constantly defining the new boundary of Artificial General Intelligence (AGI) Our paper explores a novel vulnerability in MLLM societies - the indirect propagation of malicious content.
arXiv Detail & Related papers (2024-02-20T23:08:21Z)
Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science [65.77763092833348]
Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines. While their capabilities are promising, these agents also introduce novel vulnerabilities that demand careful consideration for safety. This paper conducts a thorough examination of vulnerabilities in LLM-based agents within scientific domains, shedding light on potential risks associated with their misuse and emphasizing the need for safety measures.
arXiv Detail & Related papers (2024-02-06T18:54:07Z)
Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications [0.0]
This paper introduces the 'Signed-Prompt' method as a novel solution for prompt injection attacks. The study involves signing sensitive instructions within command segments by authorized users, enabling the LLM to discern trusted instruction sources. Experiments demonstrate the effectiveness of the Signed-Prompt method, showing substantial resistance to various types of prompt injection attacks.
arXiv Detail & Related papers (2024-01-15T11:44:18Z)
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models [79.0183835295533]
We introduce the first benchmark for indirect prompt injection attacks, named BIPIA, to assess the risk of such vulnerabilities. Our analysis identifies two key factors contributing to their success: LLMs' inability to distinguish between informational context and actionable instructions, and their lack of awareness in avoiding the execution of instructions within external content. We propose two novel defense mechanisms-boundary awareness and explicit reminder-to address these vulnerabilities in both black-box and white-box settings.
arXiv Detail & Related papers (2023-12-21T01:08:39Z)
Hijacking Large Language Models via Adversarial In-Context Learning [10.416972293173993]
In-context learning (ICL) has emerged as a powerful paradigm leveraging LLMs for specific downstream tasks by utilizing labeled examples as demonstrations (demos) in the preconditioned prompts.<n>Existing attacks are either easy to detect, require a trigger in user input, or lack specificity towards ICL.<n>This work introduces a novel transferable prompt injection attack against ICL, aiming to hijack LLMs to generate the target output or elicit harmful responses.
arXiv Detail & Related papers (2023-11-16T15:01:48Z)
Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection [70.28425745910711]
Large Language Models (LLMs) have demonstrated exceptional proficiency in instruction-following. This capability brings with it the risk of prompt injection attacks. We evaluate the robustness of instruction-following LLMs against such attacks.
arXiv Detail & Related papers (2023-08-17T06:21:50Z)
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications. We show how attackers can override original instructions and employed controls using Prompt Injection attacks. We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.