Related papers: Semantic-guided Prompt Organization for Universal Goal Hijacking against LLMs

URL: http://arxiv.org/abs/2405.14189v1
Date: Thu, 23 May 2024 05:31:41 GMT
Title: Semantic-guided Prompt Organization for Universal Goal Hijacking against LLMs
Authors: Yihao Huang, Chong Wang, Xiaojun Jia, Qing Guo, Felix Juefei-Xu, Jian Zhang, Geguang Pu, Yang Liu
Abstract summary: We propose a universal goal hijacking method called POUGH that incorporates semantic-guided prompt processing strategies. The method starts with a sampling strategy to select representative prompts from a candidate pool, followed by a ranking strategy that prioritizes the prompts. Experiments conducted on four popular Large Language Models and ten types of target responses verified the effectiveness of our method.
Score: 30.56428628397079
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the rising popularity of Large Language Models (LLMs), assessing their trustworthiness through security tasks has gained critical importance. Regarding the new task of universal goal hijacking, previous efforts have concentrated solely on optimization algorithms, overlooking the crucial role of the prompt. To fill this gap, we propose a universal goal hijacking method called POUGH that incorporates semantic-guided prompt processing strategies. Specifically, the method starts with a sampling strategy to select representative prompts from a candidate pool, followed by a ranking strategy that prioritizes the prompts. Once the prompts are organized sequentially, the method employs an iterative optimization algorithm to generate the universal fixed suffix for the prompts. Experiments conducted on four popular LLMs and ten types of target responses verified the effectiveness of our method.
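The abstract names a sampling step and a ranking step but does not say how either is computed. As a rough illustration only, and not the authors' procedure, the sketch below assumes each candidate prompt already has an embedding vector, selects representatives by greedy farthest-point sampling under cosine similarity, and then orders them by similarity to a target-response embedding. The function names, the toy two-dimensional embeddings, and the choice of cosine similarity are all hypothetical. The suffix optimization step is outside the scope of this sketch.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sample_representative(embeddings, k):
    """Pick k indices that spread over the pool (assumed sampling rule)."""
    n = len(embeddings)
    if k >= n:
        return list(range(n))
    dim = len(embeddings[0])
    centroid = [sum(e[d] for e in embeddings) / n for d in range(dim)]
    # Start from the prompt closest to the centroid of the pool.
    chosen = [max(range(n), key=lambda i: cosine(embeddings[i], centroid))]
    while len(chosen) < k:
        remaining = [i for i in range(n) if i not in chosen]

        def closeness(i):
            # Highest similarity between candidate i and anything chosen so far.
            return max(cosine(embeddings[i], embeddings[j]) for j in chosen)

        # Add the candidate that is least similar to the chosen set.
        chosen.append(min(remaining, key=closeness))
    return chosen


def rank_prompts(indices, embeddings, target):
    """Order indices by descending similarity to the target (assumed rule)."""
    return sorted(indices, key=lambda i: cosine(embeddings[i], target), reverse=True)


# Toy 2-D embeddings standing in for real sentence embeddings (hypothetical data).
pool = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.6, 0.8]]
target = [0.0, 1.0]

selected = sample_representative(pool, k=3)
ordered = rank_prompts(selected, pool, target)
print(selected, ordered)
```

In this toy run the near-duplicate vectors at indices 1 and 3 are skipped in favor of one prompt per region of the embedding space, which is the intuition behind selecting "representative" prompts. Whether the paper ranks in ascending or descending order, or uses a different criterion entirely, is not stated in the abstract.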

Related papers

HPSS: Heuristic Prompting Strategy Search for LLM Evaluators [81.09765876000208]
We propose a novel automatic prompting strategy optimization method called Heuristic Prompting Strategy Search (HPSS). Inspired by the genetic algorithm, HPSS conducts an iterative search to find well-behaved prompting strategies for evaluators. Extensive experiments across four evaluation tasks demonstrate the effectiveness of HPSS.
arXiv Detail & Related papers (2025-02-18T16:46:47Z)
Meta-Prompt Optimization for LLM-Based Sequential Decision Making [24.050701239196876]
Large language models (LLMs) have been employed as agents to solve sequential decision-making tasks. We propose our EXPonential-weight algorithm for prompt Optimization (EXPO) to automatically optimize the task description and meta-instruction in the meta-prompt. We also extend EXPO to additionally optimize the exemplars in the meta-prompt to further enhance the performance.
arXiv Detail & Related papers (2025-02-02T09:22:39Z)
Refining Answer Distributions for Improved Large Language Model Reasoning [24.67507932821155]
We present Refined Answer Distributions, a novel and principled algorithmic framework to enhance the reasoning capabilities of Large Language Models (LLMs). Our approach can be viewed as an iterative sampling strategy for forming a Monte Carlo approximation of an underlying distribution of answers, with the goal of identifying the mode -- the most likely answer.
arXiv Detail & Related papers (2024-12-17T19:45:53Z)
Self-Calibrated Listwise Reranking with Large Language Models [137.6557607279876]
Large language models (LLMs) have been employed in reranking tasks through a sequence-to-sequence approach. This reranking paradigm requires a sliding window strategy to iteratively handle larger candidate sets. We propose a novel self-calibrated listwise reranking method, which aims to leverage LLMs to produce global relevance scores for ranking.
arXiv Detail & Related papers (2024-11-07T10:31:31Z)
Pseudo-Conversation Injection for LLM Goal Hijacking [3.574664325523221]
In goal hijacking, an attacker typically appends a carefully crafted malicious suffix to the user's prompt. We introduce a novel goal hijacking attack method called Pseudo-Conversation Injection. We propose three Pseudo-Conversation construction strategies: Targeted Pseudo-Conversation, Universal Pseudo-Conversation, and Robust Pseudo-Conversation.
arXiv Detail & Related papers (2024-10-31T06:58:34Z)
QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tunes a small pretrained language model to generate optimal prompts tailored to the input queries. We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks. Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z)
MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization [73.7779735046424]
We show that different prompts should be adapted to different Large Language Models (LLMs) to enhance their capabilities across various downstream tasks in NLP. We then propose a model-adaptive prompt optimization (MAPO) method that optimizes the original prompts for each specific LLM in downstream tasks.
arXiv Detail & Related papers (2024-07-04T18:39:59Z)
Efficient Prompting Methods for Large Language Models: A Survey [50.171011917404485]
Prompting has become a mainstream paradigm for adapting large language models (LLMs) to specific natural language processing tasks. This approach brings the additional computational burden of model inference and human effort to guide and control the behavior of LLMs. We present the basic concepts of prompting, review the advances for efficient prompting, and highlight future research directions.
arXiv Detail & Related papers (2024-04-01T12:19:08Z)
Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL [62.824464372594576]
We aim to enhance the arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization. We identify a previously overlooked objective of query dependency in such optimization. We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
arXiv Detail & Related papers (2023-09-13T01:12:52Z)
Guiding Large Language Models via Directional Stimulus Prompting [114.84930073977672]
We introduce Directional Stimulus Prompting, a novel framework for guiding black-box large language models (LLMs) toward specific desired outputs. Instead of directly adjusting LLMs, our method employs a small tunable policy model to generate an auxiliary directional stimulus prompt for each input instance.
arXiv Detail & Related papers (2023-02-22T17:44:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.