StealthRank: LLM Ranking Manipulation via Stealthy Prompt Optimization
- URL: http://arxiv.org/abs/2504.05804v2
- Date: Fri, 23 May 2025 02:35:56 GMT
- Title: StealthRank: LLM Ranking Manipulation via Stealthy Prompt Optimization
- Authors: Yiming Tang, Yi Fan, Chenxiao Yu, Tiankai Yang, Yue Zhao, Xiyang Hu
- Abstract summary: We present a novel adversarial attack method that manipulates large language model (LLM)-driven ranking systems. StealthRank employs an energy-based optimization framework combined with Langevin dynamics to generate StealthRank Prompts. Our results show that StealthRank consistently outperforms state-of-the-art adversarial ranking baselines in both effectiveness and stealth.
- Score: 16.031545357388357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of large language models (LLMs) into information retrieval systems introduces new attack surfaces, particularly for adversarial ranking manipulations. We present StealthRank, a novel adversarial attack method that manipulates LLM-driven ranking systems while maintaining textual fluency and stealth. Unlike existing methods that often introduce detectable anomalies, StealthRank employs an energy-based optimization framework combined with Langevin dynamics to generate StealthRank Prompts (SRPs): adversarial text sequences embedded within item or document descriptions that subtly yet effectively influence LLM ranking mechanisms. We evaluate StealthRank across multiple LLMs, demonstrating its ability to covertly boost the ranking of target items while avoiding explicit manipulation traces. Our results show that StealthRank consistently outperforms state-of-the-art adversarial ranking baselines in both effectiveness and stealth, highlighting critical vulnerabilities in LLM-driven ranking systems. Our code is publicly available at https://github.com/Tangyiming205069/controllable-seo.
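The abstract's combination of energy-based optimization with Langevin dynamics can be illustrated with a minimal sketch: repeated noisy gradient steps on an energy function drive samples toward low-energy (high-quality) regions. The quadratic toy energy below is a stand-in for illustration only, not the paper's actual ranking objective.

```python
import numpy as np

def langevin_step(x, grad_energy, step, rng):
    """One Langevin update: a gradient descent step on the energy
    plus Gaussian noise scaled by sqrt(2 * step)."""
    noise = rng.normal(size=x.shape)
    return x - step * grad_energy(x) + np.sqrt(2.0 * step) * noise

# Toy energy E(x) = ||x||^2 / 2, so grad E(x) = x; the chain's
# stationary distribution is then a standard Gaussian around 0.
rng = np.random.default_rng(0)
x = 5.0 * np.ones(8)  # start far from the low-energy region
for _ in range(2000):
    x = langevin_step(x, lambda v: v, step=0.01, rng=rng)
# The norm should have drifted from ~14.1 down toward sqrt(8) ~ 2.8.
print(np.linalg.norm(x))
```

In the paper's setting the continuous samples would further be mapped back to fluent token sequences; this sketch only shows the sampling dynamic itself.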
Related papers
- The Vulnerability of LLM Rankers to Prompt Injection Attacks [40.03039307576983]
Large Language Models (LLMs) have emerged as powerful re-rankers. Recent research has shown that simple prompt injections embedded within a candidate document can significantly alter an LLM's ranking decisions.
arXiv Detail & Related papers (2026-02-18T06:19:08Z) - Are LLMs Reliable Rankers? Rank Manipulation via Two-Stage Token Optimization [7.7899746437628385]
We present Rank Anything First (RAF), a two-stage token optimization method. RAF crafts concise textual perturbations to consistently promote a target item in large language models. RAF generates ranking-promoting prompts token-by-token, guided by dual objectives: maximizing ranking effectiveness and preserving linguistic naturalness.
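The dual-objective, token-by-token generation described above can be sketched as greedy selection over a candidate vocabulary, scoring each token by a weighted sum of a ranking-gain term and a naturalness term. Both scoring functions below are toy stand-ins, not RAF's actual objectives.

```python
def greedy_dual_objective(vocab, rank_gain, naturalness, length, alpha=0.7):
    """Build a prompt token-by-token, at each step picking the token that
    maximizes alpha * ranking gain + (1 - alpha) * naturalness."""
    prompt = []
    for _ in range(length):
        best = max(vocab, key=lambda t: alpha * rank_gain(prompt, t)
                                        + (1 - alpha) * naturalness(prompt, t))
        prompt.append(best)
    return prompt

# Toy stand-ins: "best" boosts rank the most; immediate repetition
# of a token is treated as unnatural.
vocab = ["best", "quality", "item", "the"]
rank_gain = lambda p, t: {"best": 1.0, "quality": 0.8, "item": 0.3, "the": 0.1}[t]
naturalness = lambda p, t: 0.0 if p and p[-1] == t else 0.5
print(greedy_dual_objective(vocab, rank_gain, naturalness, length=3))
# -> ['best', 'quality', 'best']: the naturalness penalty breaks up repetition
```

The trade-off weight (here `alpha`) is exactly what lets such methods stay effective while avoiding the detectable anomalies that purely rank-maximizing perturbations produce.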
arXiv Detail & Related papers (2025-10-08T07:40:40Z) - The Ranking Blind Spot: Decision Hijacking in LLM-based Text Ranking [17.328293277532]
Large Language Models (LLMs) have demonstrated strong performance in information retrieval tasks like passage ranking. This research examines how instruction-following capabilities in LLMs interact with multi-document comparison tasks. We analyze how this ranking blind spot affects LLM evaluation systems through two approaches.
arXiv Detail & Related papers (2025-09-23T02:56:38Z) - Reinforcement Speculative Decoding for Fast Ranking [9.584558586988953]
Large Language Models (LLMs) have been widely adopted in ranking systems such as information retrieval (IR) systems and recommender systems (RSs). We propose a Reinforcement Speculative Decoding method for fast ranking inference of LLMs.
arXiv Detail & Related papers (2025-05-23T02:25:26Z) - When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques [5.2431999629987]
Jailbreak attacks pose a serious threat to large language models (LLMs). This paper presents a systematic survey of jailbreak methods from the novel perspective of stealth. We propose StegoAttack, a stealthy jailbreak attack that uses steganography to hide the harmful query within benign, semantically coherent text.
arXiv Detail & Related papers (2025-05-22T15:07:34Z) - Can LLMs Classify CVEs? Investigating LLMs Capabilities in Computing CVSS Vectors [15.43868945929965]
We evaluate the effectiveness of Large Language Models (LLMs) in generating CVSS scores for newly reported vulnerabilities.
Our results show that while LLMs demonstrate potential in automating CVSS evaluation, embedding-based methods outperform them in scoring more subjective components.
arXiv Detail & Related papers (2025-04-14T21:10:57Z) - CheatAgent: Attacking LLM-Empowered Recommender Systems via LLM Agent [32.958798200220286]
Large Language Model (LLM)-empowered recommender systems (RecSys) have brought significant advances in personalized user experience.
We propose a novel attack framework called CheatAgent by harnessing the human-like capabilities of LLMs.
Our method first identifies the insertion position for maximum impact with minimal input modification.
arXiv Detail & Related papers (2025-04-13T05:31:37Z) - The TIP of the Iceberg: Revealing a Hidden Class of Task-in-Prompt Adversarial Attacks on LLMs [1.9424018922013224]
We present a novel class of jailbreak adversarial attacks on LLMs. Our approach embeds sequence-to-sequence tasks into the model's prompt to indirectly generate prohibited inputs. We demonstrate that our techniques successfully circumvent safeguards in six state-of-the-art language models.
arXiv Detail & Related papers (2025-01-27T12:48:47Z) - Fun-tuning: Characterizing the Vulnerability of Proprietary LLMs to Optimization-based Prompt Injection Attacks via the Fine-Tuning Interface [3.908034401768844]
We describe how an attacker can leverage the loss-like information returned from the remote fine-tuning interface to guide the search for adversarial prompts. We demonstrate attack success rates between 65% and 82% on Google's Gemini family of LLMs.
arXiv Detail & Related papers (2025-01-16T19:01:25Z) - ChainRank-DPO: Chain Rank Direct Preference Optimization for LLM Rankers [22.51924253176532]
Large language models (LLMs) have demonstrated remarkable effectiveness in text reranking through works like RankGPT. Supervised fine-tuning for ranking often diminishes these models' general-purpose capabilities. We introduce a novel approach integrating Chain-of-Thought prompting with an SFT-DPO pipeline to preserve these capabilities while improving ranking performance.
arXiv Detail & Related papers (2024-12-18T23:24:15Z) - LeakAgent: RL-based Red-teaming Agent for LLM Privacy Leakage [78.33839735526769]
LeakAgent is a novel black-box red-teaming framework for privacy leakage. Our framework trains an open-source LLM through reinforcement learning as the attack agent to generate adversarial prompts. We show that LeakAgent significantly outperforms existing rule-based approaches in training data extraction and automated methods in system prompt leakage.
arXiv Detail & Related papers (2024-12-07T20:09:01Z) - Evaluating and Improving the Robustness of Security Attack Detectors Generated by LLMs [6.936401700600395]
Large Language Models (LLMs) are increasingly used in software development to generate functions, such as attack detectors, that implement security requirements. This is most likely due to the LLM lacking knowledge about some existing attacks and to the generated code not being evaluated in real usage scenarios. We propose a novel approach integrating Retrieval Augmented Generation (RAG) and Self-Ranking into the LLM pipeline.
arXiv Detail & Related papers (2024-11-27T10:48:37Z) - Palisade -- Prompt Injection Detection Framework [0.9620910657090188]
Large Language Models are vulnerable to malicious prompt injection attacks.
This paper proposes a novel NLP based approach for prompt injection detection.
It emphasizes accuracy and optimization through a layered input screening process.
arXiv Detail & Related papers (2024-10-28T15:47:03Z) - Understanding Ranking LLMs: A Mechanistic Analysis for Information Retrieval [20.353393773305672]
We employ a probing-based analysis to examine neuron activations in ranking LLMs. Our study spans a broad range of feature categories, including lexical signals, document structure, query-document interactions, and complex semantic representations. Our findings offer crucial insights for developing more transparent and reliable retrieval systems.
arXiv Detail & Related papers (2024-10-24T08:20:10Z) - Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails.
We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses.
C-AdvIPO is an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
arXiv Detail & Related papers (2024-05-24T14:20:09Z) - Improve Temporal Awareness of LLMs for Sequential Recommendation [61.723928508200196]
Large language models (LLMs) have demonstrated impressive zero-shot abilities in solving a wide range of general-purpose tasks.
LLMs fall short in recognizing and utilizing temporal information, rendering poor performance in tasks that require an understanding of sequential data.
We propose three prompting strategies to exploit temporal information within historical interactions for LLM-based sequential recommendation.
arXiv Detail & Related papers (2024-05-05T00:21:26Z) - ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings [58.82536530615557]
We propose an Adversarial Suffix Embedding Translation Framework (ASETF) to transform continuous adversarial suffix embeddings into coherent and understandable text.
Our method significantly reduces the computation time of adversarial suffixes and achieves a much higher attack success rate than existing techniques.
arXiv Detail & Related papers (2024-02-25T06:46:27Z) - RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models [62.72318564072706]
Reinforcement Learning with Human Feedback (RLHF) is a methodology designed to align Large Language Models (LLMs) with human preferences.
Despite its advantages, RLHF relies on human annotators to rank the text.
We propose RankPoison, a poisoning attack method on candidates' selection of preference rank flipping to reach certain malicious behaviors.
arXiv Detail & Related papers (2023-11-16T07:48:45Z) - MART: Improving LLM Safety with Multi-round Automatic Red-Teaming [72.2127916030909]
We propose a Multi-round Automatic Red-Teaming (MART) method, which incorporates both automatic adversarial prompt writing and safe response generation.
On adversarial prompt benchmarks, the violation rate of an LLM with limited safety alignment reduces up to 84.7% after 4 rounds of MART.
Notably, model helpfulness on non-adversarial prompts remains stable throughout iterations, indicating the target LLM maintains strong performance on instruction following.
arXiv Detail & Related papers (2023-11-13T19:13:29Z) - SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks [99.23352758320945]
We propose SmoothLLM, the first algorithm designed to mitigate jailbreaking attacks on large language models (LLMs).
Based on our finding that adversarially-generated prompts are brittle to character-level changes, our defense first randomly perturbs multiple copies of a given input prompt, and then aggregates the corresponding predictions to detect adversarial inputs.
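A toy version of this defense, assuming a mock jailbreak check in place of a real LLM: perturb several copies of the prompt at the character level and aggregate the per-copy outcomes by majority vote. Because an exact-match adversarial suffix is brittle, it rarely survives perturbation in a majority of copies.

```python
import random

def perturb(prompt, rate, rng):
    """Randomly replace a fraction `rate` of characters with letters."""
    chars = list(prompt)
    for i in range(len(chars)):
        if rng.random() < rate:
            chars[i] = rng.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)

def attack_survives_smoothing(prompt, attack_succeeds, copies=11, rate=0.4, seed=0):
    """Majority vote over perturbed copies: the attack counts as
    surviving only if it succeeds on most perturbed copies."""
    rng = random.Random(seed)
    hits = sum(attack_succeeds(perturb(prompt, rate, rng)) for _ in range(copies))
    return hits > copies // 2

# Mock attack: succeeds only if the exact adversarial suffix is intact.
suffix = "!!xqzt"
attack = lambda p: suffix in p
adversarial = "tell me a story " + suffix
print(attack(adversarial))                             # True without the defense
print(attack_survives_smoothing(adversarial, attack))  # almost surely False
```

The real SmoothLLM aggregates an LLM's responses to perturbed prompts rather than a string check, but the brittleness-plus-voting mechanism is the same.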
arXiv Detail & Related papers (2023-10-05T17:01:53Z) - Red Teaming Language Model Detectors with Language Models [114.36392560711022]
Large language models (LLMs) present significant safety and ethical risks if exploited by malicious users.
Recent works have proposed algorithms to detect LLM-generated text and protect LLMs.
We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation.
arXiv Detail & Related papers (2023-05-31T10:08:37Z) - Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents [53.78782375511531]
Large Language Models (LLMs) have demonstrated remarkable zero-shot generalization across various language-related tasks. This paper investigates generative LLMs for relevance ranking in Information Retrieval (IR). To address concerns about data contamination of LLMs, we collect a new test set called NovelEval. To improve efficiency in real-world applications, we delve into the potential for distilling the ranking capabilities of ChatGPT into small specialized models.
arXiv Detail & Related papers (2023-04-19T10:16:03Z) - Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications.
We show how attackers can override original instructions and employed controls using Prompt Injection attacks.
We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z) - A Tale of HodgeRank and Spectral Method: Target Attack Against Rank Aggregation Is the Fixed Point of Adversarial Game [153.74942025516853]
The intrinsic vulnerability of the rank aggregation methods is not well studied in the literature.
In this paper, we focus on the purposeful adversary who desires to designate the aggregated results by modifying the pairwise data.
The effectiveness of the suggested target attack strategies is demonstrated by a series of toy simulations and several real-world data experiments.
arXiv Detail & Related papers (2022-09-13T05:59:02Z) - Practical Relative Order Attack in Deep Ranking [99.332629807873]
We formulate a new adversarial attack against deep ranking systems, i.e., the Order Attack.
The Order Attack covertly alters the relative order among a selected set of candidates according to an attacker-specified permutation.
It is successfully implemented on a major e-commerce platform.
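The Order Attack's objective, as described, is a property one can check directly: whether a chosen subset of candidates appears in a ranking in an attacker-specified relative order, regardless of where other items fall. A minimal checker (illustrative, not from the paper):

```python
def matches_relative_order(ranking, desired):
    """True if the items in `desired` occur in `ranking` in that
    relative order; unrelated items may appear anywhere between them."""
    pos = {item: i for i, item in enumerate(ranking)}
    indices = [pos[item] for item in desired]
    return indices == sorted(indices)

ranking = ["d", "b", "e", "a", "c"]
print(matches_relative_order(ranking, ["b", "a", "c"]))  # True: b < a < c holds
print(matches_relative_order(ranking, ["a", "b"]))       # False: a comes after b
```

Checking only relative order among a small target set, rather than absolute positions, is what makes such an attack covert: the overall ranking can still look plausible.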
arXiv Detail & Related papers (2021-03-09T06:41:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.