Related papers: SearchAttack: Red-Teaming LLMs against Real-World Threats via Framing Unsafe Web Information-Seeking Tasks

SearchAttack: Red-Teaming LLMs against Real-World Threats via Framing Unsafe Web Information-Seeking Tasks

URL: http://arxiv.org/abs/2601.04093v1
Date: Wed, 07 Jan 2026 16:59:34 GMT
Title: SearchAttack: Red-Teaming LLMs against Real-World Threats via Framing Unsafe Web Information-Seeking Tasks
Authors: Yu Yan, Sheng Sun, Mingfeng Li, Zheming Yang, Chiwei Zhu, Fei Ma, Benfeng Xu, Min Liu,
Abstract summary: Motivated by this dilemma, we identify web search as a critical attack surface and propose textbftextitSearchAttack for red-teaming.<n>SearchAttack outsources the harmful semantics to web search, retaining only the query's skeleton and fragmented clues.
Score: 19.28321072381512
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Recently, people have suffered and become increasingly aware of the unreliability gap in LLMs for open and knowledge-intensive tasks, and thus turn to search-augmented LLMs to mitigate this issue. However, when the search engine is triggered for harmful tasks, the outcome is no longer under the LLM's control. Once the returned content directly contains targeted, ready-to-use harmful takeaways, the LLM's safeguards cannot withdraw that exposure. Motivated by this dilemma, we identify web search as a critical attack surface and propose \textbf{\textit{SearchAttack}} for red-teaming. SearchAttack outsources the harmful semantics to web search, retaining only the query's skeleton and fragmented clues, and further steers LLMs to reconstruct the retrieved content via structural rubrics to achieve malicious goals. Extensive experiments are conducted to red-team the search-augmented LLMs for responsible vulnerability assessment. Empirically, SearchAttack demonstrates strong effectiveness in attacking these systems.

Related papers

CREST-Search: Comprehensive Red-teaming for Evaluating Safety Threats in Large Language Models Powered by Web Search [28.45573025341277]
Large Language Models (LLMs) excel at tasks such as dialogue, summarization, and question answering.<n>To overcome this, web search has been integrated into LLMs, allowing real-time access to online content.<n>This connection magnifies safety risks, as adversarial prompts combined with untrusted sources can cause severe vulnerabilities.<n>We present CREST-Search, a framework that systematically exposes risks in such systems.
arXiv Detail & Related papers (2025-10-09T09:44:14Z)
SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents [63.70653857721785]
We conduct two in-the-wild experiments to demonstrate the prevalence of low-quality search results and their potential to misguide agent behaviors.<n>To counter this threat, we introduce an automated red-teaming framework that is systematic, scalable, and cost-efficient.
arXiv Detail & Related papers (2025-09-28T07:05:17Z)
Large Language Models powered Malicious Traffic Detection: Architecture, Opportunities and Case Study [12.381768120279771]
Large Language Models (LLMs) are trained on a vast corpus of text.<n>We focus on unleashing the full potential of LLMs in malicious traffic detection.<n>We present our design on LLM-powered DDoS detection as a case study.
arXiv Detail & Related papers (2025-03-24T09:40:46Z)
How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation [35.365004091470944]
Large Language Models (LLMs) are widely deployed in diverse scenarios.<n>The extent to which they could tacitly spread misinformation emerges as a critical safety concern.<n>We curated EchoMist, the first benchmark for implicit misinformation.
arXiv Detail & Related papers (2025-03-12T17:59:18Z)
Look Before You Leap: Enhancing Attention and Vigilance Regarding Harmful Content with GuidelineLLM [53.79753074854936]
Large language models (LLMs) are increasingly vulnerable to emerging jailbreak attacks.<n>This vulnerability poses significant risks to real-world applications.<n>We propose a novel defensive paradigm called GuidelineLLM.
arXiv Detail & Related papers (2024-12-10T12:42:33Z)
Can LLMs be Fooled? Investigating Vulnerabilities in LLMs [4.927763944523323]
The advent of Large Language Models (LLMs) has garnered significant popularity and wielded immense power across various domains within Natural Language Processing (NLP) This paper will synthesize the findings from each vulnerability section and propose new directions of research and development. By understanding the focal points of current vulnerabilities, we can better anticipate and mitigate future risks.
arXiv Detail & Related papers (2024-07-30T04:08:00Z)
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models [79.0183835295533]
We introduce the first benchmark for indirect prompt injection attacks, named BIPIA, to assess the risk of such vulnerabilities.<n>Our analysis identifies two key factors contributing to their success: LLMs' inability to distinguish between informational context and actionable instructions, and their lack of awareness in avoiding the execution of instructions within external content.<n>We propose two novel defense mechanisms-boundary awareness and explicit reminder-to address these vulnerabilities in both black-box and white-box settings.
arXiv Detail & Related papers (2023-12-21T01:08:39Z)
MART: Improving LLM Safety with Multi-round Automatic Red-Teaming [72.2127916030909]
We propose a Multi-round Automatic Red-Teaming (MART) method, which incorporates both automatic adversarial prompt writing and safe response generation. On adversarial prompt benchmarks, the violation rate of an LLM with limited safety alignment reduces up to 84.7% after 4 rounds of MART. Notably, model helpfulness on non-adversarial prompts remains stable throughout iterations, indicating the target LLM maintains strong performance on instruction following.
arXiv Detail & Related papers (2023-11-13T19:13:29Z)
LLM Censorship: A Machine Learning Challenge or a Computer Security Problem? [52.71988102039535]
We show that semantic censorship can be perceived as an undecidable problem. We argue that the challenges extend beyond semantic censorship, as knowledgeable attackers can reconstruct impermissible outputs.
arXiv Detail & Related papers (2023-07-20T09:25:02Z)
Red Teaming Language Model Detectors with Language Models [114.36392560711022]
Large language models (LLMs) present significant safety and ethical risks if exploited by malicious users. Recent works have proposed algorithms to detect LLM-generated text and protect LLMs. We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation.
arXiv Detail & Related papers (2023-05-31T10:08:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.