TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents
- URL: http://arxiv.org/abs/2506.00089v1
- Date: Fri, 30 May 2025 07:16:53 GMT
- Title: TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents
- Authors: Hyundong Jin, Sicheol Sung, Shinwoo Park, SeungYeop Baik, Yo-Sub Han,
- Abstract summary: Over-reliance on large language models (LLMs) is emerging as a significant social issue.<n>We propose a method injecting imperceptible phantom tokens into documents, which causes LLMs to generate outputs that appear plausible to users but are in fact incorrect.<n>Based on this technique, we introduce TRAPDOC, a framework designed to deceive over-reliant LLM users.
- Score: 4.753535328327316
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The reasoning, writing, text-editing, and retrieval capabilities of proprietary large language models (LLMs) have advanced rapidly, providing users with an ever-expanding set of functionalities. However, this growing utility has also led to a serious societal concern: the over-reliance on LLMs. In particular, users increasingly delegate tasks such as homework, assignments, or the processing of sensitive documents to LLMs without meaningful engagement. This form of over-reliance and misuse is emerging as a significant social issue. In order to mitigate these issues, we propose a method injecting imperceptible phantom tokens into documents, which causes LLMs to generate outputs that appear plausible to users but are in fact incorrect. Based on this technique, we introduce TRAPDOC, a framework designed to deceive over-reliant LLM users. Through empirical evaluation, we demonstrate the effectiveness of our framework on proprietary LLMs, comparing its impact against several baselines. TRAPDOC serves as a strong foundation for promoting more responsible and thoughtful engagement with language models. Our code is available at https://github.com/jindong22/TrapDoc.
Related papers
- TracLLM: A Generic Framework for Attributing Long Context LLMs [34.802736332993994]
We develop TracLLM, the first generic context traceback framework tailored to long context LLMs.<n>Our framework can improve the effectiveness and efficiency of existing feature attribution methods.<n>Our evaluation results show TracLLM can effectively identify texts in a long context that lead to the output of an LLM.
arXiv Detail & Related papers (2025-06-04T17:48:16Z) - LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph [57.382255728234064]
Large Language Models (LLMs) have impressive capabilities in text understanding and zero-shot reasoning.<n> Knowledge Graphs (KGs) provide rich and reliable contextual information for the reasoning process of LLMs.<n>We propose a novel Lightweight and efficient Prompt learning-ReasOning Framework for KGQA (LightPROF)
arXiv Detail & Related papers (2025-04-04T03:03:47Z) - An Empirical Study on Commit Message Generation using LLMs via In-Context Learning [26.39743339039473]
Commit messages concisely describe code changes in natural language.<n>We propose to borrow the weapon of large language models (LLMs) and in-context learning (ICL) to generate commit messages.
arXiv Detail & Related papers (2025-02-26T07:47:52Z) - From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning [91.79567270986901]
Large Language Models (LLMs) tend to prioritize adherence to user prompts over providing veracious responses.<n>Recent works propose to employ supervised fine-tuning (SFT) to mitigate the sycophancy issue.<n>We propose a novel supervised pinpoint tuning (SPT), where the region-of-interest modules are tuned for a given objective.
arXiv Detail & Related papers (2024-09-03T07:01:37Z) - Get my drift? Catching LLM Task Drift with Activation Deltas [55.75645403965326]
Task drift allows attackers to exfiltrate data or influence the LLM's output for other users.<n>We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set.<n>We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
arXiv Detail & Related papers (2024-06-02T16:53:21Z) - GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications [46.85306320942487]
Large Language Models (LLMs) are evolving to actively engage with tools and performing actions on real-world applications and services.
Today, humans verify the correctness and appropriateness of the LLM-generated outputs before putting them into real-world execution.
This poses significant challenges as code comprehension is well known to be notoriously difficult.
In this paper, we study how humans can efficiently collaborate with, delegate to, and supervise autonomous LLMs in the future.
arXiv Detail & Related papers (2024-04-10T11:17:33Z) - LLatrieval: LLM-Verified Retrieval for Verifiable Generation [67.93134176912477]
Verifiable generation aims to let the large language model (LLM) generate text with supporting documents.
We propose LLatrieval (Large Language Model Verified Retrieval), where the LLM updates the retrieval result until it verifies that the retrieved documents can sufficiently support answering the question.
Experiments show that LLatrieval significantly outperforms extensive baselines and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-11-14T01:38:02Z) - ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and useful to trigger hallucination in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z) - Low-code LLM: Graphical User Interface over Large Language Models [115.08718239772107]
This paper introduces a novel human-LLM interaction framework, Low-code LLM.
It incorporates six types of simple low-code visual programming interactions to achieve more controllable and stable responses.
We highlight three advantages of the low-code LLM: user-friendly interaction, controllable generation, and wide applicability.
arXiv Detail & Related papers (2023-04-17T09:27:40Z) - Check Your Facts and Try Again: Improving Large Language Models with
External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes a LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.