DoPE: Decoy Oriented Perturbation Encapsulation Human-Readable, AI-Hostile Documents for Academic Integrity
- URL: http://arxiv.org/abs/2601.12505v1
- Date: Sun, 18 Jan 2026 17:34:29 GMT
- Title: DoPE: Decoy Oriented Perturbation Encapsulation Human-Readable, AI-Hostile Documents for Academic Integrity
- Authors: Ashish Raj Shekhar, Shiven Agarwal, Priyanuj Bordoloi, Yash Shah, Tejas Anvekar, Vivek Gupta,
- Abstract summary: DoPE is a document-layer defense framework that embeds semantic decoys into PDF/ HTML assessments.<n>FewSoRT-Q generates question-level semantic decoys and FewSoRT-D to encapsulate them into watermarked documents.<n>DoPE yields strong empirical gains against black-box MLLMs from OpenAI and Anthropic.
- Score: 10.808479217513181
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal Large Language Models (MLLMs) can directly consume exam documents, threatening conventional assessments and academic integrity. We present DoPE (Decoy-Oriented Perturbation Encapsulation), a document-layer defense framework that embeds semantic decoys into PDF/HTML assessments to exploit render-parse discrepancies in MLLM pipelines. By instrumenting exams at authoring time, DoPE provides model-agnostic prevention (stop or confound automated solving) and detection (flag blind AI reliance) without relying on conventional one-shot classifiers. We formalize prevention and detection tasks, and introduce FewSoRT-Q, an LLM-guided pipeline that generates question-level semantic decoys and FewSoRT-D to encapsulate them into watermarked documents. We evaluate on Integrity-Bench, a novel benchmark of 1826 exams (PDF+HTML) derived from public QA datasets and OpenCourseWare. Against black-box MLLMs from OpenAI and Anthropic, DoPE yields strong empirical gains: a 91.4% detection rate at an 8.7% false-positive rate using an LLM-as-Judge verifier, and prevents successful completion or induces decoy-aligned failures in 96.3% of attempts. We release Integrity-Bench, our toolkit, and evaluation code to enable reproducible study of document-layer defenses for academic integrity.
Related papers
- Integrity Shield A System for Ethical AI Use & Authorship Transparency in Assessments [10.808479217513181]
We present Integrity Shield, a document-layer watermarking system that embeds item-level watermarks into assessment PDFs.<n>These watermarks consistently prevent MLLMs from answering shielded exam PDFs and encode stable, item-level signatures.<n>Our demo showcases an interactive interface where instructors upload an exam, preview watermark behavior, and inspect pre/post AI performance & authorship evidence.
arXiv Detail & Related papers (2026-01-16T08:44:58Z) - BadScientist: Can a Research Agent Write Convincing but Unsound Papers that Fool LLM Reviewers? [21.78901120638025]
We investigate whether fabrication-oriented paper generation agents can deceive multi-model LLM review systems.<n>Our generator employs presentation-manipulation strategies requiring no real experiments.<n>Despite provably sound aggregation mathematics, integrity checking systematically fails.
arXiv Detail & Related papers (2025-10-20T18:37:11Z) - MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models [76.72220653705679]
We introduce MCPEval, an open-source framework that automates end-to-end task generation and deep evaluation of intelligent agents.<n> MCPEval standardizes metrics, seamlessly integrates with native agent tools, and eliminates manual effort in building evaluation pipelines.<n> Empirical results across five real-world domains show its effectiveness in revealing nuanced, domain-specific performance.
arXiv Detail & Related papers (2025-07-17T05:46:27Z) - The Feasibility of Topic-Based Watermarking on Academic Peer Reviews [46.71493672772134]
We evaluate topic-based watermarking (TBW) for large language models (LLMs)<n>Our results show that TBW maintains review quality relative to non-watermarked outputs, while demonstrating strong to paraphrasing-based evasion.
arXiv Detail & Related papers (2025-05-27T18:09:27Z) - Helping Large Language Models Protect Themselves: An Enhanced Filtering and Summarization System [2.0257616108612373]
Large Language Models are vulnerable to adversarial assaults, manipulative prompts, and encoded malicious inputs.<n>This study presents a unique defense paradigm that allows LLMs to recognize, filter, and defend against adversarial or malicious inputs on their own.
arXiv Detail & Related papers (2025-05-02T14:42:26Z) - Everything You Wanted to Know About LLM-based Vulnerability Detection But Were Afraid to Ask [30.819697001992154]
Large Language Models are a promising tool for automated vulnerability detection.<n>Despite widespread adoption, a critical question remains: Are LLMs truly effective at detecting real-world vulnerabilities?<n>This paper challenges three widely held community beliefs: that LLMs are (i) unreliable, (ii) insensitive to code patches, and (iii) performance-plateaued across model scales.
arXiv Detail & Related papers (2025-04-18T05:32:47Z) - Shh, don't say that! Domain Certification in LLMs [124.61851324874627]
Large language models (LLMs) are often deployed to perform constrained tasks, with narrow domains.<n>We introduce domain certification; a guarantee that accurately characterizes the out-of-domain behavior of language models.<n>We then propose a simple yet effective approach, which we call VALID that provides adversarial bounds as a certificate.
arXiv Detail & Related papers (2025-02-26T17:13:19Z) - Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Reasoning [58.57194301645823]
Large language models (LLMs) are increasingly integrated into real-world personalized applications.<n>The valuable and often proprietary nature of the knowledge bases used in RAG introduces the risk of unauthorized usage by adversaries.<n>Existing methods that can be generalized as watermarking techniques to protect these knowledge bases typically involve poisoning or backdoor attacks.<n>We propose name for harmless' copyright protection of knowledge bases.
arXiv Detail & Related papers (2025-02-10T09:15:56Z) - AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs.
Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z) - SeqXGPT: Sentence-Level AI-Generated Text Detection [62.3792779440284]
We introduce a sentence-level detection challenge by synthesizing documents polished with large language models (LLMs)
We then propose textbfSequence textbfX (Check) textbfGPT, a novel method that utilizes log probability lists from white-box LLMs as features for sentence-level AIGT detection.
arXiv Detail & Related papers (2023-10-13T07:18:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.