Related papers: DoPE: Decoy Oriented Perturbation Encapsulation Human-Readable, AI-Hostile Documents for Academic Integrity

DoPE: Decoy Oriented Perturbation Encapsulation Human-Readable, AI-Hostile Documents for Academic Integrity

URL: http://arxiv.org/abs/2601.12505v1
Date: Sun, 18 Jan 2026 17:34:29 GMT
Title: DoPE: Decoy Oriented Perturbation Encapsulation Human-Readable, AI-Hostile Documents for Academic Integrity
Authors: Ashish Raj Shekhar, Shiven Agarwal, Priyanuj Bordoloi, Yash Shah, Tejas Anvekar, Vivek Gupta,
Abstract summary: DoPE is a document-layer defense framework that embeds semantic decoys into PDF/ HTML assessments.<n>FewSoRT-Q generates question-level semantic decoys and FewSoRT-D to encapsulate them into watermarked documents.<n>DoPE yields strong empirical gains against black-box MLLMs from OpenAI and Anthropic.
Score: 10.808479217513181
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multimodal Large Language Models (MLLMs) can directly consume exam documents, threatening conventional assessments and academic integrity. We present DoPE (Decoy-Oriented Perturbation Encapsulation), a document-layer defense framework that embeds semantic decoys into PDF/HTML assessments to exploit render-parse discrepancies in MLLM pipelines. By instrumenting exams at authoring time, DoPE provides model-agnostic prevention (stop or confound automated solving) and detection (flag blind AI reliance) without relying on conventional one-shot classifiers. We formalize prevention and detection tasks, and introduce FewSoRT-Q, an LLM-guided pipeline that generates question-level semantic decoys and FewSoRT-D to encapsulate them into watermarked documents. We evaluate on Integrity-Bench, a novel benchmark of 1826 exams (PDF+HTML) derived from public QA datasets and OpenCourseWare. Against black-box MLLMs from OpenAI and Anthropic, DoPE yields strong empirical gains: a 91.4% detection rate at an 8.7% false-positive rate using an LLM-as-Judge verifier, and prevents successful completion or induces decoy-aligned failures in 96.3% of attempts. We release Integrity-Bench, our toolkit, and evaluation code to enable reproducible study of document-layer defenses for academic integrity.

Related papers

Integrity Shield A System for Ethical AI Use & Authorship Transparency in Assessments [10.808479217513181]
We present Integrity Shield, a document-layer watermarking system that embeds item-level watermarks into assessment PDFs.<n>These watermarks consistently prevent MLLMs from answering shielded exam PDFs and encode stable, item-level signatures.<n>Our demo showcases an interactive interface where instructors upload an exam, preview watermark behavior, and inspect pre/post AI performance & authorship evidence.
arXiv Detail & Related papers (2026-01-16T08:44:58Z)
BadScientist: Can a Research Agent Write Convincing but Unsound Papers that Fool LLM Reviewers? [21.78901120638025]
We investigate whether fabrication-oriented paper generation agents can deceive multi-model LLM review systems.<n>Our generator employs presentation-manipulation strategies requiring no real experiments.<n>Despite provably sound aggregation mathematics, integrity checking systematically fails.
arXiv Detail & Related papers (2025-10-20T18:37:11Z)
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models [76.72220653705679]
We introduce MCPEval, an open-source framework that automates end-to-end task generation and deep evaluation of intelligent agents.<n> MCPEval standardizes metrics, seamlessly integrates with native agent tools, and eliminates manual effort in building evaluation pipelines.<n> Empirical results across five real-world domains show its effectiveness in revealing nuanced, domain-specific performance.
arXiv Detail & Related papers (2025-07-17T05:46:27Z)
The Feasibility of Topic-Based Watermarking on Academic Peer Reviews [46.71493672772134]
We evaluate topic-based watermarking (TBW) for large language models (LLMs)<n>Our results show that TBW maintains review quality relative to non-watermarked outputs, while demonstrating strong to paraphrasing-based evasion.
arXiv Detail & Related papers (2025-05-27T18:09:27Z)
Helping Large Language Models Protect Themselves: An Enhanced Filtering and Summarization System [2.0257616108612373]
Large Language Models are vulnerable to adversarial assaults, manipulative prompts, and encoded malicious inputs.<n>This study presents a unique defense paradigm that allows LLMs to recognize, filter, and defend against adversarial or malicious inputs on their own.
arXiv Detail & Related papers (2025-05-02T14:42:26Z)
Everything You Wanted to Know About LLM-based Vulnerability Detection But Were Afraid to Ask [30.819697001992154]
Large Language Models are a promising tool for automated vulnerability detection.<n>Despite widespread adoption, a critical question remains: Are LLMs truly effective at detecting real-world vulnerabilities?<n>This paper challenges three widely held community beliefs: that LLMs are (i) unreliable, (ii) insensitive to code patches, and (iii) performance-plateaued across model scales.
arXiv Detail & Related papers (2025-04-18T05:32:47Z)
Shh, don't say that! Domain Certification in LLMs [124.61851324874627]
Large language models (LLMs) are often deployed to perform constrained tasks, with narrow domains.<n>We introduce domain certification; a guarantee that accurately characterizes the out-of-domain behavior of language models.<n>We then propose a simple yet effective approach, which we call VALID that provides adversarial bounds as a certificate.
arXiv Detail & Related papers (2025-02-26T17:13:19Z)
Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Reasoning [58.57194301645823]
Large language models (LLMs) are increasingly integrated into real-world personalized applications.<n>The valuable and often proprietary nature of the knowledge bases used in RAG introduces the risk of unauthorized usage by adversaries.<n>Existing methods that can be generalized as watermarking techniques to protect these knowledge bases typically involve poisoning or backdoor attacks.<n>We propose name for harmless' copyright protection of knowledge bases.
arXiv Detail & Related papers (2025-02-10T09:15:56Z)
AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs. Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z)
SeqXGPT: Sentence-Level AI-Generated Text Detection [62.3792779440284]
We introduce a sentence-level detection challenge by synthesizing documents polished with large language models (LLMs) We then propose textbfSequence textbfX (Check) textbfGPT, a novel method that utilizes log probability lists from white-box LLMs as features for sentence-level AIGT detection.
arXiv Detail & Related papers (2023-10-13T07:18:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.