Related papers: Integrity Shield A System for Ethical AI Use & Authorship Transparency in Assessments

Integrity Shield A System for Ethical AI Use & Authorship Transparency in Assessments

URL: http://arxiv.org/abs/2601.11093v1
Date: Fri, 16 Jan 2026 08:44:58 GMT
Title: Integrity Shield A System for Ethical AI Use & Authorship Transparency in Assessments
Authors: Ashish Raj Shekhar, Shiven Agarwal, Priyanuj Bordoloi, Yash Shah, Tejas Anvekar, Vivek Gupta,
Abstract summary: We present Integrity Shield, a document-layer watermarking system that embeds item-level watermarks into assessment PDFs.<n>These watermarks consistently prevent MLLMs from answering shielded exam PDFs and encode stable, item-level signatures.<n>Our demo showcases an interactive interface where instructors upload an exam, preview watermark behavior, and inspect pre/post AI performance & authorship evidence.
Score: 10.808479217513181
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) can now solve entire exams directly from uploaded PDF assessments, raising urgent concerns about academic integrity and the reliability of grades and credentials. Existing watermarking techniques either operate at the token level or assume control over the model's decoding process, making them ineffective when students query proprietary black-box systems with instructor-provided documents. We present Integrity Shield, a document-layer watermarking system that embeds schema-aware, item-level watermarks into assessment PDFs while keeping their human-visible appearance unchanged. These watermarks consistently prevent MLLMs from answering shielded exam PDFs and encode stable, item-level signatures that can be reliably recovered from model or student responses. Across 30 exams spanning STEM, humanities, and medical reasoning, Integrity Shield achieves exceptionally high prevention (91-94% exam-level blocking) and strong detection reliability (89-93% signature retrieval) across four commercial MLLMs. Our demo showcases an interactive interface where instructors upload an exam, preview watermark behavior, and inspect pre/post AI performance & authorship evidence.

Related papers

DoPE: Decoy Oriented Perturbation Encapsulation Human-Readable, AI-Hostile Documents for Academic Integrity [10.808479217513181]
DoPE is a document-layer defense framework that embeds semantic decoys into PDF/ HTML assessments.<n>FewSoRT-Q generates question-level semantic decoys and FewSoRT-D to encapsulate them into watermarked documents.<n>DoPE yields strong empirical gains against black-box MLLMs from OpenAI and Anthropic.
arXiv Detail & Related papers (2026-01-18T17:34:29Z)
A Visual Semantic Adaptive Watermark grounded by Prefix-Tuning for Large Vision-Language Model [48.79816664229285]
VIsual Semantic Adaptive Watermark (VISA-Mark) is a novel framework that embeds detectable signals while strictly preserving visual fidelity.<n>Our approach employs a lightweight, efficiently trained prefix-tuner to extract dynamic Visual-Evidence Weights.<n> Empirical results confirm that VISA-Mark outperforms conventional methods with a 7.8% improvement in visual consistency.
arXiv Detail & Related papers (2026-01-12T07:55:13Z)
SEAL: Subspace-Anchored Watermarks for LLM Ownership [12.022506016268112]
We propose SEAL, a subspace-anchored watermarking framework for large language models.<n> SEAL embeds multi-bit signatures directly into the model's latent representational space, supporting both white-box and black-box verification scenarios.<n>We conduct comprehensive experiments on multiple benchmark datasets and six prominent LLMs to demonstrate SEAL's superior effectiveness, fidelity, efficiency, and robustness.
arXiv Detail & Related papers (2025-11-14T14:44:11Z)
SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking [58.475471437150674]
We propose sequential watermarking for soft prompts (SWAP)<n>SWAP encodes watermarks through a specific order of defender-specified out-of-distribution classes.<n>Experiments on 11 datasets demonstrate SWAP's effectiveness, harmlessness, and robustness against potential adaptive attacks.
arXiv Detail & Related papers (2025-11-05T13:48:48Z)
SynthID-Image: Image watermarking at internet scale [55.5714762895087]
We introduce SynthID-Image, a deep learning-based system for invisibly watermarking AI-generated imagery.<n>This paper documents the technical desiderata, threat models, and practical challenges of deploying such a system at internet scale.
arXiv Detail & Related papers (2025-10-10T11:03:31Z)
LLM Watermark Evasion via Bias Inversion [24.543675977310357]
We propose the emphBias-Inversion Rewriting Attack (BIRA), which is theoretically motivated and model-agnostic.<n>BIRA weakens the watermark signal by suppressing the logits of likely watermarked tokens during rewriting, without any knowledge of the underlying watermarking scheme.
arXiv Detail & Related papers (2025-09-27T00:24:57Z)
In-Context Watermarks for Large Language Models [71.29952527565749]
In-Context Watermarking (ICW) embeds watermarks into generated text solely through prompt engineering.<n>We investigate four ICW strategies at different levels of granularity, each paired with a tailored detection method.<n>Our experiments validate the feasibility of ICW as a model-agnostic, practical watermarking approach.
arXiv Detail & Related papers (2025-05-22T17:24:51Z)
ClearMark: Intuitive and Robust Model Watermarking via Transposed Model Training [50.77001916246691]
This paper introduces ClearMark, the first DNN watermarking method designed for intuitive human assessment. ClearMark embeds visible watermarks, enabling human decision-making without rigid value thresholds. It shows an 8,544-bit watermark capacity comparable to the strongest existing work.
arXiv Detail & Related papers (2023-10-25T08:16:55Z)
Don't Forget to Sign the Gradients! [60.98885980669777]
GradSigns is a novel watermarking framework for deep neural networks (DNNs) We present GradSigns, a novel watermarking framework for deep neural networks (DNNs)
arXiv Detail & Related papers (2021-03-05T14:24:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.