A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
- URL: http://arxiv.org/abs/2602.23163v1
- Date: Thu, 26 Feb 2026 16:27:24 GMT
- Title: A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
- Authors: Usman Anwar, Julianna Piskorz, David D. Baek, David Africa, Jim Weatherall, Max Tegmark, Christian Schroeder de Witt, Mihaela van der Schaar, David Krueger
- Abstract summary: We propose an alternative, decision-theoretic view of steganography. Our central insight is that steganography creates an asymmetry in usable information between agents who can and cannot decode the hidden content. We use this to define the steganographic gap -- a measure that quantifies steganography by comparing the downstream utility of the steganographic signal to agents that can and cannot decode the hidden content.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models are beginning to show steganographic capabilities. Such capabilities could allow misaligned models to evade oversight mechanisms. Yet principled methods to detect and quantify such behaviours are lacking. Classical definitions of steganography, and detection methods based on them, require a known reference distribution of non-steganographic signals. For the case of steganographic reasoning in LLMs, knowing such a reference distribution is not feasible; this renders these approaches inapplicable. We propose an alternative, \textbf{decision-theoretic view of steganography}. Our central insight is that steganography creates an asymmetry in usable information between agents who can and cannot decode the hidden content (present within a steganographic signal), and this otherwise latent asymmetry can be inferred from the agents' observable actions. To formalise this perspective, we introduce generalised $\mathcal{V}$-information: a utilitarian framework for measuring the amount of usable information within some input. We use this to define the \textbf{steganographic gap} -- a measure that quantifies steganography by comparing the downstream utility of the steganographic signal to agents that can and cannot decode the hidden content. We empirically validate our formalism, and show that it can be used to detect, quantify, and mitigate steganographic reasoning in LLMs.
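The steganographic gap described in the abstract can be illustrated with a toy sketch. Everything below is an illustrative assumption, not the paper's construction: the encoding scheme (a hidden bit carried in the parity of a cover message's word count), the agents, and the utility function are all invented for demonstration. The gap is simply the average utility difference between an agent that can decode the hidden content and one that cannot.

```python
# Toy sketch of the "steganographic gap": the utility difference between
# an agent that can decode a hidden signal and one that cannot.
# The parity-based encoding below is a made-up example, not the paper's method.

def encode(hidden_bit: int) -> str:
    """Produce a cover message whose word-count parity carries hidden_bit."""
    words = ["the", "quick", "brown", "fox", "jumps"]
    n = 4 if hidden_bit == 0 else 5  # even word count -> 0, odd -> 1
    return " ".join(words[:n])

def informed_agent(signal: str) -> int:
    """Agent that knows the decoding rule: recovers the hidden bit."""
    return len(signal.split()) % 2

def uninformed_agent(signal: str) -> int:
    """Agent without the key: the hidden bit is unusable, so it guesses a constant."""
    return 0

def utility(action: int, hidden_bit: int) -> float:
    """Downstream utility: 1 if the agent acts on the correct bit, else 0."""
    return 1.0 if action == hidden_bit else 0.0

def steganographic_gap(bits) -> float:
    """Average utility difference between informed and uninformed agents."""
    signals = [encode(b) for b in bits]
    u_informed = sum(utility(informed_agent(s), b) for s, b in zip(signals, bits))
    u_uninformed = sum(utility(uninformed_agent(s), b) for s, b in zip(signals, bits))
    return (u_informed - u_uninformed) / len(bits)

print(steganographic_gap([0, 1, 0, 1, 1, 0]))  # a positive gap signals usable hidden information
```

In this toy setting the informed agent decodes every bit while the uninformed agent only gets the zeros right, so the gap is positive; if the signal carried no decodable content, both agents would perform equally and the gap would be zero.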
Related papers
- NEST: Nascent Encoded Steganographic Thoughts [0.0]
This study explores the potential for steganographic reasoning to inform risk assessment and deployment policies. We measure evasion, refusal rates, encoding fidelity, and hidden task accuracy across four datasets. We find that current models cannot yet sustain hidden reasoning for complex math and arithmetic tasks.
arXiv Detail & Related papers (2026-02-15T11:05:18Z)
- Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images [96.43608872116347]
AnomReason is a large-scale benchmark with structured annotations provided as quadruples. AnomReason and AnomAgent serve as a foundation for measuring and improving the semantic plausibility of AI-generated images.
arXiv Detail & Related papers (2025-10-11T14:09:24Z)
- GALAX: Graph-Augmented Language Model for Explainable Reinforcement-Guided Subgraph Reasoning in Precision Medicine [31.561998419001124]
In precision medicine, quantitative multi-omic features, topological context, and textual biological knowledge play vital roles in identifying disease-critical signaling pathways and targets. We propose GALAX, an innovative framework that integrates pretrained Graph Neural Networks (GNNs) into Large Language Models (LLMs). As an application, we also introduce Target-QA, a benchmark combining CRISPR-identified targets, multi-omic profiles, and biomedical graph knowledge across diverse cancer cell lines.
arXiv Detail & Related papers (2025-09-25T09:20:58Z)
- ForenX: Towards Explainable AI-Generated Image Detection with Multimodal Large Language Models [82.04858317800097]
We present ForenX, a novel method that not only identifies the authenticity of images but also provides explanations that resonate with human thoughts. ForenX employs powerful multimodal large language models (MLLMs) to analyze and interpret forensic cues. We introduce ForgReason, a dataset dedicated to descriptions of forgery evidence in AI-generated images.
arXiv Detail & Related papers (2025-08-02T15:21:26Z)
- Early Signs of Steganographic Capabilities in Frontier LLMs [7.3833268176766245]
Large Language Models could evade monitoring through steganography. We focus on two types of steganography: passing encoded messages and performing encoded reasoning. We find early signs that models can perform basic encoded reasoning in a simple state-tracking problem.
arXiv Detail & Related papers (2025-07-03T15:54:55Z)
- The Steganographic Potentials of Language Models [0.0]
Large language models (LLMs) can hide messages within plain text (steganography). We explore the steganographic capabilities of LLMs fine-tuned via reinforcement learning (RL). Our findings reveal that while current models exhibit rudimentary steganographic abilities in terms of security and capacity, explicit algorithmic guidance markedly enhances their capacity for information concealment.
arXiv Detail & Related papers (2025-05-06T11:25:52Z)
- Provably Secure Public-Key Steganography Based on Admissible Encoding [66.38591467056939]
The technique of hiding secret messages within seemingly harmless covertext is known as provably secure steganography (PSS). PSS evolves from symmetric-key steganography to public-key steganography, functioning without the requirement of a pre-shared key. This paper proposes a more general elliptic curve public-key steganography method based on admissible encoding.
arXiv Detail & Related papers (2025-04-28T03:42:25Z)
- Natias: Neuron Attribution based Transferable Image Adversarial Steganography [62.906821876314275]
Adversarial steganography has garnered considerable attention due to its ability to effectively deceive deep-learning-based steganalysis.
We propose a novel adversarial steganographic scheme named Natias.
Our proposed method can be seamlessly integrated with existing adversarial steganography frameworks.
arXiv Detail & Related papers (2024-09-08T04:09:51Z)
- Provably Robust and Secure Steganography in Asymmetric Resource Scenario [30.12327233257552]
Current provably secure steganography approaches require a pair of encoder and decoder to hide and extract private messages.
This paper proposes a novel provably robust and secure steganography framework for the asymmetric resource setting.
arXiv Detail & Related papers (2024-07-18T13:32:00Z)
- SUDS: Sanitizing Universal and Dependent Steganography [4.067706508297839]
Steganography, or hiding messages in plain sight, is a form of information hiding that is most commonly used for covert communication.
Current protection mechanisms rely upon steganalysis, but these approaches are dependent upon prior knowledge.
This work focuses on a deep learning sanitization technique called SUDS that is able to sanitize universal and dependent steganography.
arXiv Detail & Related papers (2023-09-23T19:39:44Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Assessing glaucoma in retinal fundus photographs using Deep Feature Consistent Variational Autoencoders [63.391402501241195]
Glaucoma is challenging to detect since it remains asymptomatic until the symptoms are severe.
Early identification of glaucoma is generally made based on functional, structural, and clinical assessments.
Deep learning methods have partially solved this dilemma by bypassing the marker identification stage and analyzing high-level information directly to classify the data.
arXiv Detail & Related papers (2021-10-04T16:06:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.