Catching Contamination Before Generation: Spectral Kill Switches for Agents
- URL: http://arxiv.org/abs/2511.05804v1
- Date: Sat, 08 Nov 2025 02:24:05 GMT
- Title: Catching Contamination Before Generation: Spectral Kill Switches for Agents
- Authors: Valentin Noël
- Abstract summary: We introduce a diagnostic that uses only the forward pass to emit a binary accept or reject signal during agent execution. The method analyzes token graphs induced by attention and computes two spectral statistics in early layers. We show that a single threshold on the high-frequency energy ratio is optimal in the Bayes sense for detecting context inconsistency.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Agentic language models compose multi-step reasoning chains, yet intermediate steps can be corrupted by inconsistent context, retrieval errors, or adversarial inputs, which makes post hoc evaluation too late because errors propagate before detection. We introduce a diagnostic that requires no additional training and uses only the forward pass to emit a binary accept or reject signal during agent execution. The method analyzes token graphs induced by attention and computes two spectral statistics in early layers, namely the high-frequency energy ratio and spectral entropy. We formalize these signals, establish invariances, and provide finite-sample estimators with uncertainty quantification. Under a two-regime mixture assumption with a monotone likelihood ratio property, we show that a single threshold on the high-frequency energy ratio is optimal in the Bayes sense for detecting context inconsistency. Empirically, the high-frequency energy ratio exhibits robust bimodality during context verification across multiple model families, which enables gating decisions with overhead below one millisecond on our hardware and configurations. We demonstrate integration into retrieval-augmented agent pipelines and discuss deployment as an inline safety monitor. The approach detects contamination while the model is still processing the text, before errors commit to the reasoning chain.
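The pipeline the abstract outlines (an attention-induced token graph, a high-frequency energy ratio, a spectral entropy, and a single-threshold accept/reject gate) can be illustrated with a minimal, dependency-free sketch. Every concrete choice below is an assumption, not the paper's definition: the graph is the symmetrized attention matrix with self-loops dropped, the "signal" is one scalar per token (for example a single hidden-state coordinate), "high frequency" means the top `high_fraction` of Laplacian eigenvalues, and the helper names `jacobi_eigh`, `attention_laplacian`, `spectral_stats`, and `gate`, the threshold `tau`, and the direction of the test are all hypothetical. A cyclic Jacobi eigensolver keeps the example to the standard library; in practice `numpy.linalg.eigh` would be the usual choice.

```python
import math


def jacobi_eigh(m, tol=1e-24, max_sweeps=100):
    """Eigendecomposition of a real symmetric matrix via cyclic Jacobi rotations.

    Returns (eigenvalues ascending, matrix whose column k is the eigenvector for eigenvalue k).
    """
    n = len(m)
    a = [row[:] for row in m]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    frob = sum(x * x for row in a for x in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * frob:  # relative convergence test on off-diagonal mass
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda k: a[k][k])
    return [a[k][k] for k in order], [[v[i][k] for k in order] for i in range(n)]


def attention_laplacian(attn):
    """Assumed graph: L = D - W with W = (A + A^T) / 2 and self-loops dropped."""
    n = len(attn)
    w = [[0.0 if i == j else 0.5 * (attn[i][j] + attn[j][i]) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    return [[(deg[i] if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]


def spectral_stats(attn, signal, high_fraction=0.5):
    """Hypothetical (high-frequency energy ratio, spectral entropy) of a per-token signal."""
    _, u = jacobi_eigh(attention_laplacian(attn))
    n = len(signal)
    # Graph Fourier transform: project the signal onto each Laplacian eigenvector.
    coeffs = [sum(u[i][k] * signal[i] for i in range(n)) for k in range(n)]
    energy = [c * c for c in coeffs]
    total = sum(energy)
    if total == 0.0:
        return 0.0, 0.0
    cut = n - max(1, round(high_fraction * n))  # top `high_fraction` of the spectrum is "high"
    hfer = sum(energy[cut:]) / total
    entropy = -sum((e / total) * math.log(e / total) for e in energy if e > 0.0)
    return hfer, entropy


def gate(attn, signal, tau, high_fraction=0.5):
    """Hypothetical single-threshold rule: reject when the HFER exceeds tau."""
    hfer, _ = spectral_stats(attn, signal, high_fraction)
    return "reject" if hfer > tau else "accept"


if __name__ == "__main__":
    # Toy 4-token "attention" matrix: each token attends equally to its two ring neighbours.
    ring = [[0.0, 0.5, 0.0, 0.5],
            [0.5, 0.0, 0.5, 0.0],
            [0.0, 0.5, 0.0, 0.5],
            [0.5, 0.0, 0.5, 0.0]]
    print(gate(ring, [1.0, 1.0, 1.0, 1.0], tau=0.5))    # smooth signal -> accept
    print(gate(ring, [1.0, -1.0, 1.0, -1.0], tau=0.5))  # alternating signal -> reject
```

On this 4-token ring the constant signal puts all of its energy on the zero eigenvalue (HFER ≈ 0), while the alternating signal sits entirely on the largest eigenvalue (HFER ≈ 1), so the two cases fall on opposite sides of any `tau` strictly between 0 and 1. How the paper actually picks the signal, the frequency cutoff, and the threshold is not specified in this summary.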
Related papers
- Diffusion Probe: Generated Image Result Prediction Using CNN Probes [33.97515945308048]
Text-to-image (T2I) diffusion models lack an efficient mechanism for early quality assessment.
We introduce Diffusion Probe, a framework that leverages internal cross-attention maps as predictive signals.
Diffusion Probe is model-agnostic, efficient, and broadly applicable, offering a practical solution for improving T2I generation efficiency.
arXiv Detail & Related papers (2026-02-27T08:24:47Z)
- Plug-and-Play Diffusion Meets ADMM: Dual-Variable Coupling for Robust Medical Image Reconstruction [45.25461515976432]
Plug-and-Play diffusion prior (DP) frameworks have emerged as a powerful paradigm for imaging reconstruction.
We present a novel approach to resolving bias-hallucination trade-off, achieving state-of-the-art gradients with significantly accelerated convergence.
arXiv Detail & Related papers (2026-02-26T16:58:43Z)
- Detecting and Mitigating Memorization in Diffusion Models through Anisotropy of the Log-Probability [9.133729396364952]
Diffusion-based image generative models produce high-fidelity images through iterative denoising but remain vulnerable to memorization.
Recent memorization detection methods are primarily based on the norm of score difference as indicators of memorization.
We develop a memorization detection metric by integrating isotropic norm and anisotropic alignment.
arXiv Detail & Related papers (2026-01-28T14:29:42Z)
- Robust semi-parametric signal detection in particle physics with classifiers decorrelated via optimal transport [0.1565870461096057]
We use a signal-enrichment step to carry out a signal detection test on a signal-rich sample.
We show that the decorrelation procedure is robust to moderate background misspecification.
We conclude that decorrelation and signal enrichment help produce a stable, robust, valid, and more powerful test.
arXiv Detail & Related papers (2024-09-10T10:32:21Z)
- Generative adversarial wavelet neural operator: Application to fault detection and isolation of multivariate time series data [3.265784083548797]
This article proposes a generative adversarial wavelet neural operator (GAWNO) as a novel unsupervised deep learning approach for fault detection and isolation.
In the first stage, the GAWNO is trained on a dataset of normal operating conditions to learn the underlying data distribution.
In the second stage, a reconstruction error-based threshold approach is employed to detect and isolate faults based on the discrepancy values.
arXiv Detail & Related papers (2024-01-08T16:36:47Z)
- Restoration-Degradation Beyond Linear Diffusions: A Non-Asymptotic Analysis For DDIM-Type Samplers [90.45898746733397]
We develop a framework for non-asymptotic analysis of deterministic samplers used for diffusion generative modeling.
We show that one step along the probability flow ODE can be expressed as two steps: 1) a restoration step that runs ascent on the conditional log-likelihood at some infinitesimally previous time, and 2) a degradation step that runs the forward process using noise pointing back towards the current gradient.
arXiv Detail & Related papers (2023-03-06T18:59:19Z)
- DensePure: Understanding Diffusion Models towards Adversarial Robustness [110.84015494617528]
We analyze the properties of diffusion models and establish the conditions under which they can enhance certified robustness.
We propose a new method, DensePure, designed to improve the certified robustness of a pretrained model (i.e., a classifier).
We show that this robust region is a union of multiple convex sets, and is potentially much larger than the robust regions identified in previous works.
arXiv Detail & Related papers (2022-11-01T08:18:07Z)
- ReDFeat: Recoupling Detection and Description for Multimodal Feature Learning [51.07496081296863]
We recouple independent constraints of detection and description of multimodal feature learning with a mutual weighting strategy.
We propose a detector that possesses a large receptive field and is equipped with learnable non-maximum suppression layers.
We build a benchmark that contains cross visible, infrared, near-infrared and synthetic aperture radar image pairs for evaluating the performance of features in feature matching and image registration tasks.
arXiv Detail & Related papers (2022-05-16T04:24:22Z)
- E-detectors: a nonparametric framework for sequential change detection [86.15115654324488]
We develop a fundamentally new and general framework for sequential change detection.
Our procedures come with clean, nonasymptotic bounds on the average run length.
We show how to design their mixtures in order to achieve both statistical and computational efficiency.
arXiv Detail & Related papers (2022-03-07T17:25:02Z)
- Mitigating the Mutual Error Amplification for Semi-Supervised Object Detection [92.52505195585925]
We propose a Cross Teaching (CT) method, aiming to mitigate the mutual error amplification by introducing a rectification mechanism of pseudo labels.
In contrast to existing mutual teaching methods that directly treat predictions from other detectors as pseudo labels, we propose the Label Rectification Module (LRM).
arXiv Detail & Related papers (2022-01-26T03:34:57Z)
- Generalizing Face Forgery Detection with High-frequency Features [63.33397573649408]
Current CNN-based detectors tend to overfit to method-specific color textures and thus fail to generalize.
We propose to utilize the high-frequency noises for face forgery detection.
The first is the multi-scale high-frequency feature extraction module that extracts high-frequency noises at multiple scales.
The second is the residual-guided spatial attention module that guides the low-level RGB feature extractor to concentrate more on forgery traces from a new perspective.
arXiv Detail & Related papers (2021-03-23T08:19:21Z)