When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks
- URL: http://arxiv.org/abs/2602.20193v1
- Date: Sat, 21 Feb 2026 23:48:04 GMT
- Title: When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks
- Authors: Shenyang Chen, Liuwan Zhu
- Abstract summary: We demonstrate that encoder-side poisoning induces persistent, trigger-free semantic corruption. Backdoors act as low-rank, target-centered deformations that amplify local sensitivity, causing distortion to propagate coherently across semantic neighborhoods. Our findings, validated across diffusion and contrastive paradigms, expose the deep structural risks of encoder poisoning and highlight the necessity of geometric audits beyond simple attack success rates.
- Score: 2.4923006485141284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Standard evaluations of backdoor attacks on text-to-image (T2I) models primarily measure trigger activation and visual fidelity. We challenge this paradigm, demonstrating that encoder-side poisoning induces persistent, trigger-free semantic corruption that fundamentally reshapes the representation manifold. We trace this vulnerability to a geometric mechanism: a Jacobian-based analysis reveals that backdoors act as low-rank, target-centered deformations that amplify local sensitivity, causing distortion to propagate coherently across semantic neighborhoods. To rigorously quantify this structural degradation, we introduce SEMAD (Semantic Alignment and Drift), a diagnostic framework that measures both internal embedding drift and downstream functional misalignment. Our findings, validated across diffusion and contrastive paradigms, expose the deep structural risks of encoder poisoning and highlight the necessity of geometric audits beyond simple attack success rates.
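The abstract names two measurable quantities: SEMAD-style embedding drift on trigger-free inputs, and a Jacobian whose spectrum reveals low-rank local deformation. The paper's code is not shown here; the following is a minimal, hypothetical sketch of both measurements, where the toy MLPs stand in for a clean and a backdoored text encoder and the function names (`semantic_drift`, `local_jacobian_spectrum`) are ours, not the authors'.

```python
import torch
import torch.nn.functional as F
from torch.autograd.functional import jacobian

# Toy stand-ins for a clean and a poisoned text encoder. In the paper's
# setting these would be the text encoders of a benign and a backdoored
# T2I pipeline.
torch.manual_seed(0)
clean_encoder = torch.nn.Sequential(
    torch.nn.Linear(64, 128), torch.nn.Tanh(), torch.nn.Linear(128, 32)
)
poisoned_encoder = torch.nn.Sequential(
    torch.nn.Linear(64, 128), torch.nn.Tanh(), torch.nn.Linear(128, 32)
)

def semantic_drift(x):
    """Embedding drift on trigger-free inputs: cosine distance between
    clean and poisoned embeddings (hypothetical SEMAD-style metric)."""
    with torch.no_grad():
        e_clean = clean_encoder(x)
        e_poison = poisoned_encoder(x)
    return 1.0 - F.cosine_similarity(e_clean, e_poison, dim=-1)

def local_jacobian_spectrum(encoder, x):
    """Singular values of the encoder Jacobian at x. A spectrum dominated
    by a few singular values would indicate a low-rank local deformation."""
    J = jacobian(lambda inp: encoder(inp), x)  # shape: (out_dim, in_dim)
    return torch.linalg.svdvals(J)

x = torch.randn(64)  # a trigger-free "prompt embedding"
print("drift:", semantic_drift(x.unsqueeze(0)).item())
print("top singular values:", local_jacobian_spectrum(poisoned_encoder, x)[:5])
```

Comparing drift histograms over a prompt set and the clean-vs-poisoned Jacobian spectra is the kind of geometric audit, beyond attack success rate, that the abstract argues for.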
Related papers
- TraceGuard: Process-Guided Firewall against Reasoning Backdoors in Large Language Models [19.148124494194317]
We propose TraceGuard, a process-guided security framework that transforms small-scale models into robust reasoning firewalls. Our approach treats the reasoning trace as an untrusted payload and establishes a defense-in-depth strategy. We demonstrate robustness against adaptive adversaries in a grey-box setting, establishing TraceGuard as a viable, low-latency security primitive.
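A minimal sketch of the gating pattern this summary describes, with `TraceClassifier` as a hypothetical stand-in for TraceGuard's small-scale screening model (not the paper's actual architecture):

```python
import torch
import torch.nn as nn

class TraceClassifier(nn.Module):
    """Hypothetical small model that scores a reasoning trace as benign
    (low score) or backdoor-carrying (high score)."""
    def __init__(self, vocab=50_000, dim=128):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab, dim)  # cheap bag-of-tokens encoder
        self.head = nn.Linear(dim, 1)

    def forward(self, token_ids):
        return torch.sigmoid(self.head(self.emb(token_ids)))

def firewall(trace_token_ids, answer, clf, threshold=0.5):
    """Defense-in-depth gate: the trace is an untrusted payload, so the
    answer is released only if the trace scores below the threshold."""
    risk = clf(trace_token_ids).item()
    return answer if risk < threshold else "[blocked: suspicious reasoning trace]"

clf = TraceClassifier()
tokens = torch.randint(0, 50_000, (1, 32))  # toy tokenized trace
print(firewall(tokens, "final answer", clf))
```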
arXiv Detail & Related papers (2026-03-02T22:19:13Z)
- Self-Aware Object Detection via Degradation Manifolds [3.8265249634979734]
In safety-critical settings, it is insufficient to produce predictions without assessing whether the input remains within the detector's nominal operating regime. We introduce a degradation-aware self-awareness framework based on degradation manifolds. Our method augments a standard detection backbone with a lightweight embedding head trained via contrastive learning.
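As a rough illustration of the summary above (the contrastive training loop is omitted; `DegradationHead` and the cluster-center scoring are our stand-ins, not the paper's exact design):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DegradationHead(nn.Module):
    """Hypothetical lightweight head mapping backbone features into an
    embedding space where nominal inputs cluster together."""
    def __init__(self, feat_dim=256, emb_dim=32):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim)
        )

    def forward(self, feats):
        return F.normalize(self.proj(feats), dim=-1)

def self_awareness_score(head, feats, nominal_center):
    """Cosine distance from the nominal cluster center; large values
    suggest the input is outside the nominal operating regime."""
    emb = head(feats)
    return 1.0 - emb @ nominal_center

head = DegradationHead()
nominal = F.normalize(torch.randn(32), dim=0)  # stand-in for a learned center
feats = torch.randn(4, 256)                    # backbone features for 4 inputs
print(self_awareness_score(head, feats, nominal))
```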
arXiv Detail & Related papers (2026-02-20T17:58:46Z)
- Simulated Adoption: Decoupling Magnitude and Direction in LLM In-Context Conflict Resolution [3.0242762196828448]
Large Language Models (LLMs) frequently prioritize conflicting in-context information over pre-existing parametric memory. We show that models do not "unlearn" or suppress the magnitude of internal truths but rather employ a mechanism of geometric displacement.
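The claimed mechanism suggests a simple diagnostic, sketched below under toy assumptions: compare the norm ratio (magnitude) and cosine similarity (direction) of hidden states extracted with and without conflicting context.

```python
import torch
import torch.nn.functional as F

def displacement_diagnostic(h_clean, h_conflict):
    """Decouple magnitude and direction of a hidden-state shift.
    'Geometric displacement' would show preserved norms but a rotated
    direction; 'suppression' would show a shrinking norm instead."""
    mag_ratio = h_conflict.norm(dim=-1) / h_clean.norm(dim=-1)
    direction = F.cosine_similarity(h_clean, h_conflict, dim=-1)
    return mag_ratio, direction

# Toy hidden states standing in for a truth-bearing representation
# read out at the same layer with and without conflicting context.
h_clean = torch.randn(8, 4096)
h_conflict = h_clean + 0.7 * torch.randn(8, 4096)
mag, cos = displacement_diagnostic(h_clean, h_conflict)
print("magnitude ratio:", mag.mean().item(), "cosine:", cos.mean().item())
```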
arXiv Detail & Related papers (2026-02-04T06:13:11Z)
- Noise & pattern: identity-anchored Tikhonov regularization for robust structural anomaly detection [58.535473924035365]
Anomaly detection plays a pivotal role in automated industrial inspection, aiming to identify subtle or rare defects in otherwise uniform visual patterns. We tackle structural anomaly detection using a self-supervised autoencoder that learns to repair corrupted inputs. We introduce a corruption model that injects artificial disruptions into training images to mimic structural defects.
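A minimal sketch of the corruption-and-repair training step the summary describes (the paper's identity-anchored Tikhonov regularizer is omitted; the patch-zeroing corruption is our toy stand-in for its corruption model):

```python
import torch
import torch.nn as nn

def inject_structural_corruption(imgs, patch=8):
    """Toy corruption model: zero out a random square patch to mimic a
    structural defect."""
    out = imgs.clone()
    b, _, h, w = imgs.shape
    for i in range(b):
        y = torch.randint(0, h - patch, (1,)).item()
        x = torch.randint(0, w - patch, (1,)).item()
        out[i, :, y:y + patch, x:x + patch] = 0.0
    return out

# Minimal repair autoencoder: trained to reconstruct the clean image
# from its corrupted version; anomalies later show up as repair residuals.
ae = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
clean = torch.rand(4, 1, 32, 32)
corrupted = inject_structural_corruption(clean)
loss = nn.functional.mse_loss(ae(corrupted), clean)  # repair objective
opt.zero_grad(); loss.backward(); opt.step()
print("repair loss:", loss.item())
```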
arXiv Detail & Related papers (2025-11-10T15:48:50Z)
- Geometry-Aware Backdoor Attacks: Leveraging Curvature in Hyperbolic Embeddings [3.8806403512213787]
Non-Euclidean foundation models place representations in curved spaces such as hyperbolic geometry. Small input changes appear subtle to standard input-space detectors but produce disproportionately large shifts in the model's representation space. We propose a geometry-adaptive trigger and evaluate it across tasks and architectures.
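The amplification effect is easy to illustrate with the standard Poincaré-ball distance; the example below (ours, not the paper's code) applies the same small Euclidean perturbation near the origin and near the boundary, where the metric blows up:

```python
import torch

def poincare_distance(u, v, eps=1e-6):
    """Geodesic distance in the Poincare ball model of hyperbolic space:
    d(u, v) = arccosh(1 + 2||u-v||^2 / ((1-||u||^2)(1-||v||^2)))."""
    sq = ((u - v) ** 2).sum(dim=-1)
    du = 1.0 - (u ** 2).sum(dim=-1)
    dv = 1.0 - (v ** 2).sum(dim=-1)
    x = 1.0 + 2.0 * sq / (du * dv + eps)
    return torch.acosh(torch.clamp(x, min=1.0 + eps))

# The same tiny Euclidean nudge yields a far larger hyperbolic shift near
# the boundary -- the amplification a geometry-adaptive trigger exploits.
center = torch.tensor([0.10, 0.00])
boundary = torch.tensor([0.99, 0.00])
delta = torch.tensor([0.001, 0.00])
print("near center:  ", poincare_distance(center, center + delta).item())
print("near boundary:", poincare_distance(boundary, boundary + delta).item())
```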
arXiv Detail & Related papers (2025-10-07T19:24:43Z)
- Generative Model Inversion Through the Lens of the Manifold Hypothesis [98.37040155914595]
Model inversion attacks (MIAs) aim to reconstruct class-representative samples from trained models. Recent generative MIAs utilize generative adversarial networks to learn image priors that guide the inversion process.
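A skeletal version of the generative-MIA recipe the summary refers to, with untrained toy modules standing in for the learned image prior and the target model:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: a generator learned as an image prior and a
# target classifier under attack (both untrained here).
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 3 * 32 * 32))
target = nn.Linear(3 * 32 * 32, 10)

def invert(target_class, steps=200, lr=0.05):
    """Generative MIA skeleton: search the generator's latent space for a
    sample the classifier confidently assigns to the target class. On the
    manifold-hypothesis view, G confines the search to plausible images."""
    z = torch.randn(1, 64, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        logits = target(G(z))
        loss = -logits[0, target_class]  # maximize the target-class logit
        opt.zero_grad(); loss.backward(); opt.step()
    return G(z).detach()

recon = invert(target_class=3)
print(recon.shape)  # class-representative sample on the prior's manifold
```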
arXiv Detail & Related papers (2025-09-24T14:39:25Z)
- BURN: Backdoor Unlearning via Adversarial Boundary Analysis [73.14147934175604]
Backdoor unlearning aims to remove backdoor-related information while preserving the model's original functionality. We propose Backdoor Unlearning via adversaRial bouNdary analysis (BURN), a novel defense framework that integrates false correlation decoupling, progressive data refinement, and model purification.
arXiv Detail & Related papers (2025-07-14T17:13:06Z)
- Trigger without Trace: Towards Stealthy Backdoor Attack on Text-to-Image Diffusion Models [70.03122709795122]
Backdoor attacks targeting text-to-image diffusion models have advanced rapidly. Current backdoor samples often exhibit two key abnormalities compared to benign samples. We propose Trigger without Trace (TwT) by explicitly mitigating these consistencies.
arXiv Detail & Related papers (2025-03-22T10:41:46Z)
- Spatial-Frequency Discriminability for Revealing Adversarial Perturbations [53.279716307171604]
The vulnerability of deep neural networks to adversarial perturbations is widely recognized in the computer vision community.
Current algorithms typically detect adversarial patterns through discriminative decomposition for natural and adversarial data.
We propose a discriminative detector relying on a spatial-frequency Krawtchouk decomposition.
arXiv Detail & Related papers (2023-05-18T10:18:59Z)
- Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
A standard approach to adversarial robustness defends against samples crafted by minimally perturbing a clean input.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
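As a rough illustration (a plain triplet loss stands in here for the paper's optimal-transport formulation): a sensitivity perturbation preserves semantics and should embed near the clean input, while an invariance perturbation changes semantics without changing the prediction and should embed far away.

```python
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 64))
triplet = nn.TripletMarginLoss(margin=1.0)

def metric_regularizer(x_clean, x_sens, x_inv):
    """Metric-learning view of adversarial regularization: pull the
    sensitivity-perturbed embedding toward the clean anchor and push the
    invariance-perturbed embedding away."""
    a, p, n = embed(x_clean), embed(x_sens), embed(x_inv)
    return triplet(a, p, n)

x = torch.rand(16, 784)
x_sens = x + 0.03 * torch.randn_like(x)  # toy sensitivity perturbation
x_inv = torch.rand(16, 784)              # toy stand-in for an invariance attack
print("regularizer:", metric_regularizer(x, x_sens, x_inv).item())
```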
arXiv Detail & Related papers (2022-11-04T13:54:02Z)
- Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
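A minimal sketch of the reconstruction-residual scoring such a method builds on (toy, untrained autoencoder; the self-supervised training regime itself is not shown):

```python
import torch
import torch.nn as nn

# Hypothetical autoencoder assumed trained on the normal-data submanifold.
ae = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),
)

def anomaly_score(x):
    """Per-image reconstruction error: inputs off the normal submanifold
    reconstruct poorly, so the residual serves as an anomaly score."""
    with torch.no_grad():
        residual = (ae(x) - x) ** 2
    return residual.flatten(1).mean(dim=1)

imgs = torch.rand(4, 3, 64, 64)  # stand-in for MVTec AD test images
print(anomaly_score(imgs))
```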
arXiv Detail & Related papers (2022-06-23T14:16:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.