Geometry-Aware Backdoor Attacks: Leveraging Curvature in Hyperbolic Embeddings
- URL: http://arxiv.org/abs/2510.06397v1
- Date: Tue, 07 Oct 2025 19:24:43 GMT
- Title: Geometry-Aware Backdoor Attacks: Leveraging Curvature in Hyperbolic Embeddings
- Authors: Ali Baheri
- Abstract summary: Non-Euclidean foundation models place representations in curved spaces such as hyperbolic geometry. Small input changes appear subtle to standard input-space detectors but produce disproportionately large shifts in the model's representation space. We propose a geometry-adaptive trigger and evaluate it across tasks and architectures.
- Score: 3.8806403512213787
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Non-Euclidean foundation models increasingly place representations in curved spaces such as hyperbolic geometry. We show that this geometry creates a boundary-driven asymmetry that backdoor triggers can exploit. Near the boundary, small input changes appear subtle to standard input-space detectors but produce disproportionately large shifts in the model's representation space. Our analysis formalizes this effect and also reveals a limitation for defenses: methods that act by pulling points inward along the radius can suppress such triggers, but only by sacrificing useful model sensitivity in that same direction. Building on these insights, we propose a simple geometry-adaptive trigger and evaluate it across tasks and architectures. Empirically, attack success increases toward the boundary, whereas conventional detectors weaken, mirroring the theoretical trends. Together, these results surface a geometry-specific vulnerability in non-Euclidean models and offer analysis-backed guidance for designing and understanding the limits of defenses.
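As a concrete illustration of the boundary asymmetry the abstract describes, here is a minimal numpy sketch (not taken from the paper): it applies the same small Euclidean perturbation to Poincaré-ball embeddings at increasing radii and reports the resulting geodesic shift.

```python
import numpy as np

def poincare_distance(u, v):
    """Geodesic distance between two points in the Poincare ball."""
    num = 2.0 * np.dot(u - v, u - v)
    den = (1.0 - np.dot(u, u)) * (1.0 - np.dot(v, v))
    return np.arccosh(1.0 + num / den)

eps = 1e-3                        # identical Euclidean-size perturbation
direction = np.array([1.0, 0.0])
for r in [0.1, 0.5, 0.9, 0.99]:
    p = r * direction             # embedding at radius r
    q = p + eps * direction       # same tiny input-space change
    print(f"radius {r:4.2f}: representation shift = {poincare_distance(p, q):.4f}")
```

The perturbation always has Euclidean size 1e-3, so an input-space detector sees nothing unusual, yet the hyperbolic shift at r = 0.99 is roughly fifty times larger than at r = 0.1, matching the trend the abstract reports.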
Related papers
- When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks [2.4923006485141284]
We demonstrate that encoder-side poisoning induces persistent, trigger-free semantic corruption. Backdoors act as low-rank, target-centered deformations that amplify local sensitivity, causing distortion to propagate coherently across semantic neighborhoods. Our findings, validated across diffusion and contrastive paradigms, expose the deep structural risks of encoder poisoning and highlight the necessity of geometric audits beyond simple attack success rates.
arXiv Detail & Related papers (2026-02-21T23:48:04Z) - Geometric Scaling of Bayesian Inference in LLMs [0.4779196219827507]
Recent work has shown that small transformers trained in controlled "wind-tunnel" settings can implement exact Bayesian inference. We investigate whether this geometric signature persists in production-grade language models.
arXiv Detail & Related papers (2025-12-27T05:29:55Z) - Angular Gradient Sign Method: Uncovering Vulnerabilities in Hyperbolic Networks [11.409989603679612]
Adversarial examples in neural networks have been extensively studied in Euclidean geometry. Recent advances in hyperbolic networks call for a reevaluation of attack strategies in non-Euclidean geometries. We propose a novel adversarial attack that explicitly leverages the geometric properties of hyperbolic space.
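The paper's method is not reproduced here, but as a hedged sketch of why an angular (rather than radial) step is natural in hyperbolic space, the hypothetical helper below removes the radial component of a gradient at a Poincaré embedding and takes a sign step along the remaining tangential direction (all names are my own, not the paper's AGSM).

```python
import numpy as np

def angular_sign_step(z, grad, eps=0.01):
    """Hypothetical FGSM-style step confined to the angular direction at a
    Poincare-ball embedding z (illustrative only, not the paper's AGSM)."""
    r_hat = z / np.linalg.norm(z)                # radial unit vector at z
    g_tan = grad - np.dot(grad, r_hat) * r_hat   # tangential gradient part
    step = np.sign(g_tan)                        # sign step, FGSM-style
    step -= np.dot(step, r_hat) * r_hat          # re-project so it stays angular
    return z + eps * step

z = np.array([0.6, 0.3])                         # embedding inside the unit ball
grad = np.array([0.8, -0.2])                     # stand-in loss gradient
print(angular_sign_step(z, grad))
```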
arXiv Detail & Related papers (2025-11-17T05:16:07Z) - Exploiting Edge Features for Transferable Adversarial Attacks in Distributed Machine Learning [54.26807397329468]
This work explores a previously overlooked vulnerability in distributed deep learning systems. An adversary who intercepts the intermediate features transmitted between distributed components can still pose a serious threat. We propose an exploitation strategy specifically designed for distributed settings.
arXiv Detail & Related papers (2025-07-09T20:09:00Z) - Geometry-Guided Adversarial Prompt Detection via Curvature and Local Intrinsic Dimension [10.892846618107392]
CurvaLID is a novel defence framework that efficiently detects adversarial prompts by leveraging their geometric properties. CurvaLID builds on the geometric analysis of text prompts to uncover their underlying differences. Our findings show that adversarial prompts exhibit distinct geometric signatures from benign prompts, enabling CurvaLID to achieve near-perfect classification.
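CurvaLID's full pipeline operates on text-prompt representations and is not reproduced here; the sketch below shows only the standard Levina-Bickel maximum-likelihood LID estimator that geometry-based detectors of this kind commonly build on (function names and the toy data are my own).

```python
import numpy as np

def lid_mle(query, data, k=20):
    """Levina-Bickel maximum-likelihood estimate of local intrinsic
    dimension at `query`, from its k nearest neighbours in `data`."""
    d = np.linalg.norm(data - query, axis=1)
    r = np.sort(d)[1:k + 1]          # skip the zero distance to itself
    return (k - 1) / np.sum(np.log(r[-1] / r[:-1]))

rng = np.random.default_rng(0)
# A 2-D manifold embedded in 10-D ambient space: the estimate should be near 2.
coords = rng.normal(size=(2000, 2))
data = np.concatenate([coords, np.zeros((2000, 8))], axis=1)
print(lid_mle(data[0], data))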
arXiv Detail & Related papers (2025-03-05T13:47:53Z) - Decoder ensembling for learned latent geometries [15.484595752241122]
We show how to easily compute geodesics on the expected manifold associated with a decoder ensemble.
We find this approach simple and reliable, bringing easy-to-use latent geometries one step closer.
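As a hedged toy sketch of the idea (a random tanh decoder ensemble stands in for trained decoders; all names are my own), one can discretize a latent curve and gradient-descend its expected decoded energy; the minimizer approximates a geodesic on the expected manifold.

```python
import numpy as np

rng = np.random.default_rng(0)
K, D, d, T = 8, 20, 2, 16                 # ensemble size, ambient dim, latent dim, curve points
A = rng.normal(size=(K, D, d)) / np.sqrt(d)

def f(k, z):   return np.tanh(A[k] @ z)                      # k-th toy decoder
def jac(k, z): return (1 - np.tanh(A[k] @ z) ** 2)[:, None] * A[k]

def energy_grad(curve):
    """Gradient of the ensemble-averaged curve energy
    sum_i E_k ||f_k(z_{i+1}) - f_k(z_i)||^2 w.r.t. interior points."""
    g = np.zeros_like(curve)
    for k in range(K):
        for i in range(1, T - 1):
            g[i] += 2 * jac(k, curve[i]).T @ (
                2 * f(k, curve[i]) - f(k, curve[i - 1]) - f(k, curve[i + 1]))
    return g / K

z0, z1 = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
curve = np.linspace(z0, z1, T)            # straight-line initialisation
for _ in range(200):                      # descend on interior points, endpoints fixed
    curve[1:-1] -= 0.05 * energy_grad(curve)[1:-1]
print(curve)
```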
arXiv Detail & Related papers (2024-08-14T12:35:41Z) - BadGD: A unified data-centric framework to identify gradient descent vulnerabilities [10.996626204702189]
BadGD sets a new standard for understanding and mitigating adversarial manipulations.
This research underscores the severe threats posed by such data-centric attacks and highlights the urgent need for robust defenses in machine learning.
arXiv Detail & Related papers (2024-05-24T23:39:45Z) - Manifold Integrated Gradients: Riemannian Geometry for Feature Attribution [8.107199775668942]
Integrated Gradients (IG) is a prevalent feature attribution method for black-box deep learning models.
We address two predominant challenges associated with IG: the generation of noisy feature visualizations and the vulnerability to adversarial attributional attacks.
Our approach involves an adaptation of path-based feature attribution, aligning the path of attribution more closely to the intrinsic geometry of the data manifold.
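For orientation, here is the standard straight-line Integrated Gradients computation that the paper adapts; Manifold IG replaces the straight path below with one aligned to the data manifold (the toy model and names are my own).

```python
import numpy as np

def integrated_gradients(f_grad, x, baseline, steps=64):
    """Straight-line Integrated Gradients: average the gradient along the
    path from `baseline` to `x`, then scale by (x - baseline)."""
    alphas = np.linspace(0.0, 1.0, steps)
    path = baseline + alphas[:, None] * (x - baseline)
    avg_grad = np.mean([f_grad(p) for p in path], axis=0)
    return (x - baseline) * avg_grad

# Toy model f(x) = sum(x^2), so grad f(x) = 2x.
f_grad = lambda x: 2.0 * x
x, baseline = np.array([1.0, -2.0]), np.zeros(2)
# Attributions sum to f(x) - f(baseline) = 5 (the completeness axiom).
print(integrated_gradients(f_grad, x, baseline))   # -> [1. 4.]
```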
arXiv Detail & Related papers (2024-05-16T04:13:17Z) - Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds [62.94859179323329]
Adversarial attack methods based on point manipulation for 3D point cloud classification have revealed the fragility of 3D models.
We propose a novel shape-based adversarial attack method, HiT-ADV, which conducts a two-stage search for attack regions based on saliency and imperceptibility perturbation scores.
We propose that by employing benign resampling and benign rigid transformations, we can further enhance physical adversarial strength with little sacrifice to imperceptibility.
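HiT-ADV's two-stage saliency search is not reproduced here; as a small illustration of the "benign rigid transformation" ingredient alone, the hypothetical helper below applies a random rotation and translation to a point cloud, which preserves the cloud's shape (and hence any shape-based perturbation) exactly.

```python
import numpy as np

def random_rigid_transform(points, rng):
    """Apply a random 3-D rotation (via a unit quaternion) and a small
    translation to an (n, 3) point cloud; shape is preserved exactly."""
    q = rng.normal(size=4); q /= np.linalg.norm(q)   # random unit quaternion
    w, x, y, z = q
    R = np.array([[1-2*(y*y+z*z), 2*(x*y-w*z),   2*(x*z+w*y)],
                  [2*(x*y+w*z),   1-2*(x*x+z*z), 2*(y*z-w*x)],
                  [2*(x*z-w*y),   2*(y*z+w*x),   1-2*(x*x+y*y)]])
    t = rng.normal(scale=0.1, size=3)
    return points @ R.T + t

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))
print(random_rigid_transform(cloud, rng).shape)      # (1024, 3)
```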
arXiv Detail & Related papers (2024-03-08T12:08:06Z) - Curve Your Attention: Mixed-Curvature Transformers for Graph Representation Learning [77.1421343649344]
We propose a generalization of Transformers towards operating entirely on the product of constant curvature spaces.
We also provide a kernelized approach to non-Euclidean attention, which enables our model to run with time and memory cost linear in the number of nodes and edges.
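The paper's non-Euclidean kernel is not reproduced here, but the generic kernelized linear-attention trick the summary refers to looks like the sketch below: with a positive feature map phi, attention factorizes so that the cost is linear in sequence length (phi and all names are placeholder assumptions).

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Kernelized attention: softmax(Q K^T) V costs O(n^2 d); with a feature
    map phi, phi(Q) @ (phi(K)^T V) costs O(n d^2), linear in length n."""
    Qp, Kp = phi(Q), phi(K)                    # (n, d) each
    kv = Kp.T @ V                              # (d, d_v), computed once
    norm = Qp @ Kp.sum(axis=0)                 # (n,) normalisation term
    return (Qp @ kv) / norm[:, None]

rng = np.random.default_rng(0)
n, d = 512, 32
Q, K, V = rng.normal(size=(3, n, d))
print(linear_attention(Q, K, V).shape)         # (512, 32)
```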
arXiv Detail & Related papers (2023-09-08T02:44:37Z) - A Unifying and Canonical Description of Measure-Preserving Diffusions [60.59592461429012]
A complete recipe of measure-preserving diffusions in Euclidean space was recently derived unifying several MCMC algorithms into a single framework.
We develop a geometric theory that improves and generalises this construction to any manifold.
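For reference, the Euclidean "complete recipe" being generalized (presumably that of Ma et al., 2015) can be stated compactly: any diffusion of the form below, with D(z) positive semidefinite and Q(z) skew-symmetric, preserves the target density proportional to e^{-H(z)}.

```latex
dz = -\bigl[D(z) + Q(z)\bigr]\,\nabla H(z)\,dt \;+\; \Gamma(z)\,dt \;+\; \sqrt{2D(z)}\,dW_t,
\qquad
\Gamma_i(z) = \sum_j \partial_{z_j}\bigl(D_{ij}(z) + Q_{ij}(z)\bigr)
```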
arXiv Detail & Related papers (2021-05-06T17:36:55Z) - GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning [54.291331971813364]
Offline reinforcement learning approaches can be divided into proximal and uncertainty-aware methods.
In this work, we demonstrate the benefit of combining the two in a latent variational model.
Our proposed metrics measure both the quality of out-of-distribution samples and the discrepancy of examples in the data.
arXiv Detail & Related papers (2021-02-22T19:42:40Z) - Orthogonal Deep Models As Defense Against Black-Box Attacks [71.23669614195195]
We study the inherent weakness of deep models in black-box settings where the attacker may develop the attack using a model similar to the targeted model.
We introduce a novel gradient regularization scheme that encourages the internal representation of a deep model to be orthogonal to another.
We verify the effectiveness of our technique on a variety of large-scale models.
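The exact regularizer is not given in the summary; a minimal differentiable sketch of the idea, assuming penultimate-layer features from two models and a squared-cosine-similarity penalty (my own stand-in, not the paper's scheme), could look like:

```python
import torch

def orthogonality_penalty(feat_a, feat_b):
    """Encourage two models' internal representations to be mutually
    orthogonal: squared cosine similarity between matched feature
    vectors, averaged over the batch. Hypothetical stand-in."""
    a = torch.nn.functional.normalize(feat_a, dim=1)
    b = torch.nn.functional.normalize(feat_b, dim=1)
    return ((a * b).sum(dim=1) ** 2).mean()

feat_a = torch.randn(8, 128, requires_grad=True)   # stand-ins for two models'
feat_b = torch.randn(8, 128)                       # penultimate-layer features
loss = orthogonality_penalty(feat_a, feat_b)
loss.backward()                                    # differentiable, so it can be
print(loss.item())                                 # added to the training loss
```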
arXiv Detail & Related papers (2020-06-26T08:29:05Z) - Smoothed Geometry for Robust Attribution [36.616902063693104]
Feature attributions are a popular tool for explaining the behavior of deep neural networks (DNNs).
They have been shown to be vulnerable to attacks that produce divergent explanations for nearby inputs.
This lack of robustness is especially problematic in high-stakes applications where adversarially-manipulated explanations could impair safety and trustworthiness.
arXiv Detail & Related papers (2020-06-11T17:35:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.