How a Bit Becomes a Story: Semantic Steering via Differentiable Fault Injection
- URL: http://arxiv.org/abs/2512.14715v1
- Date: Tue, 09 Dec 2025 04:04:19 GMT
- Title: How a Bit Becomes a Story: Semantic Steering via Differentiable Fault Injection
- Authors: Zafaryab Haider, Md Hafizur Rahman, Shane Moeykens, Vijay Devabhaktuni, Prabuddha Chakraborty
- Abstract summary: This work investigates how low-level, bitwise perturbations (fault injection) can influence the semantic meaning of an image captioning model's generated descriptions. In image captioning models, a single flipped bit might subtly alter how visual features map to words, shifting the entire narrative an AI tells about the world. We design a differentiable fault analysis framework, BLADE, that uses gradient-based sensitivity estimation to locate semantically critical bits.
- Score: 1.690922615975256
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Hard-to-detect hardware bit flips, from either malicious circuitry or bugs, have already been shown to make transformers vulnerable in non-generative tasks. This work, for the first time, investigates how low-level, bitwise perturbations (fault injection) to the weights of a large language model (LLM) used for image captioning can influence the semantic meaning of its generated descriptions while preserving grammatical structure. While prior fault analysis methods have shown that flipping a few bits can crash classifiers or degrade accuracy, these approaches overlook the semantic and linguistic dimensions of generative systems. In image captioning models, a single flipped bit might subtly alter how visual features map to words, shifting the entire narrative an AI tells about the world. We hypothesize that such semantic drifts are not random but differentiably estimable. That is, the model's own gradients can predict which bits, if perturbed, will most strongly influence meaning while leaving syntax and fluency intact. We design a differentiable fault analysis framework, BLADE (Bit-level Fault Analysis via Differentiable Estimation), that uses gradient-based sensitivity estimation to locate semantically critical bits and then refines their selection through a caption-level semantic-fluency objective. Our goal is not merely to corrupt captions, but to understand how meaning itself is encoded, distributed, and alterable at the bit level, revealing that even imperceptible low-level changes can steer the high-level semantics of generative vision-language models. It also opens pathways for robustness testing, adversarial defense, and explainable AI, by exposing how structured bit-level faults can reshape a model's semantic output.
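The idea of gradient-based bit sensitivity in the abstract can be illustrated with a minimal sketch. This is not the BLADE implementation: it assumes IEEE 754 float32 weights and a gradient that has already been computed elsewhere, and it uses a plain first-order estimate |grad × Δw| in place of the paper's caption-level semantic-fluency objective. The names `flip_bit` and `rank_bits` are hypothetical.

```python
import math
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of a float32 (0 = mantissa LSB, 31 = sign bit)."""
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in [0, 31]")
    # Reinterpret the float32 as an unsigned 32-bit integer, XOR the bit,
    # then reinterpret the result as a float32 again.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def rank_bits(weight: float, grad: float) -> list[tuple[int, float]]:
    """Rank the bits of one weight by the first-order estimate |grad * delta_w|.

    Assumption: `grad` is d(loss)/d(weight), computed elsewhere. Flips that
    produce inf or NaN are skipped, since the linear estimate is meaningless
    for them.
    """
    scores = []
    for bit in range(32):
        flipped = flip_bit(weight, bit)
        if not math.isfinite(flipped):
            continue
        scores.append((bit, abs(grad * (flipped - weight))))
    # Largest estimated effect first.
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for bit, score in rank_bits(0.5, 2.0)[:3]:
        print(f"bit {bit:2d}: estimated |delta loss| = {score:.3e}")
```

In this toy setting the high exponent bit dominates the ranking, because flipping it changes the weight's magnitude by many orders of magnitude. A linear estimate is unreliable for perturbations that large, which is one reason a second refinement stage against a caption-level objective, as the abstract describes, is plausible.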
Related papers
- Beyond surface form: A pipeline for semantic analysis in Alzheimer's Disease detection from spontaneous speech [4.447462467582385]
Alzheimer's Disease (AD) is a progressive neurodegenerative condition that adversely affects cognitive abilities. Language models show promise as a basis for screening tools for AD, but their limited interpretability poses a challenge. We introduce a novel approach in which the surface forms of texts are transformed by altering syntax and vocabulary while preserving semantic content.
arXiv Detail & Related papers (2025-12-15T18:59:49Z)
- TRUST: Leveraging Text Robustness for Unsupervised Domain Adaptation [9.906359339999039]
We introduce a novel UDA approach that exploits the robustness of the language modality to guide the adaptation of a vision model. We propose a multimodal soft-contrastive learning loss that aligns the vision and language feature spaces. Our approach outperforms previous methods, setting the new state-of-the-art on classical (DomainNet) and complex (GeoNet) domain shifts.
arXiv Detail & Related papers (2025-08-08T16:51:44Z)
- Detect Changes like Humans: Incorporating Semantic Priors for Improved Change Detection [52.62459671461816]
This paper explores incorporating semantic priors from visual foundation models to improve the ability to detect changes. Inspired by the human visual paradigm, a novel dual-stream feature decoder is derived to distinguish changes by combining semantic-aware features and difference-aware features.
arXiv Detail & Related papers (2024-12-22T08:27:15Z)
- Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective [50.261681681643076]
We propose a novel metric called SemVarEffect and a benchmark named SemVarBench to evaluate the causality between semantic variations in inputs and outputs in text-to-image synthesis. Our work establishes an effective evaluation framework that advances the T2I synthesis community's exploration of human instruction understanding.
arXiv Detail & Related papers (2024-10-14T08:45:35Z)
- Semantic-Syntactic Discrepancy in Images (SSDI): Learning Meaning and Order of Features from Natural Images [7.148054923510877]
We propose the concept of "image grammar", comprising "image semantics" and "image syntax". We present a semi-supervised two-stage method for learning the image grammar of visual elements and environments solely from natural images. The efficacy of the proposed approach is then demonstrated by achieving detection rates ranging from 70% to 90% on corruptions generated from CelebA and SUN-RGBD datasets.
arXiv Detail & Related papers (2024-01-31T00:16:02Z)
- Interpretability at Scale: Identifying Causal Mechanisms in Alpaca [62.65877150123775]
We use Boundless DAS to efficiently search for interpretable causal structure in large language models while they follow instructions.
Our findings mark a first step toward faithfully understanding the inner workings of our ever-growing and most widely deployed language models.
arXiv Detail & Related papers (2023-05-15T17:15:40Z)
- ContraFeat: Contrasting Deep Features for Semantic Discovery [102.4163768995288]
StyleGAN has shown strong potential for disentangled semantic control.
Existing semantic discovery methods on StyleGAN rely on manual selection of modified latent layers to obtain satisfactory manipulation results.
We propose a model that automates this process and achieves state-of-the-art semantic discovery performance.
arXiv Detail & Related papers (2022-12-14T15:22:13Z)
- Neural String Edit Distance [77.72325513792981]
We propose the neural string edit distance model for string-pair classification and sequence generation.
We modify the original expectation-maximization learned edit distance algorithm into a differentiable loss function.
We show that we can trade off between performance and interpretability in a single framework.
arXiv Detail & Related papers (2021-04-16T22:16:47Z)
- Logic Constrained Pointer Networks for Interpretable Textual Similarity [11.142649867439406]
We introduce a novel pointer network based model with a sentinel gating function to align constituent chunks.
We improve this base model with a loss function to equally penalize misalignments in both sentences, ensuring the alignments are bidirectional.
The model achieves F1 scores of 97.73 and 96.32 on the benchmark SemEval datasets for the chunk alignment task.
arXiv Detail & Related papers (2020-07-15T13:01:44Z)
- Differentiable Language Model Adversarial Attacks on Categorical Sequence Classifiers [0.0]
An adversarial attack paradigm explores various scenarios for the vulnerability of deep learning models.
We fine-tune a language model to serve as a generator of adversarial examples.
Our model works for diverse datasets on bank transactions, electronic health records, and NLP datasets.
arXiv Detail & Related papers (2020-06-19T11:25:36Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.