Related papers: GoldenTransformer: A Modular Fault Injection Framework for Transformer Robustness Research

URL: http://arxiv.org/abs/2509.10790v1
Date: Sat, 13 Sep 2025 02:52:08 GMT
Title: GoldenTransformer: A Modular Fault Injection Framework for Transformer Robustness Research
Authors: Luke Howard
Abstract summary: We present GoldenTransformer, a fault injection framework to evaluate the resiliency of Large Language Models to induced hardware faults. GoldenTransformer offers a unified Python-based platform for injecting diverse classes of faults into transformer-based models. We detail the technical design and use of GoldenTransformer and demonstrate it through several example experiments on classification and generation tasks.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformers have become the foundation for a wide range of state-of-the-art models across natural language processing, computer vision, and other machine learning domains. Despite their widespread deployment, the robustness of these models under fault conditions remains underexplored. We present GoldenTransformer, a modular and extensible fault injection framework designed to evaluate the resiliency of Large Language Models to induced hardware faults. GoldenTransformer offers a unified Python-based platform for injecting diverse classes of faults (such as weight corruption, activation injections, and attention-level disruptions) into pretrained transformer-based models. Inspired by the GoldenEye simulator for DNNs, our framework focuses on the unique challenges of working with large transformer architectures, including considerations such as structural complexity, latent dependencies, and nonuniform layer definitions. GoldenTransformer is built atop PyTorch and HuggingFace Transformers, and it supports experiment reproducibility, metric logging, and visualization out of the box. We detail the technical design and use of GoldenTransformer and demonstrate it through several example experiments on classification and generation tasks. By enabling controlled injection of faults at multiple logical and structural points in a transformer, GoldenTransformer offers researchers and practitioners a valuable tool for model robustness analysis and for guiding dependable system design in real-world LLM applications.
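As a rough illustration of the "weight corruption" fault class named in the abstract, the sketch below flips a single bit in the float32 encoding of one weight, a common way to model a hardware memory fault. This is not GoldenTransformer's API: the function names `flip_bit` and `corrupt_weights` are hypothetical, and the sketch uses only the Python standard library rather than PyTorch tensors.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE 754 float32 encoding flipped.

    Hypothetical helper for illustration; bit 0 is the least significant
    mantissa bit and bit 31 is the sign bit.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    # Flip the chosen bit, then reinterpret the result as a float32 again.
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def corrupt_weights(weights, index, bit):
    """Return a copy of `weights` with a single-bit fault at `index`.

    The input list is left untouched so the fault-free ("golden") values
    stay available for comparison.
    """
    corrupted = list(weights)
    corrupted[index] = flip_bit(corrupted[index], bit)
    return corrupted


if __name__ == "__main__":
    golden = [0.5, 1.0, -2.0]
    # A sign-bit flip negates the weight.
    print(corrupt_weights(golden, index=1, bit=31))
    # Flipping the top exponent bit of 1.0 produces infinity.
    print(flip_bit(1.0, 30))
```

The position of the flipped bit matters a great deal: a low mantissa bit barely changes the weight, while a high exponent bit can turn an ordinary value into infinity.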

Related papers

LISTA-Transformer Model Based on Sparse Coding and Attention Mechanism and Its Application in Fault Diagnosis [8.734812529767128]
We propose a sparse Transformer based on LISTA sparse encoding with visual Transformer to construct a model architecture with adaptive local and global feature collaboration mechanism. On the CWRU dataset, the fault recognition rate of our method reached 98.5%, which is 3.3% higher than traditional methods and exhibits certain superiority over existing Transformer-based approaches.
arXiv Detail & Related papers (2026-03-04T15:00:07Z)
Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection [48.263655122968906]
Face forgery detection (FFD) is devoted to detecting the authenticity of face images. We propose a distilled transformer network (DTN) to capture both rich local and global forgery traces.
arXiv Detail & Related papers (2024-12-28T14:00:27Z)
A Review of Intelligent Device Fault Diagnosis Technologies Based on Machine Vision [0.0]
The paper details the structure, working principles, and benefits of Transformers, particularly their self-attention mechanism and parallel computation capabilities. It highlights key Transformer model variants, such as Vision Transformers (ViT) and their extensions, which leverage self-attention to improve accuracy and efficiency in visual tasks. Despite these advancements, challenges remain, including the reliance on extensive labeled datasets, significant computational demands, and difficulties in deploying models on resource-limited devices.
arXiv Detail & Related papers (2024-12-11T07:06:53Z)
Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models [6.809572275782338]
We develop a unified signal propagation theory and provide formulae that govern the moments of the forward and backward signal through the transformer model. Our framework can be used to understand and mitigate vanishing/exploding gradients, rank collapse, and instability associated with high attention scores.
arXiv Detail & Related papers (2024-03-14T17:59:14Z)
SimPLR: A Simple and Plain Transformer for Efficient Object Detection and Segmentation [49.65221743520028]
We show that shifting the multiscale inductive bias into the attention mechanism can work well, resulting in a plain detector, SimPLR. We find through our experiments that SimPLR with scale-aware attention is a plain and simple architecture, yet competitive with multi-scale vision transformer alternatives.
arXiv Detail & Related papers (2023-10-09T17:59:26Z)
Foundation Transformers [105.06915886136524]
We call for the development of Foundation Transformer for true general-purpose modeling. In this work, we introduce a Transformer variant, named Magneto, to fulfill the goal.
arXiv Detail & Related papers (2022-10-12T17:16:27Z)
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias [76.16156833138038]
We propose a novel Vision Transformer Advanced by Exploring intrinsic IB from convolutions, i.e., ViTAE. ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context. In each transformer layer, ViTAE has a convolution block in parallel to the multi-head self-attention module, whose features are fused and fed into the feed-forward network.
arXiv Detail & Related papers (2021-06-07T05:31:06Z)
Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from the Vision-friendly Transformer. With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z)
Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers. This is the first paper which applies transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
Transformer-based Conditional Variational Autoencoder for Controllable Story Generation [39.577220559911055]
We investigate large-scale latent variable models (LVMs) for neural story generation with objectives in two threads: generation effectiveness and controllability. We advocate reviving latent variable modeling, essentially the power of representation learning, in the era of Transformers. Specifically, we integrate latent representation vectors with a Transformer-based pre-trained architecture to build a conditional variational autoencoder (CVAE).
arXiv Detail & Related papers (2021-01-04T08:31:11Z)
Toward Transformer-Based Object Detection [12.704056181392415]
Vision Transformers can be used as a backbone by a common detection task head to produce competitive COCO results. ViT-FRCNN demonstrates several known properties associated with transformers, including large pretraining capacity and fast fine-tuning performance. We view ViT-FRCNN as an important stepping stone toward a pure-transformer solution of complex vision tasks such as object detection.
arXiv Detail & Related papers (2020-12-17T22:33:14Z)
Efficient Transformers: A Survey [98.23264445730645]
Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. This paper characterizes a large and thoughtful selection of recent efficiency-flavored "X-former" models.
arXiv Detail & Related papers (2020-09-14T20:38:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.