FaRAccel: FPGA-Accelerated Defense Architecture for Efficient Bit-Flip Attack Resilience in Transformer Models
- URL: http://arxiv.org/abs/2510.24985v1
- Date: Tue, 28 Oct 2025 21:27:09 GMT
- Title: FaRAccel: FPGA-Accelerated Defense Architecture for Efficient Bit-Flip Attack Resilience in Transformer Models
- Authors: Najmeh Nazari, Banafsheh Saber Latibari, Elahe Hosseini, Fatemeh Movafagh, Chongzhou Fang, Hosein Mohammadi Makrani, Kevin Immanuel Gubbi, Abhijit Mahalanobis, Setareh Rafatirad, Hossein Sayadi, Houman Homayoun
- Abstract summary: Forget and Rewire (FaR) methodology has demonstrated strong resilience against Bit-Flip Attacks (BFAs) on Transformer-based models. We propose FaRAccel, a novel hardware accelerator architecture implemented on FPGA, specifically designed to offload and optimize FaR operations. FaRAccel integrates reconfigurable logic for dynamic activation rerouting and lightweight storage of rewiring configurations, enabling low-latency inference with minimal energy overhead.
- Score: 7.085700272776079
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Forget and Rewire (FaR) methodology has demonstrated strong resilience against Bit-Flip Attacks (BFAs) on Transformer-based models by obfuscating critical parameters through dynamic rewiring of linear layers. However, the application of FaR introduces non-negligible performance and memory overheads, primarily due to the runtime modification of activation pathways and the lack of hardware-level optimization. To overcome these limitations, we propose FaRAccel, a novel hardware accelerator architecture implemented on FPGA, specifically designed to offload and optimize FaR operations. FaRAccel integrates reconfigurable logic for dynamic activation rerouting and lightweight storage of rewiring configurations, enabling low-latency inference with minimal energy overhead. We evaluate FaRAccel across a suite of Transformer models and demonstrate substantial reductions in FaR inference latency and improvements in energy efficiency, while maintaining the robustness gains of the original FaR methodology. To the best of our knowledge, this is the first hardware-accelerated defense against BFAs in Transformers, effectively bridging the gap between algorithmic resilience and efficient deployment on real-world AI platforms.
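The abstract describes FaR as rewiring the activation pathways of linear layers so that the pairing between stored weights and incoming activations is obfuscated, with the rewiring configurations held in lightweight storage. As a rough illustration of that idea only, here is a permutation-based sketch in Python; the paper's actual rewiring rules and the FaRAccel datapath are not specified here, and `rewired_linear`, `rewire_indices`, and the permutation scheme are our own assumptions.

```python
import numpy as np

def rewired_linear(x, W_stored, b, rewire_indices):
    """Functionally equivalent linear layer with an obfuscated weight layout.

    x              : (in_features,) input activations
    W_stored       : (out_features, in_features) weights stored with
                     permuted columns, W_stored = W_orig[:, perm]
    b              : (out_features,) bias
    rewire_indices : (in_features,) permutation 'perm' that reroutes each
                     activation to the weight column expecting it
    """
    # Rerouting at runtime undoes the permuted storage, so the product
    # equals W_orig @ x, but the stored pairing of weights and
    # activations no longer reveals which parameters are critical.
    return W_stored @ x[rewire_indices] + b

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
W_orig = rng.standard_normal((8, 16))
b = np.zeros(8)

perm = rng.permutation(16)     # the stored rewiring configuration
W_stored = W_orig[:, perm]     # "forget" the original column order

assert np.allclose(rewired_linear(x, W_stored, b, perm), W_orig @ x + b)
```

In this sketch, the gather `x[rewire_indices]` is exactly the runtime cost FaR adds on general-purpose hardware; performing that rerouting in reconfigurable logic, next to the stored configurations, is what the abstract credits for FaRAccel's latency and energy gains.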
Related papers
- VAE-REPA: Variational Autoencoder Representation Alignment for Efficient Diffusion Training [53.09658039757408]
This paper proposes VAE-REPA, a lightweight intrinsic guidance framework for efficient diffusion training. VAE-REPA aligns the intermediate latent features of diffusion transformers with VAE features via a lightweight projection layer, supervised by a feature alignment loss. Experiments demonstrate that VAE-REPA improves both generation quality and training convergence speed compared to vanilla diffusion transformers.
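The summary above describes a lightweight projection layer mapping intermediate diffusion-transformer features onto VAE features under a feature alignment loss. A minimal sketch of that supervision signal, assuming a cosine-similarity loss (the paper's exact loss and layer shapes are not given here; `alignment_loss` and `P` are illustrative names):

```python
import numpy as np

def alignment_loss(h, z, P):
    """Project intermediate transformer features h through a lightweight
    linear layer P and penalize misalignment with the target VAE
    features z (one minus cosine similarity, averaged over tokens).

    h : (n_tokens, d_model) intermediate diffusion-transformer features
    z : (n_tokens, d_vae)   VAE features for the same inputs
    P : (d_model, d_vae)    learnable projection layer
    """
    h_proj = h @ P
    cos = np.sum(h_proj * z, axis=1) / (
        np.linalg.norm(h_proj, axis=1) * np.linalg.norm(z, axis=1) + 1e-8
    )
    return float(np.mean(1.0 - cos))  # 0 when features align perfectly
```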
arXiv Detail & Related papers (2026-01-25T13:22:38Z) - NuRedact: Non-Uniform eFPGA Architecture for Low-Overhead and Secure IP Redaction [0.0]
This paper introduces NuRedact, the first full-custom eFPGA redaction framework that embraces architectural non-uniformity to balance security and efficiency. From a security perspective, NuRedact fabrics are evaluated against state-of-the-art attack models, including SAT-based, cyclic, and sequential variants, and show enhanced resilience while maintaining practical design overheads.
arXiv Detail & Related papers (2026-01-16T20:55:30Z) - RMAAT: Astrocyte-Inspired Memory Compression and Replay for Efficient Long-Context Transformers [11.099872871193028]
This work explores computational principles derived from astrocytes, glial cells critical for biological memory and synaptic modulation. We introduce the Recurrent Memory Augmented Transformer (RMAAT), an architecture integrating astrocyte functionalities.
arXiv Detail & Related papers (2026-01-01T18:34:06Z) - BitStopper: An Efficient Transformer Attention Accelerator via Stage-fusion and Early Termination [14.53308613746613]
BitStopper is a fine-grained algorithm-architecture co-design that operates without a sparsity predictor. It achieves 2.03x and 1.89x speedups over Sanger and SOFA, respectively, while delivering 2.4x and 2.1x improvements in energy efficiency.
arXiv Detail & Related papers (2025-12-06T14:44:38Z) - Optimizing Neural Networks with Learnable Non-Linear Activation Functions via Lookup-Based FPGA Acceleration [17.92095380908621]
The FPGA-based design achieves superior computational speed and over $10^4$ times higher energy efficiency compared to edge CPUs and GPUs. This breakthrough positions our approach as a practical enabler for energy-critical edge AI, where computational intensity and power constraints traditionally preclude the use of adaptive activation networks.
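Evaluating a learnable activation through a lookup table is hardware-friendly because each element costs two table reads and one multiply-add, regardless of how complex the learned curve is. A minimal software model, assuming a uniformly spaced table with linear interpolation (the paper's table size, input range, and interpolation scheme are not specified here):

```python
import numpy as np

def lut_activation(x, table, x_min=-4.0, x_max=4.0):
    """Evaluate an activation stored as a lookup table, linearly
    interpolating between adjacent entries."""
    n = len(table)
    # Map inputs to fractional table coordinates, clamped to the range.
    t = (np.clip(x, x_min, x_max) - x_min) / (x_max - x_min) * (n - 1)
    i = np.minimum(t.astype(int), n - 2)
    frac = t - i
    return table[i] * (1.0 - frac) + table[i + 1] * frac

# Example: a 64-entry table initialized to a GELU-like curve; training
# would then adjust the entries directly, making the activation learnable.
xs = np.linspace(-4.0, 4.0, 64)
table = 0.5 * xs * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (xs + 0.044715 * xs**3)))
y = lut_activation(np.array([-1.0, 0.0, 2.5]), table)
```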
arXiv Detail & Related papers (2025-08-23T15:51:14Z) - RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration [51.77917733024544]
Latent diffusion models (LDMs) have improved the perceptual quality of All-in-One image Restoration (AiOR) methods. However, LDMs suffer from slow inference due to their iterative denoising process, rendering them impractical for time-sensitive applications. Visual autoregressive modeling (VAR) performs scale-space autoregression and achieves performance comparable to state-of-the-art diffusion transformers.
arXiv Detail & Related papers (2025-05-23T15:52:26Z) - ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer [58.49950218437718]
We present ReCoM, an efficient framework for generating high-fidelity and generalizable human body motions synchronized with speech. The core innovation lies in the Recurrent Embedded Transformer (RET), which integrates Dynamic Embedding Regularization (DER) into a Vision Transformer (ViT) core architecture. To enhance model robustness, we incorporate the proposed DER strategy, which equips the model with dual capabilities of noise resistance and cross-domain generalization.
arXiv Detail & Related papers (2025-03-27T16:39:40Z) - Explore Activation Sparsity in Recurrent LLMs for Energy-Efficient Neuromorphic Computing [3.379854610429579]
Recurrent Large Language Models (R-LLMs) have proven effective in mitigating the complexity of self-attention. We propose a low-cost, training-free algorithm to sparsify R-LLMs' activations to enhance energy efficiency on neuromorphic hardware.
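One generic, training-free way to sparsify activations is magnitude thresholding: keep the top fraction per tensor, zero the rest, and let event-driven neuromorphic hardware skip the zeros. The paper's specific algorithm is not reproduced here; the following is a stand-in sketch, with `keep_ratio` as an illustrative knob:

```python
import numpy as np

def sparsify_activations(a, keep_ratio=0.1):
    """Keep only the largest-magnitude fraction of activations and zero
    the rest; downstream event-driven hardware then skips the zeros
    instead of spending energy on them."""
    k = max(1, int(keep_ratio * a.size))
    threshold = np.partition(np.abs(a).ravel(), -k)[-k]
    return np.where(np.abs(a) >= threshold, a, 0.0)

a = np.random.default_rng(1).standard_normal((4, 8))
a_sparse = sparsify_activations(a, keep_ratio=0.25)  # roughly 75% zeros
```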
arXiv Detail & Related papers (2025-01-09T19:13:03Z) - USEFUSE: Uniform Stride for Enhanced Performance in Fused Layer Architecture of Deep Neural Networks [0.6435156676256051]
This study presents the Sum-of-Products (SOP) units for convolution, which utilize low-latency left-to-right bit-serial arithmetic. An effective mechanism detects and skips inefficient convolutions after ReLU layers, minimizing power consumption. Two designs cater to varied demands: one focuses on minimal response time for mission-critical applications, and another focuses on resource-constrained devices with comparable latency.
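The skip mechanism mentioned above can be pictured as a zero-detect gate in front of each convolution: after ReLU, an all-zero input tile is guaranteed to produce an all-zero output, so its multiply-accumulate work can be gated off. A software analogue of that check (the actual bit-serial SOP datapath and detection circuitry are hardware-specific and not shown here):

```python
import numpy as np
from scipy.signal import convolve2d

def conv_with_zero_skip(tiles, kernel):
    """Gate off convolution work for activation tiles that ReLU has
    fully zeroed: an all-zero input yields an all-zero output."""
    results, skipped = [], 0
    for tile in tiles:
        if not np.any(tile):                     # cheap zero detection
            results.append(np.zeros_like(tile))  # result known for free
            skipped += 1
        else:
            results.append(convolve2d(tile, kernel, mode="same"))
    return results, skipped
```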
arXiv Detail & Related papers (2024-12-18T11:04:58Z) - Re-Parameterization of Lightweight Transformer for On-Device Speech Emotion Recognition [10.302458835329539]
We introduce a new method, namely Transformer Re-parameterization, to boost the performance of lightweight Transformer models.
Experimental results show that our proposed method consistently improves the performance of lightweight Transformers, even making them comparable to large models.
arXiv Detail & Related papers (2024-11-14T10:36:19Z) - TRANSPOSE: Transitional Approaches for Spatially-Aware LFI Resilient FSM Encoding [2.236957801565796]
Finite state machines (FSMs) regulate sequential circuits, including access to sensitive information and privileged CPU states.
Laser-based fault injection (LFI) is becoming ever more precise, allowing an adversary to thwart chip security by altering individual flip-flop (FF) values.
arXiv Detail & Related papers (2024-11-05T04:18:47Z) - Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach [58.57026686186709]
We introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR).
CFSR inherits the advantages of both convolution-based and transformer-based approaches.
Experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance.
arXiv Detail & Related papers (2024-01-11T03:08:00Z) - Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers [71.32827362323205]
We propose a new class of linear Transformers called Learner-Transformers (Learners).
They incorporate a wide range of relative positional encoding mechanisms (RPEs).
These include regular RPE techniques applied for sequential data, as well as novel RPEs operating on geometric data embedded in higher-dimensional Euclidean spaces.
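The defining property of an RPE is that the attention bias between positions i and j depends only on the offset i - j. A minimal construction of such a bias matrix (the learned Fourier parameterization from the paper is not reproduced here; `rpe_fn` is a placeholder for it):

```python
import numpy as np

def relative_position_bias(seq_len, rpe_fn):
    """Build the (seq_len, seq_len) attention bias whose (i, j) entry
    depends only on the relative offset i - j."""
    offsets = np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]
    return rpe_fn(offsets)

# Example RPE: exponential decay with distance, standing in for a
# learned function of the offset.
bias = relative_position_bias(6, lambda d: np.exp(-np.abs(d) / 4.0))
```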
arXiv Detail & Related papers (2023-02-03T18:57:17Z) - BSRT: Improving Burst Super-Resolution with Swin Transformer and Flow-Guided Deformable Alignment [84.82352123245488]
This work addresses the Burst Super-Resolution (BurstSR) task, which requires restoring a high-quality image from a sequence of noisy, misaligned, and low-resolution RAW bursts, using a new architecture.
We propose a Burst Super-Resolution Transformer (BSRT), which can significantly improve the capability of extracting inter-frame information and reconstruction.
Our BSRT won the championship in the NTIRE 2022 Burst Super-Resolution Challenge.
arXiv Detail & Related papers (2022-04-18T14:23:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.