Related papers: Data-Oblivious ML Accelerators using Hardware Security Extensions

Data-Oblivious ML Accelerators using Hardware Security Extensions

URL: http://arxiv.org/abs/2401.16583v1
Date: Mon, 29 Jan 2024 21:34:29 GMT
Title: Data-Oblivious ML Accelerators using Hardware Security Extensions
Authors: Hossam ElAtali, John Z. Jekel, Lachlan J. Gunn, N. Asokan,
Abstract summary: Outsourced computation can put client data confidentiality at risk. We develop Dolma, which applies DIFT to the Gemmini matrix multiplication accelerator. We show that Dolma incurs low overheads for large configurations.
Score: 9.716425897388875
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Outsourced computation can put client data confidentiality at risk. Existing solutions are either inefficient or insufficiently secure: cryptographic techniques like fully-homomorphic encryption incur significant overheads, even with hardware assistance, while the complexity of hardware-assisted trusted execution environments has been exploited to leak secret data. Recent proposals such as BliMe and OISA show how dynamic information flow tracking (DIFT) enforced in hardware can protect client data efficiently. They are designed to protect CPU-only workloads. However, many outsourced computing applications, like machine learning, make extensive use of accelerators. We address this gap with Dolma, which applies DIFT to the Gemmini matrix multiplication accelerator, efficiently guaranteeing client data confidentiality, even in the presence of malicious/vulnerable software and side channel attacks on the server. We show that accelerators can allow DIFT logic optimizations that significantly reduce area overhead compared with general-purpose processor architectures. Dolma is integrated with the BliMe framework to achieve end-to-end security guarantees. We evaluate Dolma on an FPGA using a ResNet-50 DNN model and show that it incurs low overheads for large configurations ($4.4\%$, $16.7\%$, $16.5\%$ for performance, resource usage and power, respectively, with a 32x32 configuration).

Related papers

BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity [66.94629945519125]
We introduce a novel MoE architecture, BlockFFN, as well as its efficient training and deployment techniques.<n>Specifically, we use a router integrating ReLU activation and RMSNorm for differentiable and flexible routing.<n>Next, to promote both token-level sparsity (TLS) and chunk-level sparsity ( CLS), CLS-aware training objectives are designed, making BlockFFN more acceleration-friendly.
arXiv Detail & Related papers (2025-07-11T17:28:56Z)
Toward a Lightweight, Scalable, and Parallel Secure Encryption Engine [0.0]
SPiME is a lightweight, scalable, and FPGA-compatible Secure Processor-in-Memory Encryption architecture.<n>It integrates the Advanced Encryption Standard (AES-128) directly into a Processing-in-Memory framework.<n>It delivers over 25Gbps in sustained encryption throughput with predictable, low-latency performance.
arXiv Detail & Related papers (2025-06-18T02:25:04Z)
ABC-FHE : A Resource-Efficient Accelerator Enabling Bootstrappable Parameters for Client-Side Fully Homomorphic Encryption [0.8795040582681392]
Homomorphic encryption (FHE) enables continuous computation on encrypted data.<n>Recent advancements in FHE accelerators have successfully improved server-side performance, but client-side computations remain a bottleneck.<n>We propose ABC-FHE, an area- and power-efficient FHE accelerator that supports bootstrappable parameters on the client side.
arXiv Detail & Related papers (2025-06-10T05:37:31Z)
Confidential Serverless Computing [1.7231099917090071]
We present Hacher, a confidential computing system for secure serverless deployments. By employing nested confidential execution and a decoupled guest OS within CVMs, Hacher runs each function in a minimal "trustlet" Compared to CVM-based deployments, Hacher has 4.3x smaller TCB, improves end-to-end latency (15-93%), higher function density (up to 907x) and reduces inter-function communication (up to 27x) and function chaining latency (16.7-30.2x)
arXiv Detail & Related papers (2025-04-30T11:13:52Z)
CAT: A GPU-Accelerated FHE Framework with Its Application to High-Precision Private Dataset Query [0.51795041186793]
We introduce an open-source GPU-accelerated fully homomorphic encryption (FHE) framework CAT. emphCAT features a three-layer architecture: a foundation of core math, a bridge of pre-computed elements and combined operations, and an API-accessible layer of FHE operators. Based on our framework, we implement three widely used FHE schemes: CKKS, BFV, and BGV.
arXiv Detail & Related papers (2025-03-28T08:20:18Z)
Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities [63.603861880022954]
We introduce ADV-LLM, an iterative self-tuning process that crafts adversarial LLMs with enhanced jailbreak ability. Our framework significantly reduces the computational cost of generating adversarial suffixes while achieving nearly 100% ASR on various open-source LLMs. It exhibits strong attack transferability to closed-source models, achieving 99% ASR on GPT-3.5 and 49% ASR on GPT-4, despite being optimized solely on Llama3.
arXiv Detail & Related papers (2024-10-24T06:36:12Z)
HASS: Hardware-Aware Sparsity Search for Dataflow DNN Accelerator [47.66463010685586]
We propose a novel approach to exploit unstructured weights and activations sparsity for dataflow accelerators, using software and hardware co-optimization. We achieve an efficiency improvement ranging from 1.3$times$ to 4.2$times$ compared to existing sparse designs.
arXiv Detail & Related papers (2024-06-05T09:25:18Z)
Assessing the Performance of OpenTitan as Cryptographic Accelerator in Secure Open-Hardware System-on-Chips [4.635794094881707]
OpenTitan is an open-source silicon root-of-trust designed to be deployed in a wide range of systems. There has been no accurate and quantitative establishment of the benefits derived from using OpenTitan as a secure accelerator. This paper addresses this gap by thoroughly analysing strengths and inefficiencies when offloading cryptographic workloads to OpenTitan.
arXiv Detail & Related papers (2024-02-16T01:35:40Z)
HasTEE+ : Confidential Cloud Computing and Analytics with Haskell [50.994023665559496]
Confidential computing enables the protection of confidential code and data in a co-tenanted cloud deployment using specialized hardware isolation units called Trusted Execution Environments (TEEs) TEEs offer low-level C/C++-based toolchains that are susceptible to inherent memory safety vulnerabilities and lack language constructs to monitor explicit and implicit information-flow leaks. We address the above with HasTEE+, a domain-specific language (cla) embedded in Haskell that enables programming TEEs in a high-level language with strong type-safety.
arXiv Detail & Related papers (2024-01-17T00:56:23Z)
FHEmem: A Processing In-Memory Accelerator for Fully Homomorphic Encryption [9.884698447131374]
Homomorphic Encryption (FHE) is a technique that allows arbitrary computations to be performed on encrypted data without the need for decryption. FHE is significantly slower than computation on plain data due to the increase in data size after encryption. We propose a PIM-based FHE accelerator, FHEmem, which exploits a novel processing in-memory architecture.
arXiv Detail & Related papers (2023-11-27T20:11:38Z)
Secure Instruction and Data-Level Information Flow Tracking Model for RISC-V [0.0]
Unauthorized access, fault injection, and privacy invasion are potential threats from untrusted actors. We propose an integrated Information Flow Tracking (IFT) technique to enable runtime security to protect system integrity. This study proposes a multi-level IFT model that integrates a hardware-based IFT technique with a gate-level-based IFT (GLIFT) technique.
arXiv Detail & Related papers (2023-11-17T02:04:07Z)
SOCI^+: An Enhanced Toolkit for Secure OutsourcedComputation on Integers [50.608828039206365]
We propose SOCI+ which significantly improves the performance of SOCI. SOCI+ employs a novel (2, 2)-threshold Paillier cryptosystem with fast encryption and decryption as its cryptographic primitive. Compared with SOCI, our experimental evaluation shows that SOCI+ is up to 5.4 times more efficient in computation and 40% less in communication overhead.
arXiv Detail & Related papers (2023-09-27T05:19:32Z)
GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption [33.87964584665433]
Homomorphic Encryption (FHE) enables the processing of encrypted data without decrypting it. FHE introduces a slowdown of up to five orders of magnitude as compared to the same computation using plaintext data. We propose GME, which combines three key microarchitectural extensions along with a compile-time optimization to the current AMD CDNA GPU architecture.
arXiv Detail & Related papers (2023-09-20T01:50:43Z)
FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system unlocking the potential vast untapped consumer-level GPU. This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, the variability of peer and device heterogeneity.
arXiv Detail & Related papers (2023-09-03T13:27:56Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
Perun: Secure Multi-Stakeholder Machine Learning Framework with GPU Support [1.5362025549031049]
Perun is a framework for confidential multi-stakeholder machine learning. It executes ML training on hardware accelerators (e.g., GPU) while providing security guarantees. During the ML training on CIFAR-10 and real-world medical datasets, Perun achieved a 161x to 1560x speedup.
arXiv Detail & Related papers (2021-03-31T08:31:07Z)
PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space. With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.