CiFHER: A Chiplet-Based FHE Accelerator with a Resizable Structure
- URL: http://arxiv.org/abs/2308.04890v3
- Date: Mon, 1 Apr 2024 02:45:41 GMT
- Title: CiFHER: A Chiplet-Based FHE Accelerator with a Resizable Structure
- Authors: Sangpyo Kim, Jongmin Kim, Jaeyoung Choi, Jung Ho Ahn,
- Abstract summary: Homomorphic encryption (FHE) is a definitive solution for privacy, but the high computational overhead of FHE poses a challenge to its practical adoption.
We propose CiFHER, a chiplet-based FHE accelerator with a resizable structure.
This study demonstrates that a CiFHER package composed of a number of compact chiplets provides performance comparable to state-of-the-art monolithic ASIC accelerators.
- Score: 5.0817812294893
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fully homomorphic encryption (FHE) is in the spotlight as a definitive solution for privacy, but the high computational overhead of FHE poses a challenge to its practical adoption. Although prior studies have attempted to design ASIC accelerators to mitigate the overhead, their designs require excessive chip resources (e.g., areas) to contain and process massive data for FHE operations. We propose CiFHER, a chiplet-based FHE accelerator with a resizable structure, to tackle the challenge with a cost-effective multi-chip module (MCM) design. First, we devise a flexible core architecture whose configuration is adjustable to conform to the global organization of chiplets and design constraints. Its distinctive feature is a composable functional unit providing varying computational throughput for the number-theoretic transform, the most dominant function in FHE. Then, we establish generalized data mapping methodologies to minimize the interconnect overhead when organizing the chips into the MCM package in a tiled manner, which becomes a significant bottleneck due to the packaging constraints. This study demonstrates that a CiFHER package composed of a number of compact chiplets provides performance comparable to state-of-the-art monolithic ASIC accelerators while significantly reducing the package-wide power consumption and manufacturing cost.
Related papers
- Resource Management for Low-latency Cooperative Fine-tuning of Foundation Models at the Network Edge [35.40849522296486]
Large-scale foundation models (FoMos) can perform human-like intelligence.
FoMos need to be adapted to specialized downstream tasks through fine-tuning techniques.
We advocate multi-device cooperation within the device-edge cooperative fine-tuning paradigm.
arXiv Detail & Related papers (2024-07-13T12:47:14Z) - Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers [58.5711048151424]
We introduce SPARSEK Attention, a novel sparse attention mechanism designed to overcome computational and memory obstacles.
Our approach integrates a scoring network and a differentiable top-k mask operator, SPARSEK, to select a constant number of KV pairs for each query.
Experimental results reveal that SPARSEK Attention outperforms previous sparse attention methods.
arXiv Detail & Related papers (2024-06-24T15:55:59Z) - Transforming Image Super-Resolution: A ConvFormer-based Efficient
Approach [63.98380888730723]
We introduce the Convolutional Transformer layer (ConvFormer) and the ConvFormer-based Super-Resolution network (CFSR)
CFSR efficiently models long-range dependencies and extensive receptive fields with a slight computational cost.
It achieves 0.39 dB gains on Urban100 dataset for x2 SR task while containing 26% and 31% fewer parameters and FLOPs, respectively.
arXiv Detail & Related papers (2024-01-11T03:08:00Z) - FHEmem: A Processing In-Memory Accelerator for Fully Homomorphic Encryption [9.884698447131374]
Homomorphic Encryption (FHE) is a technique that allows arbitrary computations to be performed on encrypted data without the need for decryption.
FHE is significantly slower than computation on plain data due to the increase in data size after encryption.
We propose a PIM-based FHE accelerator, FHEmem, which exploits a novel processing in-memory architecture.
arXiv Detail & Related papers (2023-11-27T20:11:38Z) - REED: Chiplet-Based Accelerator for Fully Homomorphic Encryption [4.713756093611972]
We present the first-of-its-kind multi-chiplet-based FHE accelerator REED' for overcoming the limitations of prior monolithic designs.
Results demonstrate that REED 2.5D microprocessor consumes 96.7 mm$2$ chip area, 49.4 W average power in 7nm technology.
arXiv Detail & Related papers (2023-08-05T14:04:39Z) - M3ICRO: Machine Learning-Enabled Compact Photonic Tensor Core based on
PRogrammable Multi-Operand Multimode Interference [18.0155410476884]
Photonic tensor core (PTC) designs based on standard optical components hinder scalability and compute density due to their large spatial footprint.
We propose an ultra-compact PTC using customized programmable multi-operand multimode interference (MOMMI) devices, named M3ICRO.
M3ICRO achieves a 3.4-9.6x smaller footprint, 1.6-4.4x higher speed, 10.6-42x higher compute density, 3.7-12x higher system throughput, and superior noise robustness.
arXiv Detail & Related papers (2023-05-31T02:34:36Z) - Can SAM Boost Video Super-Resolution? [78.29033914169025]
We propose a simple yet effective module -- SAM-guidEd refinEment Module (SEEM)
This light-weight plug-in module is specifically designed to leverage the attention mechanism for the generation of semantic-aware feature.
We apply our SEEM to two representative methods, EDVR and BasicVSR, resulting in consistently improved performance with minimal implementation effort.
arXiv Detail & Related papers (2023-05-11T02:02:53Z) - Transformer-based Context Condensation for Boosting Feature Pyramids in
Object Detection [77.50110439560152]
Current object detectors typically have a feature pyramid (FP) module for multi-level feature fusion (MFF)
We propose a novel and efficient context modeling mechanism that can help existing FPs deliver better MFF results.
In particular, we introduce a novel insight that comprehensive contexts can be decomposed and condensed into two types of representations for higher efficiency.
arXiv Detail & Related papers (2022-07-14T01:45:03Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor
Operations on Spatial Accelerators [4.055002321981825]
We present a HW-SW co-design ecosystem for spatial accelerators called Union.
Our framework allows exploring different algorithms and their mappings on several accelerator cost models.
We demonstrate the value of Union for the community with several case studies.
arXiv Detail & Related papers (2021-09-15T16:42:18Z) - Efficient Micro-Structured Weight Unification and Pruning for Neural
Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices.
Previous unstructured or structured weight pruning methods can hardly truly accelerate inference.
We propose a generalized weight unification framework at a hardware compatible micro-structured level to achieve high amount of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.