AES-RV: Hardware-Efficient RISC-V Accelerator with Low-Latency AES Instruction Extension for IoT Security
- URL: http://arxiv.org/abs/2505.11880v1
- Date: Sat, 17 May 2025 07:15:36 GMT
- Title: AES-RV: Hardware-Efficient RISC-V Accelerator with Low-Latency AES Instruction Extension for IoT Security
- Authors: Van Tinh Nguyen, Phuc Hung Pham, Vu Trung Duong Le, Hoai Luan Pham, Tuan Hai Vu, Thi Diem Tran,
- Abstract summary: AES-RV is a hardware-efficient RISC-V accelerator featuring low-latency AES instruction extensions. It achieves up to 255.97 times speedup and up to 453.04 times higher energy efficiency than baseline and conventional CPU/GPU platforms.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Advanced Encryption Standard (AES) is a widely adopted cryptographic algorithm essential for securing embedded systems and IoT platforms. However, existing AES hardware accelerators often face limitations in performance, energy efficiency, and flexibility. This paper presents AES-RV, a hardware-efficient RISC-V accelerator featuring low-latency AES instruction extensions optimized for real-time processing across all AES modes and key sizes. AES-RV integrates three key innovations: high-bandwidth internal buffers for continuous data processing, a specialized AES unit with custom low-latency instructions, and a pipelined system supported by a ping-pong memory transfer mechanism. Implemented on the Xilinx ZCU102 SoC FPGA, AES-RV achieves up to 255.97 times speedup and up to 453.04 times higher energy efficiency compared to baseline and conventional CPU/GPU platforms. It also demonstrates superior throughput and area efficiency against state-of-the-art AES accelerators, making it a strong candidate for secure and high-performance embedded systems.
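To make the pipelined ping-pong mechanism concrete, the sketch below shows the double-buffered streaming pattern the abstract describes, written in plain C. The helpers dma_fill() and aes_unit_encrypt() are hypothetical software stand-ins, not the AES-RV API: in hardware, the refill of one buffer would run concurrently with encryption of the other, which is what keeps the AES unit continuously fed with data.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define BLOCK      16   /* AES block size in bytes             */
#define BUF_BLOCKS 64   /* blocks held by each internal buffer */

/* Stand-in: on AES-RV this would be an asynchronous high-bandwidth
 * buffer refill (e.g. a DMA transfer from external memory). */
static void dma_fill(uint8_t *dst, const uint8_t *src, size_t nbytes)
{
    memcpy(dst, src, nbytes);
}

/* Stand-in: on AES-RV this would issue the custom low-latency AES
 * instructions over a whole buffer.  A dummy XOR keeps the sketch
 * self-contained; it is NOT real AES. */
static void aes_unit_encrypt(uint8_t *buf, size_t nblocks)
{
    for (size_t i = 0; i < nblocks * BLOCK; i++)
        buf[i] ^= 0x5A;
}

/* Encrypt nblocks 16-byte blocks from `in` to `out` using two internal
 * buffers in a ping-pong fashion: while one buffer is being encrypted,
 * the other is being refilled with the next chunk of input. */
void stream_encrypt(uint8_t *out, const uint8_t *in, size_t nblocks)
{
    static uint8_t ping[BUF_BLOCKS * BLOCK], pong[BUF_BLOCKS * BLOCK];
    uint8_t *active = ping, *standby = pong;

    size_t first = nblocks < BUF_BLOCKS ? nblocks : BUF_BLOCKS;
    dma_fill(active, in, first * BLOCK);            /* prefetch chunk 0 */

    for (size_t done = 0; done < nblocks; ) {
        size_t chunk = nblocks - done < BUF_BLOCKS ? nblocks - done : BUF_BLOCKS;
        size_t next  = done + chunk;
        size_t ahead = nblocks - next < BUF_BLOCKS ? nblocks - next : BUF_BLOCKS;

        /* In hardware this refill runs concurrently with the encryption
         * below, which is what hides the memory-transfer latency. */
        if (next < nblocks)
            dma_fill(standby, in + next * BLOCK, ahead * BLOCK);

        aes_unit_encrypt(active, chunk);
        memcpy(out + done * BLOCK, active, chunk * BLOCK);

        uint8_t *tmp = active;                      /* swap the buffers */
        active = standby;
        standby = tmp;
        done = next;
    }
}
```

In software the two steps run back to back, but the structure mirrors how the accelerator overlaps memory transfer with computation.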
Related papers
- Hybrid Systolic Array Accelerator with Optimized Dataflow for Edge Large Language Model Inference [8.475319961845903]
Edge accelerators should achieve high area efficiency and minimize external memory access. This paper proposes an edge LLM inference accelerator featuring a hybrid systolic array architecture. Our solution achieves 247/117 tokens/s/mm^2 when running a 1.3B LLM on long-input/long-output scenarios.
arXiv Detail & Related papers (2025-07-11T20:27:30Z) - Toward a Lightweight, Scalable, and Parallel Secure Encryption Engine [0.0]
SPiME is a lightweight, scalable, and FPGA-compatible Secure Processor-in-Memory Encryption architecture. It integrates the Advanced Encryption Standard (AES-128) directly into a Processing-in-Memory framework. It delivers over 25 Gbps of sustained encryption throughput with predictable, low-latency performance.
arXiv Detail & Related papers (2025-06-18T02:25:04Z) - Lightweight and High-Throughput Secure Logging for Internet of Things and Cold Cloud Continuum [2.156208381257605]
We present Parallel Optimal Signatures for Secure Logging (POSLO), a novel digital signature framework. POSLO offers constant-size signatures and public keys, near-optimal signing efficiency, and fine-to-coarse tunable verification for log auditing. For example, POSLO can verify 2^31 log entries per second on a mid-range consumer GPU while being significantly more compact than the state of the art.
arXiv Detail & Related papers (2025-06-10T13:26:36Z) - Joint Resource Management for Energy-efficient UAV-assisted SWIPT-MEC: A Deep Reinforcement Learning Approach [50.52139512096988]
6G Internet of Things (IoT) networks face challenges in remote areas and disaster scenarios where ground infrastructure is unavailable. This paper proposes a novel unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) system enhanced by directional antennas to provide both computational and energy support for ground edge terminals.
arXiv Detail & Related papers (2025-05-06T06:46:19Z) - Quantum circuit for implementing AES S-box with low costs [2.2002244657481826]
The Advanced Encryption Standard (AES) is one of the most widely used and extensively studied encryption algorithms globally. In this paper, three quantum circuits are designed to implement the S-box, which is the sole nonlinear component in AES; a classical reference sketch of the S-box follows below.
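For context, and as standard AES background rather than anything specific to this paper's quantum construction, the S-box combines a multiplicative inverse in GF(2^8) with a fixed affine transform; the field inverse is the expensive nonlinear part that circuit-level implementations try to optimize. A minimal C sketch of the classical S-box:

```c
#include <stdint.h>

/* Multiply two elements of GF(2^8) modulo the AES polynomial
 * x^8 + x^4 + x^3 + x + 1 (0x11B). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1)
            p ^= a;
        uint8_t carry = a & 0x80;
        a <<= 1;
        if (carry)
            a ^= 0x1B;
        b >>= 1;
    }
    return p;
}

/* Multiplicative inverse in GF(2^8); by convention 0 maps to 0.
 * Brute force is fine for a reference implementation. */
static uint8_t gf_inv(uint8_t x)
{
    if (x == 0)
        return 0;
    for (int y = 1; y < 256; y++)
        if (gf_mul(x, (uint8_t)y) == 1)
            return (uint8_t)y;
    return 0; /* unreachable for nonzero x */
}

static uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* AES S-box: field inverse followed by the fixed affine transform. */
uint8_t aes_sbox(uint8_t x)
{
    uint8_t inv = gf_inv(x);
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
               ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63;
}
```

As a sanity check, aes_sbox(0x00) returns 0x63 and aes_sbox(0x01) returns 0x7C, matching the first two entries of the standard S-box table.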
arXiv Detail & Related papers (2025-03-08T06:58:44Z) - Secure Resource Allocation via Constrained Deep Reinforcement Learning [49.15061461220109]
We present SARMTO, a framework that balances resource allocation, task offloading, security, and performance. SARMTO consistently outperforms five baseline approaches, achieving up to a 40% reduction in system costs. These enhancements highlight SARMTO's potential to revolutionize resource management in intricate distributed computing environments.
arXiv Detail & Related papers (2025-01-20T15:52:43Z) - RISC-V R-Extension: Advancing Efficiency with Rented-Pipeline for Edge DNN Processing [0.8192907805418583]
This paper introduces the RISC-V R-extension, a novel approach to enhancing deep neural network (DNN) processing efficiency on edge devices.
The extension features rented-pipeline stages and architectural pipeline registers (APR), which optimize critical operation execution, thereby reducing latency and memory access frequency.
arXiv Detail & Related papers (2024-07-02T19:25:05Z) - OFDM-Standard Compatible SC-NOFS Waveforms for Low-Latency and Jitter-Tolerance Industrial IoT Communications [53.398544571833135]
This work proposes a spectrally efficient irregular Sinc (irSinc) shaping technique, revisiting the traditional Sinc pulse that dates back to 1924.
irSinc yields a signal with increased spectral efficiency without sacrificing error performance.
Our signal achieves faster data transmission within the same spectral bandwidth through 5G standard signal configuration.
arXiv Detail & Related papers (2024-06-07T09:20:30Z) - Data-Oblivious ML Accelerators using Hardware Security Extensions [9.716425897388875]
Outsourced computation can put client data confidentiality at risk.
We develop Dolma, which applies DIFT to the Gemmini matrix multiplication accelerator.
We show that Dolma incurs low overheads for large configurations.
arXiv Detail & Related papers (2024-01-29T21:34:29Z) - REED: Chiplet-Based Accelerator for Fully Homomorphic Encryption [4.713756093611972]
We present the first-of-its-kind multi-chiplet-based FHE accelerator 'REED' for overcoming the limitations of prior monolithic designs. Results demonstrate that the REED 2.5D microprocessor consumes 96.7 mm^2 of chip area and 49.4 W average power in 7nm technology.
arXiv Detail & Related papers (2023-08-05T14:04:39Z) - Deep Reinforcement Learning Based Multidimensional Resource Management for Energy Harvesting Cognitive NOMA Communications [64.1076645382049]
The combination of energy harvesting (EH), cognitive radio (CR), and non-orthogonal multiple access (NOMA) is a promising solution to improve energy efficiency.
In this paper, we study the spectrum, energy, and time resource management for deterministic-CR-NOMA IoT systems.
arXiv Detail & Related papers (2021-09-17T08:55:48Z) - RIS-assisted UAV Communications for IoT with Wireless Power Transfer Using Deep Reinforcement Learning [75.677197535939]
We propose a simultaneous wireless power transfer and information transmission scheme for IoT devices with support from unmanned aerial vehicle (UAV) communications.
In the first phase, IoT devices harvest energy from the UAV through wireless power transfer; in the second phase, the UAV collects data from the IoT devices through information transmission.
We formulate a Markov decision process and propose two deep reinforcement learning algorithms to solve the optimization problem of maximizing the total network sum-rate.
arXiv Detail & Related papers (2021-08-05T23:55:44Z) - EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference [82.1584439276834]
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks.
We present EdgeBERT, an in-depth algorithm-hardware co-design for latency-aware energy optimization of multi-task NLP.
arXiv Detail & Related papers (2020-11-28T19:21:47Z) - Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures [56.69373580921888]
We focus on Recommender Systems, which account for most of the AI cycles in cloud computing centers. By enabling them to run on the latest CPU hardware and software tailored for HPC, we achieve a more than two-orders-of-magnitude improvement in performance.
arXiv Detail & Related papers (2020-05-10T14:40:16Z)