Virtualization of Tiny Embedded Systems with a robust real-time capable
and extensible Stack Virtual Machine REXAVM supporting Material-integrated
Intelligent Systems and Tiny Machine Learning
- URL: http://arxiv.org/abs/2302.09002v1
- Date: Fri, 17 Feb 2023 17:13:35 GMT
- Title: Virtualization of Tiny Embedded Systems with a robust real-time capable
and extensible Stack Virtual Machine REXAVM supporting Material-integrated
Intelligent Systems and Tiny Machine Learning
- Authors: Stefan Bosse, Sarah Bornemann, Björn Lüssem
- Abstract summary: This paper shows and evaluates the suitability of the proposed VM architecture for operationally equivalent software and hardware (FPGA) implementations.
In a holistic architecture approach, the VM specifically addresses digital signal processing and tiny machine learning.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the past decades, there has been a significant increase in sensor
density and sensor deployment, driven by miniaturization down to the chip level
and addressing ubiquitous computing, edge computing, and distributed sensor
networks. Material-integrated and intelligent
systems (MIIS) provide the next integration and application level, but they
create new challenges and introduce hard constraints (resources, energy supply,
communication, resilience, and security). Commonly, low-resource systems are
statically programmed processors with application-specific software or
application-specific hardware (FPGA). This work demonstrates the need for, and a
solution to, virtualization in such low-resource, constrained systems, enabling
resilient distributed sensor and cyber-physical networks through a unified,
low-resource, customizable, and real-time capable embedded and extensible stack
virtual machine (REXAVM) that can be implemented in both software and hardware,
with the two implementations able to cooperate. In a holistic architecture
approach, the VM specifically addresses digital signal processing and tiny
machine learning. The REXAVM is highly customizable through VM program code
generators at compile time and incremental code processing at run time. The VM
uses an integrated, highly efficient just-in-time compiler to create bytecode
from text code. This
paper shows and evaluates the suitability of the proposed VM architecture for
operationally equivalent software and hardware (FPGA) implementations. Specific
components supporting tiny ML and DSP using fixed-point arithmetic are discussed
with respect to efficiency and accuracy. An extended use-case section
demonstrates the usability of the introduced VM architecture for a broad range
of applications.
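
The abstract describes a stack virtual machine whose just-in-time compiler turns text code into bytecode that a dispatch loop then executes. As an illustration only, the C sketch below shows a minimal stack-machine interpreter over a flat bytecode array; the opcodes, stack layout, and program encoding are hypothetical and do not reflect the actual REXAVM instruction set.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical opcode set for illustration; not the REXAVM instruction set. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_PRINT, OP_HALT };

/* Minimal stack-machine dispatch loop over a flat bytecode array. */
static void vm_run(const int32_t *code)
{
    int32_t stack[64];
    int sp = 0;   /* next free stack slot */
    int pc = 0;   /* program counter */
    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:  stack[sp++] = code[pc++];          break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp];  break;
        case OP_MUL:   sp--; stack[sp - 1] *= stack[sp];  break;
        case OP_PRINT: printf("%d\n", (int)stack[--sp]);  break;
        case OP_HALT:  return;
        }
    }
}

int main(void)
{
    /* Bytecode for (2 + 3) * 4; prints 20. */
    const int32_t prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD,
                             OP_PUSH, 4, OP_MUL, OP_PRINT, OP_HALT };
    vm_run(prog);
    return 0;
}
```

A loop of this shape maps naturally onto a hardware state machine, which is one reason stack machines lend themselves to operationally equivalent software and FPGA implementations.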
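The abstract also mentions tiny ML and DSP components based on fixed-point arithmetic. The sketch below illustrates the general idea with an assumed Q16.16 format and a widened intermediate multiply; the word width, scaling, and rounding actually used by the REXAVM are not specified here and may differ.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed Q16.16 fixed-point format (16 integer, 16 fractional bits);
   the REXAVM word format and scaling are not specified here. */
typedef int32_t q16_16;
#define Q_FRAC_BITS 16
#define Q_ONE       (1 << Q_FRAC_BITS)

static q16_16 q_from_double(double x) { return (q16_16)(x * Q_ONE); }
static double q_to_double(q16_16 x)   { return (double)x / Q_ONE; }

/* Multiply in a wider intermediate type, then shift back to preserve scale. */
static q16_16 q_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * (int64_t)b) >> Q_FRAC_BITS);
}

int main(void)
{
    q16_16 coeff  = q_from_double(0.75);  /* e.g. a filter coefficient */
    q16_16 sample = q_from_double(1.50);  /* e.g. a sensor sample */
    printf("0.75 * 1.50 = %f\n", q_to_double(q_mul(coeff, sample)));
    return 0;
}
```

The efficiency/accuracy trade-off named in the abstract shows up here directly: narrower fractional fields are cheaper in software and hardware but quantize coefficients and samples more coarsely.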
Related papers
- DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution [114.61347672265076]
Development of MLLMs for real-world robots is challenging due to the typically limited computation and memory capacities available on robotic platforms.
We propose a Dynamic Early-Exit Framework for Robotic Vision-Language-Action Model (DeeR) that automatically adjusts the size of the activated MLLM.
DeeR demonstrates significant reductions in the computational cost of the LLM (5.2-6.5x) and in its GPU memory usage (2-6x) without compromising performance.
arXiv Detail & Related papers (2024-11-04T18:26:08Z)
- Co-design of a novel CMOS highly parallel, low-power, multi-chip neural network accelerator [0.0]
We present the NV-1, a new low-power ASIC AI processor that greatly accelerates parallel processing (> 10X) with a dramatic reduction in energy consumption.
The resulting device is currently being used in a fielded edge sensor application.
arXiv Detail & Related papers (2024-09-28T15:47:16Z)
- Designing and Implementing a Generator Framework for a SIMD Abstraction Library [53.84310825081338]
We present TSLGen, a novel end-to-end framework for generating an SIMD abstraction library.
We show that our framework is comparable to existing libraries, and we achieve the same performance results.
arXiv Detail & Related papers (2024-07-26T13:25:38Z)
- Hybrid Oscillator-Qubit Quantum Processors: Instruction Set Architectures, Abstract Machine Models, and Applications [32.40067565226366]
We show that hybrid CV-DV hardware offers a powerful computational paradigm that inherits the strengths of both DV and CV processors.
We present a variety of new hybrid CV-DV compilation techniques, algorithms, and applications.
Hybrid CV-DV quantum computations are beginning to be performed in superconducting, trapped ion, and neutral atom platforms.
arXiv Detail & Related papers (2024-07-15T01:23:47Z)
- Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
- Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, a random resistive memory-based deep extreme point learning machine (DEPLM).
Our co-design system achieves huge energy efficiency improvements and training cost reduction when compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
- FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system unlocking the potential of vast untapped consumer-level GPUs.
This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, and the variability of peer and device heterogeneity.
arXiv Detail & Related papers (2023-09-03T13:27:56Z)
- Efficient Machine Learning, Compilers, and Optimizations for Embedded Systems [21.098443474303462]
Deep Neural Networks (DNNs) have achieved great success in a massive number of artificial intelligence (AI) applications by delivering high-quality computer vision, natural language processing, and virtual reality applications.
These emerging AI applications also come with increasing computation and memory demands, which are challenging to handle, especially for embedded systems, where limited compute/memory resources, tight power budgets, and small form factors are demanded.
This book chapter introduces a series of effective design methods to enable efficient algorithms, compilers, and various optimizations for embedded systems.
arXiv Detail & Related papers (2022-06-06T02:54:05Z)
- Distributed On-Sensor Compute System for AR/VR Devices: A Semi-Analytical Simulation Framework for Power Estimation [2.5696683295721883]
We show that a novel distributed on-sensor compute architecture can reduce the system power consumption compared to a centralized system.
We show that, in the case of the compute-intensive machine learning based Hand Tracking algorithm, the distributed on-sensor compute architecture can reduce the system power consumption.
arXiv Detail & Related papers (2022-03-14T20:18:24Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around the 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its software, full-precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- A Heterogeneous In-Memory Computing Cluster For Flexible End-to-End Inference of Real-World Deep Neural Networks [12.361842554233558]
Deployment of modern TinyML tasks on small battery-constrained IoT devices requires high computational energy efficiency.
Analog In-Memory Computing (IMC) using non-volatile memory (NVM) promises major efficiency improvements in deep neural network (DNN) inference.
We present a heterogeneous tightly-coupled architecture integrating 8 RISC-V cores, an in-memory computing accelerator (IMA), and digital accelerators.
arXiv Detail & Related papers (2022-01-04T11:12:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.