Virtualization of Tiny Embedded Systems with a robust real-time capable
and extensible Stack Virtual Machine REXAVM supporting Material-integrated
Intelligent Systems and Tiny Machine Learning
- URL: http://arxiv.org/abs/2302.09002v1
- Date: Fri, 17 Feb 2023 17:13:35 GMT
- Title: Virtualization of Tiny Embedded Systems with a robust real-time capable
and extensible Stack Virtual Machine REXAVM supporting Material-integrated
Intelligent Systems and Tiny Machine Learning
- Authors: Stefan Bosse, Sarah Bornemann, Björn Lüssem
- Abstract summary: This paper shows and evaluates the suitability of the proposed VM architecture for operationally equivalent software and hardware (FPGA) implementations.
In a holistic architecture approach, the VM specifically addresses digital signal processing and tiny machine learning.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the past decades, there has been a significant increase in sensor
density and sensor deployment, driven by miniaturization down to the chip level
and addressing ubiquitous computing, edge computing, and distributed sensor
networks. Material-integrated and intelligent
systems (MIIS) provide the next integration and application level, but they
create new challenges and introduce hard constraints (resources, energy supply,
communication, resilience, and security). Commonly, low-resource systems are
statically programmed processors with application-specific software or
application-specific hardware (FPGA). This work demonstrates the need for, and a
solution to, virtualization in such low-resource, constrained systems, enabling
resilient distributed sensor and cyber-physical networks through a unified,
low-resource, customizable, and real-time capable embedded and extensible stack
virtual machine (REXAVM) that can be implemented in both software and hardware,
with the two implementations able to cooperate. In a holistic architecture
approach, the VM specifically addresses digital signal processing and tiny
machine learning. The REXAVM is highly customizable through VM program code
generators at compile time and incremental code processing at run time. The VM
uses an integrated, highly efficient just-in-time compiler to create bytecode
from text code. This
paper shows and evaluates the suitability of the proposed VM architecture for
operationally equivalent software and hardware (FPGA) implementations. Specific
components supporting tiny ML and DSP using fixed-point arithmetic are discussed
with respect to efficiency and accuracy. An extended use-case section
demonstrates the usability of the introduced VM architecture for a broad range
of applications.
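
The abstract describes a stack virtual machine whose just-in-time compiler turns text code into bytecode that a dispatch loop then executes. As an illustration only, the C sketch below shows a minimal stack-machine interpreter over a flat bytecode array; the opcodes, stack layout, and program encoding are hypothetical and do not reflect the actual REXAVM instruction set.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical opcode set for illustration; not the REXAVM instruction set. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_PRINT, OP_HALT };

/* Minimal stack-machine dispatch loop over a flat bytecode array. */
static void vm_run(const int32_t *code)
{
    int32_t stack[64];
    int sp = 0;   /* next free stack slot */
    int pc = 0;   /* program counter */
    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:  stack[sp++] = code[pc++];          break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp];  break;
        case OP_MUL:   sp--; stack[sp - 1] *= stack[sp];  break;
        case OP_PRINT: printf("%d\n", (int)stack[--sp]);  break;
        case OP_HALT:  return;
        }
    }
}

int main(void)
{
    /* Bytecode for (2 + 3) * 4; prints 20. */
    const int32_t prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD,
                             OP_PUSH, 4, OP_MUL, OP_PRINT, OP_HALT };
    vm_run(prog);
    return 0;
}
```

A loop of this shape maps naturally onto a hardware state machine, which is one reason stack machines lend themselves to operationally equivalent software and FPGA implementations.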
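The abstract also mentions tiny ML and DSP components based on fixed-point arithmetic. The sketch below illustrates the general idea with an assumed Q16.16 format and a widened intermediate multiply; the word width, scaling, and rounding actually used by the REXAVM are not specified here and may differ.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed Q16.16 fixed-point format (16 integer, 16 fractional bits);
   the REXAVM word format and scaling are not specified here. */
typedef int32_t q16_16;
#define Q_FRAC_BITS 16
#define Q_ONE       (1 << Q_FRAC_BITS)

static q16_16 q_from_double(double x) { return (q16_16)(x * Q_ONE); }
static double q_to_double(q16_16 x)   { return (double)x / Q_ONE; }

/* Multiply in a wider intermediate type, then shift back to preserve scale. */
static q16_16 q_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * (int64_t)b) >> Q_FRAC_BITS);
}

int main(void)
{
    q16_16 coeff  = q_from_double(0.75);  /* e.g. a filter coefficient */
    q16_16 sample = q_from_double(1.50);  /* e.g. a sensor sample */
    printf("0.75 * 1.50 = %f\n", q_to_double(q_mul(coeff, sample)));
    return 0;
}
```

The efficiency/accuracy trade-off named in the abstract shows up here directly: narrower fractional fields are cheaper in software and hardware but quantize coefficients and samples more coarsely.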
Related papers
- DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution [114.61347672265076]
Development of MLLMs for real-world robots is challenging due to the typically limited computation and memory capacities available on robotic platforms.
We propose a Dynamic Early-Exit Framework for Robotic Vision-Language-Action Model (DeeR) that automatically adjusts the size of the activated MLLM.
DeeR demonstrates significant reductions in the computational cost of the LLM (5.2-6.5x) and in its GPU memory usage (2-6x) without compromising performance.
arXiv Detail & Related papers (2024-11-04T18:26:08Z)
- Co-design of a novel CMOS highly parallel, low-power, multi-chip neural network accelerator [0.0]
We present the NV-1, a new low-power ASIC AI processor that greatly accelerates parallel processing (> 10X) with a dramatic reduction in energy consumption.
The resulting device is currently being used in a fielded edge sensor application.
arXiv Detail & Related papers (2024-09-28T15:47:16Z)
- Designing and Implementing a Generator Framework for a SIMD Abstraction Library [53.84310825081338]
We present TSLGen, a novel end-to-end framework for generating an SIMD abstraction library.
We show that our framework is comparable to existing libraries, and we achieve the same performance results.
arXiv Detail & Related papers (2024-07-26T13:25:38Z)
- Hybrid Oscillator-Qubit Quantum Processors: Instruction Set Architectures, Abstract Machine Models, and Applications [32.40067565226366]
We show that hybrid CV-DV hardware offers a powerful computational paradigm that inherits the strengths of both DV and CV processors.
We present a variety of new hybrid CV-DV compilation techniques, algorithms, and applications.
Hybrid CV-DV quantum computations are beginning to be performed in superconducting, trapped ion, and neutral atom platforms.
arXiv Detail & Related papers (2024-07-15T01:23:47Z)
- Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
- Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, a random resistive memory-based deep extreme point learning machine (DEPLM).
Our co-design system achieves huge energy efficiency improvements and training cost reduction when compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
- FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system unlocking the potential of vast untapped consumer-level GPUs.
This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, and the variability of peer and device heterogeneity.
arXiv Detail & Related papers (2023-09-03T13:27:56Z)
- Efficient Machine Learning, Compilers, and Optimizations for Embedded Systems [21.098443474303462]
Deep Neural Networks (DNNs) have achieved great success in a massive number of artificial intelligence (AI) applications by delivering high-quality computer vision, natural language processing, and virtual reality applications.
These emerging AI applications also come with increasing computation and memory demands, which are challenging to handle, especially for embedded systems, where limited compute/memory resources, tight power budgets, and small form factors are demanded.
This book chapter introduces a series of effective design methods to enable efficient algorithms, compilers, and various optimizations for embedded systems.
arXiv Detail & Related papers (2022-06-06T02:54:05Z)
- Distributed On-Sensor Compute System for AR/VR Devices: A Semi-Analytical Simulation Framework for Power Estimation [2.5696683295721883]
We show that a novel distributed on-sensor compute architecture can reduce the system power consumption compared to a centralized system.
We show that, in the case of the compute-intensive machine learning based Hand Tracking algorithm, the distributed on-sensor compute architecture can reduce the system power consumption.
arXiv Detail & Related papers (2022-03-14T20:18:24Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around the 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its software, full-precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- A Heterogeneous In-Memory Computing Cluster For Flexible End-to-End Inference of Real-World Deep Neural Networks [12.361842554233558]
Deployment of modern TinyML tasks on small battery-constrained IoT devices requires high computational energy efficiency.
Analog In-Memory Computing (IMC) using non-volatile memory (NVM) promises major efficiency improvements in deep neural network (DNN) inference.
We present a heterogeneous tightly-coupled architecture integrating 8 RISC-V cores, an in-memory computing accelerator (IMA), and digital accelerators.
arXiv Detail & Related papers (2022-01-04T11:12:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.