High-Performance ARM-on-ARM Virtualization for Multicore SystemC-TLM-Based Virtual Platforms
- URL: http://arxiv.org/abs/2505.12987v2
- Date: Tue, 24 Jun 2025 08:13:50 GMT
- Title: High-Performance ARM-on-ARM Virtualization for Multicore SystemC-TLM-Based Virtual Platforms
- Authors: Nils Bosbach, Rebecca Pelke, Niko Zurstraßen, Jan Henrik Weinstock, Lukas Jünger, Rainer Leupers
- Abstract summary: ARM-on-ARM virtual platform achieves up to 10 x speedup over traditional instruction-set-simulator-based models on compute-intensive workloads. We present a multicore SystemC-TLM-based CPU model that can be used as a drop-in replacement for an instruction-set simulator.
- Score: 0.16492989697868893
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing complexity of hardware and software requires advanced development and test methodologies for modern systems on chips. This paper presents a novel approach to ARM-on-ARM virtualization within SystemC-based simulators using Linux's KVM to achieve high-performance simulation. By running target software natively on ARM-based hosts with hardware-based virtualization extensions, our method eliminates the need for instruction-set simulators, which significantly improves performance. We present a multicore SystemC-TLM-based CPU model that can be used as a drop-in replacement for an instruction-set simulator. It places no special requirements on the host system, making it compatible with various environments. Benchmark results show that our ARM-on-ARM-based virtual platform achieves up to 10 x speedup over traditional instruction-set-simulator-based models on compute-intensive workloads. Depending on the benchmark, speedups increase to more than 100 x.
Related papers
- chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations [0.6240840318920522]
We present chemtrain-deploy, a framework that enables model-agnostic deployment of LAMMPS in MD simulations. Chemtrain-deploy supports any JAX-defined semi-local potential, allowing users to exploit the functionality of LAMMPS. It achieves state-of-the-art efficiency and scales to systems containing millions of atoms.
arXiv Detail & Related papers (2025-06-04T15:19:26Z) - Bridging the Gap: Physical PCI Device Integration Into SystemC-TLM Virtual Platforms [0.16492989697868893]
Virtual Platforms (VPs) serve as a platform to execute and debug the unmodified target software at an early design stage. VPs need to provide high simulation speed to ensure the target software executes within a reasonable time. This paper introduces a novel approach for integrating real Peripheral Component Interconnect (PCI) devices into SystemC-TLM-2.0-based VPs.
arXiv Detail & Related papers (2025-05-21T14:46:41Z) - Tilus: A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving [12.068287973463786]
Serving Large Language Models (LLMs) is critical for AI-powered applications but demands substantial computational resources. Low-precision computation has emerged as a key technique to improve efficiency while reducing resource consumption. Existing approaches for generating low-precision kernels are limited to weight bit widths that are powers of two.
arXiv Detail & Related papers (2025-04-17T14:45:03Z) - oneDAL Optimization for ARM Scalable Vector Extension: Maximizing Efficiency for High-Performance Data Science [1.5672115019395867]
UXL's oneAPI Data Analytics Library (oneDAL) is widely adopted for accelerating ML and data analytics, but its reliance on Intel's Math Kernel Library (MKL) has traditionally limited its compatibility to x86 platforms. This paper details the porting of oneDAL to ARM architectures with SVE support, using OpenBLAS as an alternative backend to overcome architectural and performance challenges.
arXiv Detail & Related papers (2025-04-05T17:53:36Z) - KGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution [59.20933707301566]
Large Language Models (LLMs) are consistently improving at increasingly realistic software engineering (SE) tasks.
In real-world software stacks, significant SE effort is spent developing foundational system software like the Linux kernel.
To evaluate if ML models are useful while developing such large-scale systems-level software, we introduce kGym and kBench.
arXiv Detail & Related papers (2024-07-02T21:44:22Z) - In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z) - Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose the kernel development into two steps: 1) Expressing the computational core using Tensor Processing Primitives (TPPs) and 2) Expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z) - Virtualization of Tiny Embedded Systems with a robust real-time capable and extensible Stack Virtual Machine REXAVM supporting Material-integrated Intelligent Systems and Tiny Machine Learning [0.0]
This paper shows and evaluates the suitability of the proposed VM architecture for operationally equivalent software and hardware (FPGA) implementations.
In a holistic architecture approach, the VM specifically addresses digital signal processing and tiny machine learning.
arXiv Detail & Related papers (2023-02-17T17:13:35Z) - PLSSVM: A (multi-)GPGPU-accelerated Least Squares Support Vector Machine [68.8204255655161]
Support Vector Machines (SVMs) are widely used in machine learning.
However, even modern and optimized implementations do not scale well for large non-trivial dense data sets on cutting-edge hardware.
PLSSVM can be used as a drop-in replacement for LIBSVM.
arXiv Detail & Related papers (2022-02-25T13:24:23Z) - Compiler-Driven Simulation of Reconfigurable Hardware Accelerators [0.8807375890824978]
Existing simulators tend toward two extremes: low-level and general approaches, such as RTL simulation, that can model any hardware but require substantial effort and long execution times.
This work proposes a compiler-driven simulation workflow that can model hardware accelerators.
arXiv Detail & Related papers (2022-02-01T20:31:04Z) - Using Machine Learning at Scale in HPC Simulations with SmartSim: An Application to Ocean Climate Modeling [52.77024349608834]
We demonstrate the first climate-scale, numerical ocean simulations improved through distributed, online inference of Deep Neural Networks (DNN) using SmartSim.
SmartSim is a library dedicated to enabling online analysis and Machine Learning (ML) for traditional HPC simulations.
arXiv Detail & Related papers (2021-04-13T19:27:28Z) - Achieving 100X faster simulations of complex biological phenomena by coupling ML to HPC ensembles [47.44377051031385]
We present DeepDriveMD, a tool for a range of prototypical ML-driven HPC simulation scenarios.
We use it to quantify improvements in the scientific performance of ML-driven ensemble-based applications.
arXiv Detail & Related papers (2021-04-10T15:52:39Z) - Comparing Popular Simulation Environments in the Scope of Robotics and Reinforcement Learning [0.0]
We show that the chosen simulation environments benefit the most from single-core performance.
On a multicore system, multiple simulations can be run in parallel to increase performance.
arXiv Detail & Related papers (2021-03-08T09:08:53Z)